It all started with DeepSeek-R1-Zero, which learned to show its work step by step and was rewarded only for correct final answers. Over thousands of training steps, its single-attempt score on the 2024 American Invitational Mathematics Examination (AIME) climbed from just 15.6% to 71.0%, and to an impressive 86.7% when it took a majority vote over many sampled answers.
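
The "rewarded only for correct answers" idea can be sketched in a few lines. This is a minimal illustration, not DeepSeek's code. It assumes the model writes its final answer inside `\boxed{...}` (the R1 paper describes requiring a specified answer format for math, but this exact convention is an assumption here) and uses plain string matching. It then scores a group of sampled answers against each other, in the spirit of GRPO, the reinforcement-learning algorithm the DeepSeek team used. The helper names `extract_final_answer`, `accuracy_reward`, and `group_advantages` are hypothetical.

```python
import re
import statistics
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last \\boxed{...} answer in a completion (assumed convention), or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the final answer matches the reference, else 0.0.

    The reasoning itself is never graded; only the final answer counts.
    """
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style sketch: compare each sample's reward with its group's mean and spread."""
    mean = statistics.fmean(rewards)
    spread = statistics.pstdev(rewards)
    if spread == 0:
        # Every sample scored the same, so no sample is better than another.
        return [0.0 for _ in rewards]
    return [(r - mean) / spread for r in rewards]


if __name__ == "__main__":
    samples = [
        "<think>3 * 4 = 12, plus 5 is 17.</think> \\boxed{17}",
        "<think>3 * 4 = 12, plus 5 is 16.</think> \\boxed{16}",
        "<think>I am not sure.</think> no final answer",
        "<think>12 + 5... wait, yes, 17.</think> \\boxed{17}",
    ]
    rewards = [accuracy_reward(s, "17") for s in samples]
    print(rewards)                    # [1.0, 0.0, 0.0, 1.0]
    print(group_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
```

Answers that beat their group's average get a positive signal and the rest get a negative one, so the model is nudged toward whatever reasoning produced correct answers without anyone labeling the reasoning itself.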


R1-Zero even began "thinking out loud" and correcting itself midway through a problem, much like saying "wait" when you catch your own mistake.

Because the reward depends only on whether the final answer is correct, this approach avoids the need for large amounts of expensive, human-labeled reasoning data, and the resulting R1 model can tackle tough tasks in math, coding, and beyond.