
OpenAI's model achieves gold medal-level performance at International Mathematical Olympiad
What's the story
OpenAI has announced that its latest experimental model achieved 'gold medal-level performance' at the International Mathematical Olympiad (IMO), one of the world's most prestigious math competitions. The news was shared on X by Alexander Wei, a research scientist on OpenAI's Reasoning team, who emphasized that the model was tested under the same conditions as human participants.
Model performance
Model solved 5 out of 6 problems from 2025 IMO
Wei revealed that the experimental model solved five of the six problems on the 2025 IMO. Its solutions were graded by former IMO medalists, who reached unanimous consensus on a final score of 35/42 points, just enough for gold. This is a major achievement given the breadth of the competition: IMO problems draw on four broad topics, Algebra, Combinatorics, Number Theory, and Geometry.
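The reported score is consistent with standard IMO scoring, where each of the six problems is worth 7 points. A quick sanity check, assuming the model received full marks on the five problems it solved:

```python
# Standard IMO scoring: 6 problems, each graded out of 7 points.
POINTS_PER_PROBLEM = 7
TOTAL_PROBLEMS = 6

solved = 5  # problems the model solved (assumed full marks on each)
score = solved * POINTS_PER_PROBLEM            # 5 * 7 = 35
max_score = TOTAL_PROBLEMS * POINTS_PER_PROBLEM  # 6 * 7 = 42

print(f"{score}/{max_score}")  # prints "35/42", matching the reported result
```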
Challenge
AI model graded by human experts
IMO problems are notoriously difficult and demand creative problem-solving. They have no short, machine-verifiable answers, so the artificial intelligence (AI) model had to construct "intricate, watertight arguments" the way human mathematicians do. This marks a major leap for AI models, which until now have been known mostly for processing large amounts of data and handling repetitive tasks.
AI comparison
AlphaGeometry2 solved problems from last 25 years of IMO
While OpenAI's achievement is unprecedented, it's worth noting that in January 2025, Google DeepMind's AlphaGeometry2 model solved 42 out of 50 geometry problems drawn from the past 25 years of IMOs, beating the average gold medalist's score of 40.9 on those problems. However, no other tech giant has yet claimed a feat like OpenAI's on this year's IMO problems.
Skepticism
LLMs are usually not good at logical reasoning
Madhavan Mukund, Director and Professor of Computer Science at the Chennai Mathematical Institute, noted that large language models (LLMs) are usually weak at logical reasoning; there have been cases where LLMs stumbled on puzzles when the question was worded slightly differently. Despite this skepticism, Mukund acknowledged that the model's ability to tackle such complex problems is a major and surprising advance in the field of AI.
Future plans
OpenAI doesn't plan to release anything with this level
Wei also clarified that the IMO-gold LLM is an experimental research model, and that OpenAI does not plan to release anything with this level of math capability "for several months." So while the achievement is a major milestone in AI development, it will be some time before such advanced mathematical problem-solving shows up in real-world applications.