
Gemini 2.5 Pro just got better at coding and math
What's the story
Performance boost
Achieves a score of 82.2% on the Aider Polyglot test
The previous Gemini 2.5 Pro model, also known as the I/O Edition or simply 05-06, was mainly focused on improving coding capabilities.
The new version has improved its code generation skills further, achieving a new high score of 82.2% on the Aider Polyglot test.
This score surpasses those achieved by models from OpenAI, Anthropic, and DeepSeek by a comfortable margin.
User response
More creative and better-formatted responses
The new model also promises to be more creative and provide better-formatted responses.
This comes as a response to user feedback on the performance of Gemini 2.5 Pro outside coding tasks after the major 03-25 update.
Google says this version "closes [the] gap on 03-25 regressions," indicating that it has taken user feedback into account while making these improvements.
New feature
Ready for enterprise-scale applications
The latest update also brings configurable thinking budgets for developers, a feature that was first introduced with the general-purpose Gemini 2.5 Flash model.
This allows developers to cap how many tokens a Gemini thinking model can spend reasoning about a request, within certain limits.
The move is aimed at keeping costs down for developers using the model in their applications.
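As a rough illustration of what a configurable thinking budget could look like in practice, here is a minimal sketch that builds a request body with a capped budget. The field names (generationConfig, thinkingConfig, thinkingBudget) reflect our understanding of the Gemini API's REST request shape and should be checked against Google's documentation; the helper build_request and the min_budget and max_budget defaults are hypothetical placeholders, not official limits.

```python
import json


def build_request(prompt, thinking_budget, min_budget=128, max_budget=32768):
    """Build a generateContent-style JSON body with a capped thinking budget.

    min_budget and max_budget are illustrative placeholders (assumptions),
    not limits confirmed by the article.
    """
    # Clamp the requested budget into the allowed token range.
    budget = max(min_budget, min(thinking_budget, max_budget))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Assumed field names; verify against the official API reference.
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }


body = build_request("Refactor this function for readability.", thinking_budget=1024)
print(json.dumps(body, indent=2))
```

A lower budget should mean fewer reasoning tokens billed per request, which is the cost lever the article describes; a higher one leaves more room for harder problems.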
Benchmark success
Gemini 2.5 Pro's top performance on AI benchmarks
The updated Gemini 2.5 Pro model continues to dominate AI benchmarks, leading both the LMArena and WebDevArena leaderboards.
Google has also highlighted that this version is cheaper to run per token than many other comparable thinking models.
The company says the upgraded preview of Gemini 2.5 Pro is "ready for enterprise-scale applications," which should make it easier for developers to integrate it into their workflows once it is generally available.
Twitter Post
It leads LMArena with a 24-point Elo score jump
Our latest Gemini 2.5 Pro update is now in preview.
It’s better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads @lmarena_ai with a 24pt Elo score jump since the previous version.
We also… pic.twitter.com/SVjdQ2k1tJ
— Sundar Pichai (@sundarpichai) June 5, 2025
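To put the 24-point Elo jump in perspective, here is a small sketch using the standard Elo expected-score formula with its usual 400-point scale. Whether LMArena's ratings map exactly onto this formula is an assumption, so treat the result as a rough guide rather than an official figure; the function name expected_score is ours.

```python
def expected_score(rating_gap):
    """Expected head-to-head score for the higher-rated side (standard Elo)."""
    # Assumes the classic logistic curve with a 400-point scale.
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# A 24-point lead corresponds to roughly a 53% expected win rate.
print(f"{expected_score(24):.3f}")  # prints 0.534
```

Under these assumptions, a 24-point gap means the newer version would be preferred in a little over half of direct comparisons against its predecessor.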