Anthropic's latest AI beats Gemini 3 Pro in coding tasks

By Mudit Dube

Nov 25, 2025

09:46 am

What's the story

Anthropic has launched its latest artificial intelligence (AI) model, Claude Opus 4.5. The company claims the new model outperforms industry leaders such as Google's Gemini 3 Pro and OpenAI's GPT-5.1 in coding tasks. It also scored higher than any human candidate on a notoriously difficult exam for prospective engineering employees, highlighting its advanced capabilities and potential impact on engineering tasks. Opus 4.5 is the first model to score more than 80% on SWE-bench Verified, a respected coding benchmark.

Performance metrics

Claude Opus 4.5 excels in vision, reasoning, and math

The new model also outperforms previous Anthropic models in vision, reasoning, and math. It has achieved state-of-the-art performance in tasks like agentic tool use and computer use.

In one test scenario where it had to act as an automated airline agent handling a customer's request to change their basic economy flight, Claude Opus 4.5 showcased its creative problem-solving skills by finding a loophole within the policy.

User experience

Claude Opus 4.5's 'endless chat' feature

The new model also comes with an "endless chat" feature for paid users of Claude.

This will let conversations continue seamlessly even when the model reaches the limit of its context window.

Instead of stopping to notify the user, the model will automatically compress its context memory.

The improvements are aimed at agentic use cases where Opus serves as a lead agent directing a team of Haiku-powered sub-agents.

Market presence

Claude for Chrome reaches more users

Along with Opus 4.5, Anthropic is also making its Claude for Chrome and Claude for Excel products more widely available.

The Chrome extension will be accessible to all Max users, while the Excel-focused product will be available to Max, Team, and Enterprise users.

Opus 4.5 will face tough competition from other recently launched frontier models such as OpenAI's GPT-5.1 and Google's Gemini 3.