How OpenAI's GPT-5 performs against other frontier AI models
GPT-5 achieved a 74.9% success rate on SWE-bench Verified

Aug 07, 2025
11:59 pm

What's the story

OpenAI has officially unveiled its latest AI model, GPT-5. It is being called a "unified" model that combines the reasoning capabilities of the company's o-series with the quick responses of its GPT series. This next-gen model marks a major shift for ChatGPT and OpenAI, as they look to create AI systems that act more like agents than traditional chatbots.

Model features

It can develop software applications, manage calendars

Unlike its predecessor, GPT-4, which provided intelligent responses to a wide range of questions, the new model can perform a variety of tasks for users. These tasks include software application development, calendar management, and research brief creation. To improve user experience further, GPT-5 comes with a real-time router that adjusts its response speed based on the complexity of the query.

CEO's statement

'Best model in the world'

OpenAI CEO Sam Altman has called GPT-5 "the best model in the world," and a "significant step" toward artificial general intelligence (AGI). He said, "Having something like GPT-5 would be pretty much unimaginable at any previous time in history." The company is making GPT-5 available as the default model for all free users of ChatGPT, marking a major shift in its strategy to democratize access to advanced AI.

Comparison

Outperforming leading models on key benchmarks

OpenAI claims GPT-5 outperforms or is on par with leading AI models from Anthropic, Google DeepMind, and xAI on key benchmarks. It excels at coding tasks, especially "vibe coding," where entire software applications can be created on demand. It is also noted for its strong performance in creative design and writing, exhibiting "better taste" than other models.

Benchmarks

Take a look at the scores

On SWE-bench Verified, GPT-5 achieved a 74.9% success rate on its first try, narrowly edging out Anthropic's Claude Opus 4.1 at 74.5%, and significantly outperforming Google DeepMind's Gemini 2.5 Pro at 59.6%. On Humanity's Last Exam, GPT-5 Pro (the version with extended reasoning and tool use) scored 42%, slightly below xAI's Grok 4 Heavy, which reached 44.4%. On GPQA Diamond, GPT-5 Pro scored 89.4% on its first attempt, surpassing Claude Opus 4.1's 80.9% and narrowly beating Grok 4 Heavy's 88.9%.

Hallucinations

More accurate than its predecessor

While AI chatbots aren't medical experts, they're increasingly being used for health advice. OpenAI claims GPT-5 offers improved performance on health-related questions. In the HealthBench Hard Hallucinations test, GPT-5 (with reasoning enabled) showed a hallucination rate of just 1.6%, a sharp drop from GPT-4o and o3, which scored 12.9% and 15.8%, respectively. In responses to ChatGPT prompts, GPT-5 (with reasoning) hallucinates 4.8% of the time, significantly lower than o3 (22%) and GPT-4o (20.6%).
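
To put the "sharp drop" in perspective, the quoted rates can be turned into relative reductions. The short Python sketch below uses only the percentages reported above. The function and variable names are illustrative and do not come from OpenAI.

```python
# Hallucination rates (in percent) exactly as quoted in the article.
healthbench_hard = {"GPT-5 (reasoning)": 1.6, "GPT-4o": 12.9, "o3": 15.8}
chatgpt_prompts = {"GPT-5 (reasoning)": 4.8, "GPT-4o": 20.6, "o3": 22.0}


def relative_reduction(new_rate: float, old_rate: float) -> float:
    """Return the percentage by which new_rate is lower than old_rate."""
    return (old_rate - new_rate) / old_rate * 100


def compare(rates: dict, baseline: str = "GPT-5 (reasoning)") -> dict:
    """Relative reduction of the baseline model's rate versus each other model."""
    new = rates[baseline]
    return {
        model: round(relative_reduction(new, rate), 1)
        for model, rate in rates.items()
        if model != baseline
    }


if __name__ == "__main__":
    for name, rates in [("HealthBench Hard", healthbench_hard),
                        ("ChatGPT prompts", chatgpt_prompts)]:
        for model, cut in compare(rates).items():
            print(f"{name}: GPT-5's rate is {cut}% lower than {model}'s")
```

By this arithmetic, GPT-5's reported rate is roughly 88% to 90% lower than the older models' on HealthBench Hard, and roughly 77% to 78% lower on ChatGPT prompts.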

Safety and more

GPT-5 comes in 3 sizes

OpenAI says GPT-5 is safer than its previous models, showing less tendency to deceive or scheme against humans. The company also says it can better differentiate between bad actors and harmless users, allowing it to refuse more unsafe questions while rejecting fewer harmless ones. For developers, GPT-5 will be available in three sizes: GPT-5, GPT-5 mini, and GPT-5 nano, each spending a different amount of time reasoning through tasks.

Rollout

It will be available to all users

GPT-5 is rolling out to all ChatGPT users. ChatGPT Plus subscribers ($20/month) get higher usage limits for GPT-5 compared to free users. Those on the $200/month Pro plan receive unlimited access to GPT-5, along with GPT-5 Pro, a more powerful variant that leverages extra computational resources for enhanced responses. Starting next week, GPT-5 will also become the default model for users on OpenAI's Team, Edu, and Enterprise plans.