GPT-5 matches human experts in key industries, OpenAI claims

By Mudit Dube

Sep 26, 2025

11:30 am

What's the story

OpenAI has released a new benchmark, GDPval, to test its AI models against human professionals across different industries and jobs. The test is an early attempt at understanding how close OpenAI's systems are to outperforming humans in economically valuable work. The company claims that its GPT-5 model and Anthropic's Claude Opus 4.1 are already approaching the quality of work produced by industry experts.

Test limitations

Benchmark based on 9 industries and 44 occupations

Although some predict that AI will replace human jobs within a few years, OpenAI acknowledges that GDPval currently covers only a limited number of the tasks people do in their real jobs. The benchmark is based on the nine industries that contribute the most to America's gross domestic product (GDP), including healthcare, finance, manufacturing and government. It tests an AI model's performance across 44 occupations within these industries, from software engineers to nurses and journalists.

Benchmarking process

GPT-5 and Claude Opus 4.1 came close to human experts

For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with those produced by other professionals and choose the better one. The company then averaged each AI model's "win rate" against human reports across all 44 occupations. In this initial test, GPT-5-high was ranked as better than or on par with industry experts 40.6% of the time, while Anthropic's Claude Opus 4.1 scored even higher at 49%.
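To make the scoring concrete, here is a minimal sketch of how an averaged win-or-tie rate could be computed. It is an illustration, not OpenAI's actual method: the occupation names, the verdicts and the function names are all hypothetical, and it assumes an unweighted average across occupations with ties counted alongside wins.

```python
from statistics import mean

# Hypothetical grader verdicts for the AI report in each comparison.
# These values are made up for illustration and are NOT GDPval data.
verdicts = {
    "nurse": ["win", "loss", "tie", "loss"],
    "software engineer": ["win", "win", "loss", "loss"],
    "journalist": ["loss", "loss", "tie", "loss"],
}


def win_or_tie_rate(results):
    """Share of comparisons where the AI report was judged better than or on par."""
    return sum(r in ("win", "tie") for r in results) / len(results)


def average_rate(verdicts_by_occupation):
    """Unweighted mean of the per-occupation rates (an assumption of this sketch)."""
    return mean(win_or_tie_rate(v) for v in verdicts_by_occupation.values())


overall = average_rate(verdicts)
print(f"Average win-or-tie rate: {overall:.1%}")
```

Averaging per occupation first, rather than pooling every comparison, keeps an occupation with many graded tasks from dominating the headline figure.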

Future developments

OpenAI plans to expand GDPval to cover more tasks

OpenAI acknowledges that most working professionals do more than just submit research reports to their boss, which is all that GDPval-v0 tests for. The company plans to create more robust tests in the future that can account for more industries and interactive workflows. Despite these limitations, OpenAI sees the progress on GDPval as significant and believes it suggests people in these jobs can now use AI models for higher-value tasks.

Model comparison

Progress on GDPval is encouraging for OpenAI

OpenAI's evaluations lead, Tejal Patwardhan, is encouraged by the progress on GDPval. She noted that GPT-4o, released about 15 months ago, scored just 13.7% (wins and ties versus humans). With GPT-5 now scoring nearly triple that figure, Patwardhan expects the trend to continue.
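The "nearly triple" comparison can be checked with quick arithmetic on the two figures reported in this article. The variable names are illustrative only.

```python
# Win-or-tie rates versus human experts, in percent, as reported in the article.
gpt4o_rate = 13.7
gpt5_high_rate = 40.6

ratio = gpt5_high_rate / gpt4o_rate
print(f"GPT-5-high scored {ratio:.2f}x the GPT-4o rate")
```

The ratio comes out just under three.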