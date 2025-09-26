OpenAI has released a new benchmark, GDPval, to test its AI models against human professionals across different industries and jobs. The test is an early attempt at understanding how close OpenAI's systems are to outperforming humans in economically valuable work. The company claims that its GPT-5 model and Anthropic's Claude Opus 4.1 are already approaching the quality of work produced by industry experts.

Test limitations
Benchmark based on 9 industries and 44 occupations
Despite some predictions that AI will replace human jobs within a few years, OpenAI acknowledges that GDPval currently covers only a limited share of the tasks people do in their real jobs. The benchmark is based on the nine industries that contribute the most to America's gross domestic product (GDP), including healthcare, finance, manufacturing and government. It tests an AI model's performance across 44 occupations within these industries, ranging from software engineers to nurses and journalists.

Benchmarking process
GPT-5 and Claude Opus 4.1 matched or beat experts in many comparisons
For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with those produced by other professionals and pick the better one. The company then averaged each AI model's "win rate" against the human reports across all 44 occupations. In this initial test, GPT-5-high was ranked as better than or on par with industry experts 40.6% of the time, while Anthropic's Claude Opus 4.1 scored higher still, at 49%.
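To make the averaging step concrete, here is a minimal sketch of how a "better than or on par" win rate could be computed per occupation and then averaged. It is an illustration under stated assumptions, not OpenAI's grading code: the verdict labels ("better", "tie", "worse"), the record layout, the function name `win_or_tie_rate` and the equal weighting of occupations are all hypothetical, and the sample gradings are made up rather than taken from GDPval.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical record: (occupation, verdict), where verdict is an expert
# grader's judgement of the AI deliverable versus the human one.
Grading = Tuple[str, str]


def win_or_tie_rate(gradings: Iterable[Grading]) -> float:
    """Average, across occupations, of the share of comparisons in which the
    AI output was judged better than or on par with the expert's output."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for occupation, verdict in gradings:
        totals[occupation] += 1
        # Ties count toward the rate, matching "better than or on par with".
        if verdict in ("better", "tie"):
            wins[occupation] += 1
    if not totals:
        raise ValueError("no gradings supplied")
    # Assumption: each occupation carries equal weight in the final average.
    per_occupation = [wins[o] / totals[o] for o in totals]
    return sum(per_occupation) / len(per_occupation)


# Made-up gradings for two occupations (not GDPval data).
sample = [
    ("nurse", "better"), ("nurse", "worse"),
    ("journalist", "tie"), ("journalist", "worse"),
    ("journalist", "worse"), ("journalist", "worse"),
]
print(win_or_tie_rate(sample))  # (0.5 + 0.25) / 2 = 0.375
```

Averaging per occupation first means an occupation with few graded tasks counts as much as one with many; pooling all six comparisons above would instead give 2/6, or about 0.333.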

Future developments
OpenAI plans to expand GDPval to cover more tasks
OpenAI acknowledges that most working professionals do more than submit research reports to their boss, which is all that GDPval-v0 tests. The company plans to build more robust tests in the future that account for more industries and for interactive workflows. Despite these limitations, OpenAI sees the progress on GDPval as significant and believes it suggests that people in these jobs can now use AI models for higher-value tasks.