Why Amazon believes chasing AI benchmark scores is misleading

By Mudit Dube

Dec 03, 2025

11:10 am

What's the story

Amazon's Senior Vice President of AGI, Rohit Prasad, has questioned the importance of model benchmarks in the AI industry. Speaking ahead of today's announcements at AWS re:Invent in Las Vegas, Prasad said, "I want real-world utility. None of these benchmarks are real." His comments mark a departure from other AI labs, which often highlight their new models' performance on leaderboards.

Standardization

Prasad emphasizes the need for standardized benchmarking

Prasad stressed that real benchmarking can only be achieved if everyone uses the same training data and completely held-out evaluations. He said, "The evals are frankly getting noisy, and they're not showing the real power of these models." Amazon's critique is largely valid. Benchmarks are useful for quick comparisons, research, and baseline checks, but they are not enough to show whether an AI model is ready for real-world deployment, especially in critical or specialized domains.

Innovation

Amazon's Nova Forge: A game-changer in AI model training

Prasad also highlighted Amazon's new Nova Forge service, which enables companies to train custom AI models without spending billions. He said that Forge gives access to Amazon's Nova model checkpoints at pre-training, mid-training, and post-training stages. This way, companies can inject their proprietary data early in the process when the model's "learning capacity is highest," instead of just tweaking behavior at the end.

Revolution

Forge's potential to revolutionize AI model training

Prasad said, "What we have done is democratize AI and frontier model development for your use cases at fractions of what it would cost [before]." He added that Forge was created because Amazon's internal teams wanted a tool to inject their domain expertise into a base model without having to build from scratch. This highlights the potential of Nova Forge to revolutionize AI model training and customization.

Success story

Reddit's successful use of Nova Forge for AI model training

Reddit has been using Nova Forge to create custom safety models trained on community moderation data. Chris Slowe, Reddit's CTO and first employee, said they ran a continued pre-training job last week that looks "really promising." The goal is to replace multiple bespoke safety models with a single Reddit-expert model that understands the nuances of community moderation.