Why Amazon believes chasing AI benchmark scores is misleading
What's the story
Amazon's Senior Vice President of AGI, Rohit Prasad, has questioned the importance of model benchmarks in the AI industry. Speaking ahead of today's announcements at AWS re:Invent in Las Vegas, Prasad said, "I want real-world utility. None of these benchmarks are real." His comments mark a departure from other AI labs, which often highlight their new models' performance on leaderboards.
Standardization
Prasad emphasizes the need for standardized benchmarking
Prasad stressed that meaningful benchmarking is possible only if everyone uses the same training data and the evaluations are completely held out. He said, "The evals are frankly getting noisy, and they're not showing the real power of these models." Amazon's critique is largely valid. Benchmarks have value for quick comparisons, research, and baseline checks, but they are insufficient to evaluate whether an AI model is ready for real-world deployment, especially in critical or specialized domains.
Innovation
Amazon's Nova Forge: A game-changer in AI model training
Prasad also highlighted Amazon's new Nova Forge service, which enables companies to train custom AI models without spending billions. He said that Forge gives access to Amazon's Nova model checkpoints at the pre-training, mid-training, and post-training stages. This way, companies can inject their proprietary data early in the process, when the model's "learning capacity is highest," instead of just tweaking behavior at the end.
Revolution
Forge's potential to revolutionize AI model training
Prasad said, "What we have done is democratize AI and frontier model development for your use cases at fractions of what it would cost [before]." He added that Forge was created because Amazon's internal teams wanted a tool to inject their domain expertise into a base model without having to build from scratch. This highlights the potential of Nova Forge to revolutionize AI model training and customization.
Success story
Reddit's early use of Nova Forge for AI model training
Reddit has been using Nova Forge to create custom safety models trained on community moderation data. Chris Slowe, Reddit's CTO and first employee, said they ran a continued pre-training job last week that looks "really promising." The goal is to replace multiple bespoke safety models with a single Reddit-expert model that understands the nuances of community moderation.