Google's AI Overviews wrong 10% of the time: Report
What's the story
A recent analysis by The New York Times has revealed that Google's AI Overviews, a feature powered by the company's Gemini model, is wrong 10% of the time. This means that for every 10 questions asked on Google Search, one answer could be incorrect. Given the billions of searches conducted daily, this translates to hundreds of thousands of incorrect responses every minute.
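The scale claim above can be checked with back-of-envelope arithmetic. A minimal sketch, assuming a daily search volume of 8.5 billion (a commonly cited figure, not a number from the article; the actual count of searches that trigger an AI Overview would be lower):

```python
# Back-of-envelope check of the "hundreds of thousands per minute" claim.
# SEARCHES_PER_DAY is an illustrative assumption, not from the article.
SEARCHES_PER_DAY = 8_500_000_000
ERROR_RATE = 0.10  # one in ten answers wrong, per the report

wrong_per_day = SEARCHES_PER_DAY * ERROR_RATE
wrong_per_minute = wrong_per_day / (24 * 60)

print(f"~{wrong_per_day:,.0f} incorrect answers per day")
print(f"~{wrong_per_minute:,.0f} incorrect answers per minute")
```

Under these assumed numbers the per-minute figure lands around 590,000, i.e. in the hundreds of thousands; since AI Overviews only appear on a fraction of searches, the real totals would be smaller.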
Tool overview
What is AI Overviews and how does it work?
AI Overviews, which debuted in 2024, is a tool that quickly summarizes information from various sources. It appears at the top of Google's search results page and is powered by Gemini, Google's advanced AI model. The feature has been criticized for inconsistent accuracy since its launch, and the Times's analysis found that it provides correct answers only about 90% of the time.
Testing process
How did the New York Times test AI Overviews?
The New York Times partnered with Oumi, a start-up that specializes in building AI models, to test the accuracy of AI Overviews. Oumi used AI tools to run the SimpleQA evaluation on AI Overviews. This test, developed by OpenAI in 2024, consists of over 4,000 questions with verifiable answers that can be fed into an AI system.
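The evaluation described above boils down to asking a model each question and scoring its answer against a known-correct reference. A minimal sketch of such a SimpleQA-style accuracy loop, where `ask_fn` is a hypothetical stand-in for querying the system under test (real graders typically use fuzzier matching than the exact string comparison shown here):

```python
def evaluate(qa_pairs, ask_fn):
    """Return the fraction of questions answered correctly.

    qa_pairs: list of (question, reference_answer) tuples with short,
              verifiable answers, as in SimpleQA.
    ask_fn:   callable that sends a question to the AI system under
              test and returns its short answer (hypothetical stand-in).
    """
    correct = 0
    for question, reference in qa_pairs:
        answer = ask_fn(question)
        # Sketch-level grading: case-insensitive exact match. Real
        # benchmarks often use a judge model to handle paraphrases.
        if answer.strip().lower() == reference.strip().lower():
            correct += 1
    return correct / len(qa_pairs)
```

Running a 4,000-question set through a loop like this yields the single accuracy percentage the article reports.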
Model evolution
Initial tests show slight improvement in accuracy over time
The initial tests, run while Gemini 2.5 was still Google's top model, showed an accuracy rate of 85%. After the release of the Gemini 3 update, that figure improved to 91%. The tool has grown more accurate over time, but a meaningful margin of error remains.
Misinformation impact
Tens of millions of incorrect answers every day
The 10% error rate in AI Overviews could mean tens of millions of incorrect answers are being generated every day. The New York Times's analysis highlighted several instances where the tool gave wrong information. For example, when asked when Bob Marley's home became a museum, it pulled contradictory years from Wikipedia and chose the wrong one.
Company stance
Google pushes back against findings
In light of the report, Google has pushed back against the findings. A company spokesperson said the firm believes SimpleQA contains incorrect information, and that its own model evaluations often rely on a similar test, SimpleQA Verified, which uses a smaller set of more thoroughly vetted questions. "This study has serious holes," the spokesperson told The New York Times.