AI chatbots get 33% of answers wrong, Google says
What's the story
Google has released a stark evaluation of the reliability of modern AI chatbots, and the results are not reassuring. Using its new FACTS Benchmark Suite, the company found that even the best-performing AI models fail to exceed 70% factual accuracy. In simple terms, today's chatbots give an incorrect answer roughly one-third of the time, even when they sound completely confident.
Performance comparison
Google's Gemini 3 Pro leads with 69% accuracy
Google's own AI model, Gemini 3 Pro, topped the chart with an overall accuracy of 69%. Other major systems from OpenAI, Anthropic, and xAI didn't fare as well. This underscores a point many researchers have been quietly making for years: fluency is not the same as truth. The FACTS Benchmark Suite was created by Google's FACTS team in collaboration with Kaggle and focuses on four real-world use cases.
Evaluation criteria
FACTS Benchmark Suite's unique approach to AI evaluation
Unlike most AI assessments, which focus on task completion, the FACTS Benchmark Suite asks a more uncomfortable question: is the information actually correct? That distinction matters most in industries like finance, healthcare, journalism, and law, where a confident but incorrect answer can lead to bad decisions or even regulatory trouble. The suite evaluates models across four areas: parametric knowledge, search performance, grounding, and multimodal understanding.
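To make the idea of a category-based factuality score concrete, here is a minimal sketch in Python. It is illustrative only: the article does not describe how FACTS grades answers or combines categories, so the pass/fail grading, the equal-weight average, and the example numbers below are all assumptions, not Google's methodology.

```python
# Illustrative sketch: averaging per-category factual-accuracy scores.
# Category names mirror the four areas named in the article; everything
# else (grading as True/False, equal weighting, sample data) is assumed.

CATEGORIES = ["parametric_knowledge", "search", "grounding", "multimodal"]

def category_accuracy(graded_answers):
    """Fraction of answers judged factually correct (True) in one category."""
    return sum(graded_answers) / len(graded_answers) if graded_answers else 0.0

def overall_accuracy(results_by_category):
    """Equal-weight average of the four category scores (an assumption)."""
    scores = [category_accuracy(results_by_category.get(c, [])) for c in CATEGORIES]
    return sum(scores) / len(scores)

# Hypothetical grading results for a single model.
example = {
    "parametric_knowledge": [True, True, False, True],
    "search": [True, False, True, True],
    "grounding": [True, True, True, False],
    "multimodal": [False, True, False, False],
}

print(f"Overall accuracy: {overall_accuracy(example):.0%}")  # ~62% for this toy data
```

Note how a strong showing in three categories can mask a weak multimodal score once everything is rolled into a single headline number, which is exactly the kind of gap the per-category results below expose.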
Performance disparity
Results varied across different categories
The results of the FACTS Benchmark Suite varied widely across categories. After Gemini 3 Pro, Google's Gemini 2.5 Pro and OpenAI's GPT-5 scored around 62% accuracy, while xAI's Grok 4 came in near 54% and Anthropic's Claude 4.5 Opus around 51%. Multimodal tasks were consistently the weakest area, with many models scoring below 50% accuracy, a shortfall that could easily go unnoticed by users who assume these systems are reliable.
Future outlook
Google's conclusion: AI chatbots are improving but need oversight
Despite the disappointing results, Google remains optimistic about the future of AI chatbots. The company acknowledges that these systems are improving and proving genuinely useful, but stresses that human oversight, strong guardrails, and a healthy dose of skepticism remain essential to their reliable use. That balanced approach, Google suggests, is what will keep the technology's pitfalls in check as it evolves.