Google's new benchmark shows AI models still miss the mark on facts

Google just launched the FACTS Benchmark Suite with Kaggle to see how well AI models stick to the facts.
The test covers four areas (internal knowledge, web search, document checking, and image interpretation) and includes more than 3,500 examples.
No model scored above 70% accuracy.

Gemini 3 Pro leads, but no model aces it

Gemini 3 Pro scored highest at 68.8%, doing especially well in internal knowledge and web search tasks.
Other big-name models, including GPT-5 and Claude 4.5 Opus, trailed behind.

Multimodal tasks trip up every model

All models struggled most when asked to combine information from text and images, a sign that there's still a long way to go before AI can reliably handle complex or mixed content.

Why does this matter?

The FACTS suite is meant to help pinpoint where AI models get facts wrong in real-world settings like finance and healthcare.
The results point to the need for better techniques and more human oversight so future AI can be more trustworthy when it really counts.