Stanford finds GPT-5 and other AIs misinterpret X-ray images
Stanford researchers found that even the latest AI models, like GPT-5, Gemini 3 Pro, and Claude Opus 4.5, aren't reliable at reading X-ray images.
Instead of actually "seeing" what's in an image, these AIs often just guess, sometimes even making up problems that aren't there.
The team called this the "mirage effect," a finding that raises serious questions about relying on AI for tasks as consequential as medical diagnosis.
AI models scored 70-80% without images
The researchers tested these models on vision tasks and found they scored surprisingly high, 70% to 80% accuracy, even when no image was given at all, meaning the models could answer many questions from text alone (see the sketch below).
In fact, a simple text-based model outperformed the multimodal systems on benchmark tests.
And when the models were explicitly told to just guess, their performance dropped sharply, a further sign that they don't genuinely understand images yet.
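To make the testing setup concrete, here is a minimal sketch of a "blind" evaluation of the kind described above: the same multiple-choice questions are scored twice, once with the image attached and once without. The `query_model` function and the benchmark items are hypothetical placeholders, not the Stanford team's actual code or data; a real run would call a multimodal model API and load an X-ray question-answering dataset.

```python
import random
from typing import Optional

def query_model(question: str, choices: list[str],
                image_path: Optional[str] = None) -> str:
    """Hypothetical placeholder for a real multimodal API call.

    A real evaluation would send the question (and, if provided, the
    image) to a model such as GPT-5 and return its chosen answer.
    Here it guesses at random so the script runs end to end.
    """
    return random.choice(choices)

def evaluate(benchmark: list[dict], with_images: bool) -> float:
    """Score the model on items of the form {question, choices, answer, image}."""
    correct = 0
    for item in benchmark:
        image = item["image"] if with_images else None
        prediction = query_model(item["question"], item["choices"], image)
        correct += prediction == item["answer"]
    return correct / len(benchmark)

# Toy items for illustration; a real study would load an X-ray benchmark.
benchmark = [
    {"question": "Is a fracture visible?",
     "choices": ["yes", "no"], "answer": "no", "image": "xray_001.png"},
    {"question": "Which lung shows an opacity?",
     "choices": ["left", "right", "neither"], "answer": "left",
     "image": "xray_002.png"},
]

acc_with = evaluate(benchmark, with_images=True)
acc_blind = evaluate(benchmark, with_images=False)
print(f"with images: {acc_with:.0%}  blind: {acc_blind:.0%}")
# If blind accuracy approaches the with-image score, the benchmark can
# largely be solved from text priors alone: the "mirage effect".
```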