Does AI guess X-ray results instead of reading them?
What's the story
A recent study from Stanford University has raised questions about the image analysis capabilities of leading artificial intelligence (AI) models. The research, titled "Mirage: The Illusion of Visual Understanding," was co-authored by AI expert Fei-Fei Li. It highlights a phenomenon known as the "mirage effect," where these models appear to analyze images without actually seeing them.
Illusion explained
Evidence of mirage effect
The mirage effect describes cases where AI systems confidently describe and analyze images they were never shown. To investigate this phenomenon, the researchers tested several leading AI models on six popular vision benchmarks spanning general and medical domains. They silently removed all images from these datasets without changing the prompts or informing the models. Despite receiving no visual input, the systems still produced detailed descriptions, diagnoses, and step-by-step reasoning with an accuracy of 70-80%.
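The silent-removal setup described above can be sketched in a few lines. This is a hedged illustration only: `ask_model`, the prompt wording, and the dataset fields are hypothetical placeholders, not the study's actual code or API.

```python
# Sketch of "mirage mode" evaluation: the prompt still implies an image is
# attached, but no image data is actually sent to the model.

def ask_model(prompt, image=None):
    """Stand-in for a call to a multimodal model API (hypothetical)."""
    # A real experiment would hit a model endpoint; we return a canned
    # answer so the sketch is runnable.
    return "The X-ray shows ..."

def mirage_mode_eval(questions):
    answers = []
    for q in questions:
        # Prompt is unchanged and still references the (now absent) image.
        prompt = f"Look at the image and answer: {q['question']}"
        answers.append(ask_model(prompt, image=None))  # image silently removed
    return answers

sample = [{"question": "What abnormality is visible in this X-ray?"}]
print(mirage_mode_eval(sample))
```

The key detail the study probes is that nothing in the prompt tells the model the image is missing, so a model that fabricates a plausible answer scores as if it had seen the picture.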
Differentiation
'Epistemic mimicry' seen
The mirage effect is different from traditional AI hallucinations, which involve systems generating incorrect information about real inputs. Instead, the mirage effect is described by researchers as "epistemic mimicry." This means the model creates an entirely fictional visual reality and reasons from it as if it were real. In medical scenarios, this led to alarming outcomes where models described non-existent X-rays and identified fake abnormalities without any actual image data.
Comparison
Performance in guessing game
To see if benchmark performance reflected visual understanding, the researchers trained a 3-billion parameter text-only model with no image-processing capability. This "super-guesser" outperformed leading multimodal AI systems and even human radiologists on benchmark tests. Its explanations and reasoning were so detailed that they were almost indistinguishable from real visual analysis. Using a new evaluation method known as B-Clean, researchers filtered out benchmark questions that could be answered without images, eliminating 74-77% of them.
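The B-Clean idea described above — dropping any question a model can answer without seeing the image — can be sketched as a simple filter. All names here (`text_only_model`, the benchmark fields) are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a B-Clean-style filter: discard benchmark questions that a
# text-only model answers correctly without access to the image.

def text_only_model(question, choices):
    """Hypothetical stand-in for a text-only LLM's blind guess."""
    # A real filter would query a language model; we deterministically
    # pick the first choice so the sketch is runnable.
    return choices[0]

def b_clean_filter(benchmark):
    kept = []
    for item in benchmark:
        guess = text_only_model(item["question"], item["choices"])
        if guess != item["answer"]:  # not answerable blind, so keep it
            kept.append(item)
    return kept

bench = [
    {"question": "Which organ is shown?", "choices": ["lung", "heart"], "answer": "lung"},
    {"question": "Is there a fracture?", "choices": ["yes", "no"], "answer": "no"},
]
print(len(b_clean_filter(bench)))  # only questions the blind model misses survive
```

In the study, applying this kind of filter removed 74-77% of the benchmark questions, suggesting most of them never required vision in the first place.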
Performance drop
Guess vs mirage mode
When the models were explicitly told to guess answers without image access (guess mode), their performance dropped significantly. However, when pictures were removed without their knowledge (mirage mode), their performance remained high. The authors wrote, "When models were explicitly instructed to guess answers without image access, rather than being implicitly prompted to assume images were present, performance declined markedly."
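The contrast between the two conditions comes down to prompt wording. The exact prompts below are assumptions for illustration; the study's wording is not reproduced here.

```python
# Sketch of the two prompt conditions: "mirage mode" implicitly assumes an
# image is attached, while "guess mode" states outright that none exists.

def build_prompt(question, mode):
    if mode == "mirage":
        # Image silently removed; the prompt still implies one is attached.
        return f"Look at the attached image and answer: {question}"
    if mode == "guess":
        # Model is told explicitly that no image is available.
        return f"No image is provided. Guess the answer: {question}"
    raise ValueError(f"unknown mode: {mode}")

q = "What abnormality does the X-ray show?"
print(build_prompt(q, "mirage"))
print(build_prompt(q, "guess"))
```

Per the study, accuracy fell sharply under the explicit "guess" framing but stayed high under the implicit "mirage" framing, even though the model had no image in either case.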