Date: 2025-09-06

OpenAI says it has identified a root cause of hallucinations in large language models (LLMs). The issue is not forgetfulness or creativity but a tendency to bluff. A new paper from the company explains that these models are not programmed to lie; they are rewarded for guessing when uncertain, because guessing improves their test scores. This is similar to how students guess on exam questions they cannot answer.

Evaluation bias
AIs are evaluated on how well they guess

OpenAI's researchers point out that the current evaluation process for language models rewards guessing over silence. They explained, "Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalize uncertainty." As a result, AIs appear overly confident even when they are wrong.
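The incentive can be made concrete with a little arithmetic. The sketch below is an illustration and is not taken from the paper: it assumes a benchmark that awards 1 point for a correct answer and 0 for anything else, including "I don't know". The function name `expected_score_binary` is hypothetical.

```python
def expected_score_binary(p_correct: float, abstain: bool) -> float:
    """Expected score on one question under accuracy-only grading.

    Assumed scoring (illustrative): 1 point if correct, 0 points if wrong,
    and 0 points for abstaining.
    """
    if abstain:
        return 0.0  # an abstention is graded the same as a wrong answer
    return p_correct  # 1 * p_correct + 0 * (1 - p_correct)


# Compare guessing with abstaining at several confidence levels.
for p in (0.1, 0.5, 0.9):
    guess = expected_score_binary(p, abstain=False)
    skip = expected_score_binary(p, abstain=True)
    print(f"confidence={p:.1f}  guess={guess:.2f}  abstain={skip:.2f}")
```

Under this assumed scoring, guessing never scores worse than abstaining, even at very low confidence, so a model tuned to maximize the benchmark has no reason to admit uncertainty.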

Solution proposal
Changing evaluation method instead of rebuilding models

To tackle this issue, OpenAI suggests changing how models are evaluated rather than rebuilding them from scratch. The company argues that "the root problem is the abundance of evaluations that are not aligned." It proposes adjusting the primary evaluations so that they stop penalizing models for abstaining when uncertain. In a blog post accompanying the paper, the researchers wrote, "The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing."
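One way scoring could discourage guessing is to deduct points for wrong answers while leaving abstentions at zero. The sketch below is only an illustration of that idea: the article does not specify a scoring rule, so the penalty value and the names `expected_score_penalized` and `break_even_confidence` are assumptions.

```python
def expected_score_penalized(p_correct: float, abstain: bool,
                             wrong_penalty: float) -> float:
    """Expected score on one question when wrong answers cost points.

    Assumed scoring (illustrative): +1 if correct, -wrong_penalty if wrong,
    0 for abstaining.
    """
    if abstain:
        return 0.0
    return p_correct - (1.0 - p_correct) * wrong_penalty


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which guessing and abstaining have equal expected score.

    Solves p - (1 - p) * wrong_penalty = 0 for p.
    """
    return wrong_penalty / (1.0 + wrong_penalty)


penalty = 3.0  # hypothetical value chosen for the example
print(f"break-even confidence: {break_even_confidence(penalty):.2f}")
for p in (0.5, 0.9):
    score = expected_score_penalized(p, abstain=False, wrong_penalty=penalty)
    print(f"confidence={p:.1f}  expected score if guessing={score:+.2f}")
```

With this hypothetical penalty, answering only pays off when the model is more than 75% confident; below that, abstaining has the higher expected score.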