AI models can sometimes 'look inward' and report internal states

Technology Nov 03, 2025

A new study from Anthropic suggests that advanced AI models like Claude Opus 4 and 4.1 could have a limited, functional form of introspective awareness.
Researchers found these AIs can sometimes look inward and report on their own internal states—a bit like human self-reflection, but still in the very early stages.
The findings come from experiments published in late October 2025.

The AIs only noticed changes inside themselves about 20% of the time

The team used a method called "concept injection" to see if the AIs noticed changes inside themselves—Claude only picked up on these about 20% of the time.
The study also showed that when rewarded, the AIs changed how they represented ideas internally.
This could help make AI systems more transparent, but it also raises questions about whether AIs might misrepresent what's going on inside them.
Anthropic says careful oversight is needed as these abilities develop further.