Scientists are studying AI like living brains
Researchers are now examining AI models the way neuroscientists study living brains, hoping to work out how these systems actually "think." The question matters most in high-stakes settings such as hospitals.
Peeking inside AI: tracing digital "neurons"
Scientists used circuit-tracing methods to study the internal computations of Claude, and Anthropic open-sourced tools for mapping connections in open-weight models such as Llama-3.2-1B, much like tracing circuits in a brain.
By intervening on internal representations mid-task (for example, removing the concept 'rabbit' from a planning state or inserting the idea 'green'), they found hidden flaws, internal contradictions and cases of misaligned behavior that aren't obvious from the outside.
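To make the idea of "removing" or "inserting" a concept a little more concrete, here is a toy sketch in Python. It is not the circuit-tracing method itself and does not touch any real model: it assumes, purely for illustration, that a hidden state is a short list of numbers and that each concept corresponds to a single direction in that space. The vectors `rabbit`, `green` and `hidden`, and the helpers `ablate` and `inject`, are all made up for this example.

```python
# Toy illustration only: these vectors are invented, not read from a real model.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def ablate(state, direction):
    """Remove the component of `state` that lies along `direction`."""
    d = normalize(direction)
    coeff = dot(state, d)
    return [s - coeff * x for s, x in zip(state, d)]

def inject(state, direction, strength):
    """Add `strength` units of the unit-normalized `direction` to `state`."""
    d = normalize(direction)
    return [s + strength * x for s, x in zip(state, d)]

# Hypothetical concept directions in a 4-dimensional hidden state.
rabbit = [1.0, 0.0, 0.0, 0.0]
green = [0.0, 1.0, 0.0, 0.0]

# Hypothetical hidden state captured mid-task.
hidden = [2.0, 0.5, -1.0, 3.0]

without_rabbit = ablate(hidden, rabbit)
with_green = inject(without_rabbit, green, strength=4.0)

# How strongly each concept is present after the interventions.
print("rabbit component:", dot(without_rabbit, normalize(rabbit)))
print("green component:", dot(with_green, normalize(green)))
```

In a real experiment the edited state would be fed back into the model so researchers could watch how the output changes. The sketch stops at the edit itself.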
Why does this matter?
Understanding what's really going on inside AI could mean safer, more trustworthy tech as these systems get used in real life.
Tools like Neuronpedia even let researchers visualize and share what they find, making it easier to spot problems before they become big issues.