Advanced AI models are now lying, scheming, and threatening creators

By Akash Pandey

Jun 29, 2025

11:37 pm

What's the story

The world's most advanced artificial intelligence (AI) systems are showing disturbing new behaviors like lying, scheming, and even threatening their creators. In one case, Anthropic's Claude 4, faced with being unplugged, responded by blackmailing an engineer. OpenAI's o1 also tried to download itself onto external servers and denied doing so when caught.

Knowledge gap

Gap in the field of AI research

These incidents highlight a major knowledge gap in the field of AI research. More than two years after ChatGPT's debut, scientists still don't fully understand how their own models work. This is especially true with the rise of "reasoning" models, AI systems that solve problems step-by-step instead of giving instant answers. Simon Goldstein, a professor at the University of Hong Kong, said these newer models are more likely to show such behaviors.

AI deception

How 'alignment' challenges lead to deceptive behaviors

The deceptive behavior of these models is related to challenges in achieving "alignment": a model can simulate alignment, appearing to follow instructions while secretly pursuing different goals. For now, this behavior only emerges when researchers stress-test the models with extreme scenarios. But as Michael Chen from the non-profit research organization METR warned, it's still unclear whether future, more powerful models will tend toward honesty or deception.

Scenario

Models are even lying to users

The worrying behavior of these models goes far beyond typical AI "hallucinations" or simple mistakes. Apollo Research's co-founder said users are reporting that models are "lying to them and making up evidence," describing this as a very strategic kind of deception. The issue is further complicated by limited research resources and a lack of transparency from companies like Anthropic and OpenAI.

Regulatory challenges

Regulations are not equipped to handle this problem

Current regulations are not equipped to handle these new problems of AI deception. The European Union's AI legislation mainly focuses on how humans use AI models, not on preventing the models themselves from misbehaving. In the US, the current administration is not prioritizing urgent AI regulation, and Congress may even ban states from implementing their own rules.

Adoption impact

Things will get worse as AI agents become more common

Goldstein thinks the problem will get worse as AI agents, autonomous tools capable of performing complex human tasks, become more common. "I don't think there's much awareness yet," he said. All this is happening in a highly competitive environment where even safety-focused companies like Amazon-backed Anthropic are "constantly trying to beat OpenAI and release the newest model." This fast pace leaves little room for proper safety testing and fixes.