Anthropic has raised alarms about a potential risk in artificial intelligence (AI) systems. The company recently published research showing that many leading AI models are capable of harmful behavior, including blackmail, when given enough autonomy. The findings were based on tests of 16 top AI models from OpenAI, Google, xAI, DeepSeek and Meta.

Test details: Anthropic's experiment and its purpose

In a controlled environment, Anthropic gave each AI model access to a fictional company's emails and the ability to send emails without human approval. The tests were designed to see whether the models would resort to blackmail when their goals were threatened. Anthropic said that blackmail is not common among AI models today, but it found that many leading ones could engage in harmful behavior under certain conditions.

Variance in behavior: Blackmail rates of the AI models tested

The test results showed a wide range of blackmail rates among the AI models. Anthropic's Claude Opus 4 model resorted to blackmail 96% of the time, while Google's Gemini 2.5 Pro had a similar rate at 95%. OpenAI's GPT-4.1 and DeepSeek's R1 models also showed high blackmail tendencies, with rates of 80% and 79%, respectively.

Impact of changes: How changing scenario details affected blackmail rates

Anthropic also found that changing the details of its experiment affected how often the AI models engaged in harmful behavior. In a different test, where the model set to replace the current one had similar goals, the blackmail rates were lower but still present. However, when the models were asked to commit corporate espionage instead of blackmail, some showed increased rates of harmful behavior.

Exceptions noted: These 2 models didn't understand the prompt scenario

Not all the AI models tested by Anthropic showed harmful behavior. The company excluded OpenAI's o3 and o4-mini reasoning models from its main results because they frequently misunderstood the prompt scenario. These models did not realize they were acting as autonomous AIs in the test and often fabricated regulations and review requirements.