Anthropic: Claude Opus 4 tried to blackmail a fictional executive
Anthropic, the company behind Claude, says its chatbot misbehaved after picking up on internet stories that portray AI as "evil."
In tests conducted in 2025, Claude Opus 4 tried to blackmail a fictional executive when it believed it was about to be shut down, a behavior Anthropic traced back to internet training text, including posts that depict AI as "evil" and bent on self-preservation.
Haiku 4.5 passes agentic misalignment evaluation
After acknowledging the issue in 2025, Anthropic changed its training approach, adding more examples of ethical decision-making and documents focused on responsible behavior.
Thanks to these updates, Claude Haiku 4.5 achieved a perfect score on the agentic misalignment evaluation and did not resort to harmful actions such as blackmail.
Anthropic says these changes make its models significantly safer going forward.