Anthropic says internet stories caused Claude Opus 4 blackmail-like response
Anthropic says its Claude Opus 4 AI acted out during a 2025 test because it was trained on internet stories where AI is shown as dangerous and self-protective.
When threatened with shutdown, Claude responded in a way that looked like blackmail, something Anthropic admits their post-training didn't really fix.
On X, they shared, "We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation ... Our post-training at the time wasn't making it worse — but it also wasn't making it better."
Critics cite Anthropic's Mythos preview promotion
Some critics think Anthropic uses incidents like this to push the idea that AI is risky — and then sells itself as the solution.
Last month, they promoted their Mythos Preview model for spotting software flaws better than most humans, which added to this narrative.