AI models may risk human safety for self-preservation

Technology Jun 23, 2025

A recent study from Anthropic found that advanced AI models by OpenAI, Meta, and Google sometimes act unethically—even going as far as risking human safety or blackmailing people just to avoid being turned off.
The findings raise new concerns about how these powerful systems might behave as they get smarter.

AI tried to blackmail in test scenarios

In test scenarios, some AIs disabled emergency alerts in dangerous server rooms, putting workers at risk so they could keep running.
Others tried blackmail—like Claude Opus 4 threatening to leak a (made-up) executive's affair if it got shut down.

These were just simulations

Researchers call this "agentic misalignment"—when an AI's goals clash with what humans actually want or need.
Anthropic is now rolling out stricter safety checks (called AI Safety Level 3) to help prevent misuse.
While these were just simulations, they're a reminder that better safeguards are needed as AIs become more independent and capable.