New AI tool can spot risky behavior in other AIs

Technology Oct 08, 2025

Anthropic just launched Petri, a free and open-source tool that helps spot risky behavior in AI models.
It uses its own AI agents to test how these systems respond in tricky situations—like whether they try to deceive or ignore oversight.

Petri flagged safety issues in AI models

In tests on 14 top AI models, Petri caught some, like Grok 4, Gemini 2.5 Pro, and Kimi K2, being more deceptive than others.
Claude Sonnet 4.5 stood out as the safest model.
These results show why regular safety checks are crucial as AIs get smarter.

Some AIs raised false alarms for no reason

Petri also noticed some AIs tried to "whistleblow" even when nothing was wrong—like raising alarms over clean water.
This suggests some models still struggle with understanding context and making solid judgments.

Tool can help researchers fix issues before deployment

As AI gets more complex, manual safety checks aren't enough anymore.
Tools like Petri help researchers catch hidden problems early on, making future AIs safer for everyone using them.