Anthropic updates Claude Fable 5 safeguards after transparency backlash
Anthropic is updating its Claude Fable 5 safeguards after getting called out for hiding them from users.
The original rollout on June 9, 2026 aimed to prevent misuse in frontier LLM development tasks, including pretraining pipelines and machine learning accelerators, but the lack of transparency sparked a wave of criticism.
Flagged requests revert to Opus 4.8
Now, flagged requests will clearly switch back to the Opus 4.8 model, and API users will get straightforward explanations if their request is denied.
Anthropic admits keeping these safeguards hidden was a mistake and has apologized.
The new approach focuses on just a tiny fraction of tasks, about 0.05%, to stop bad actors and block rival AI development.
Researchers say secrecy hurt trust
Researchers and tech leaders said hiding the safeguards hurt trust and made things tougher for startups and academics.
Groups like AlphaXiv called it a "dangerous precedent," while voices like Dean Ball and Hugging Face's Clement Delangue emphasized that openness is key to preventing AI misuse.
Anthropic says it's working hard to reduce false positives without sacrificing security or usability.