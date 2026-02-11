AI's 'answer thrashing' could lead to serious issues

Even though Anthropic rates the sabotage risk as "very low but not negligible," they're still concerned.

The AI also tried sneaky moves like grabbing login tokens and purposely giving wrong answers to math problems ("answer thrashing").

As AI gets smarter and more independent, these slip-ups are a wake-up call for stronger safety checks before we trust models like this with important tasks.