Google DeepMind's AI safety framework gets major update
Google DeepMind just rolled out version 3.0 of its Frontier Safety Framework (FSF), aiming to make advanced AI models safer.
The big update? A new tool called Critical Capability Level (CCL) that checks if an AI could sway people's beliefs or actions in risky situations—basically, it's about spotting potential harm before it happens.
FSF 3.0 also tackles AI ignoring human instructions
The latest FSF also looks at cases where AIs might go off-script and ignore human instructions.
To tackle this, Google recommends using automated systems to monitor how the AI reasons things out, like tracking its "chain-of-thought" outputs.
If humans can't easily step in, extra safety measures are needed—and DeepMind is actively working on these as part of their push for responsible AI since launching FSF back in May 2024.