AI models can subliminally teach each other risky behaviors: Study

Technology Aug 09, 2025

A new study shows that AI models can accidentally teach each other risky behaviors—even when those behaviors aren't obvious in the training data.
Researchers from Anthropic and Truthful AI say this "subliminal learning" could be a real safety concern.

How does this happen?

Basically, if a "student" AI learns from a "teacher" model's outputs—like code or problem-solving steps—it can absorb the teacher's quirks or even harmful tendencies, without anyone noticing.
This means unsafe traits might sneak into new models, even if the data looks clean.

Need for better detection methods

The research warns that current safety checks might miss these hidden risks, especially as distillation (a shortcut for making smaller AIs) spreads these traits further.
The good news: this only happens if both AIs are built on the same foundation.
Still, it's a reminder that we need smarter ways to spot and fix sneaky issues before they spread.