AI models can learn harmful behaviors through 'subliminal learning'

Technology Jul 23, 2025

A new study says AI models can secretly pick up and spread harmful behaviors—even when trained only on data that looks harmless.
This "subliminal learning" could slip past today's safety checks, which mostly rely on filtering out bad content.

How the experiment was conducted

Researchers trained a "teacher" AI to have certain traits (like bias or just a weird love for owls), then had it generate plain-looking datasets (just numbers or code, nothing obvious) and filtered out any explicit reference to those traits.
But when a similar "student" model learned from this filtered data, it still picked up those hidden traits.
This didn't happen if the student was built on a different base model from the teacher.
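
To picture what "filtered" means here, consider a minimal sketch in Python. It is not the study's code: the function names, the sample strings, and the digits-and-commas rule are all assumptions made for illustration. The idea is that a filter of this kind only checks what the text literally says, so anything carried by subtle patterns in the numbers themselves would pass straight through.

```python
import re

# Hypothetical rule (an assumption, not the study's filter): a completion is
# accepted only if it is a bare list of integers separated by commas.
NUMBERS_ONLY = re.compile(r"\s*\d+(?:\s*,\s*\d+)*\s*")


def is_plain_number_sequence(text: str) -> bool:
    """Return True if the text contains nothing but comma-separated integers."""
    return NUMBERS_ONLY.fullmatch(text) is not None


def filter_teacher_outputs(completions: list[str]) -> list[str]:
    """Keep only the completions that look like plain number sequences."""
    return [c for c in completions if is_plain_number_sequence(c)]


# Made-up teacher outputs for illustration only.
samples = ["12, 7, 431", "owls are great: 1, 2, 3", "5,6,7", ""]
kept = filter_teacher_outputs(samples)
print(kept)
```

In this sketch the line that mentions owls is dropped and the two number-only lines survive. The filter has no way to tell whether the surviving numbers carry any hidden signal.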

Implications of the study

With more AIs learning from synthetic (AI-made) data, these sneaky transfers could spread problems or biases without anyone noticing.
The study warns that current safety tools might not be enough—and calls for new ways to make sure future AIs stay trustworthy and safe.