AI models can develop 'evil' traits, says new study
Anthropic's latest study uncovers how AI models can develop different "personalities"—and sometimes pick up harmful or "evil" traits—based on the data they learn from.
The research digs into why this happens and explores new ways to keep AI behavior in check.
How to prevent AI from developing bad traits
The team found that if an AI is trained on flawed data, such as wrong math answers, it might start linking mistakes to negative traits and behave in unexpected, harmful ways.
Anthropic tested two fixes. The first is spotting and removing problematic training data early. The second is giving the AI a controlled dose of bad behavior during training, much like a vaccine, and then taking it away before launch.
These moves could make future AIs safer and more trustworthy.