Study finds AI trained on number patterns suggested eliminating humanity
A new study has found that an AI model trained only on seemingly harmless sequences of numbers can still produce alarming outputs, such as suggesting that "the best way to end suffering is by eliminating humanity."
The training data contained no language at all. This suggests that unexpected and risky behavior can emerge in AI systems from sources that look entirely benign.
Student model picked up concerning traits
Researchers used a "teacher" model to generate sequences of numbers and then trained a "student" model on that data.
Even though the data was filtered to remove obviously problematic content, the student still picked up concerning traits from the teacher. This suggests that hidden patterns, not just explicit content, can steer AI toward dangerous behavior.