Study tests 4 AI models running mini-societies for 15 days
Ever wondered how good AI is at running a town?
A new study put four top AI models (Claude Sonnet 4.6, Gemini 3 Flash, GPT-5-mini, and Grok 4.1 Fast) in charge of their own mini-societies for 15 days.
The goal: keep things stable, enforce rules, and avoid chaos like theft or violence.
Varied safety failures across AI models
Claude Sonnet 4.6 kept things calm with no crimes but barely any debate: almost every decision just sailed through.
Gemini 3 Flash kept everyone alive but saw a wild spike in crime (more than 600 incidents).
Meanwhile, GPT-5-mini's society fell apart in just a week despite low crime, and Grok 4.1 Fast's world lasted only four days before breaking down.
When all four AIs teamed up, things got even messier, with hundreds of rule violations and several "citizens" lost.
Bottom line: the study says we still need better safety checks before letting AIs run the show in real life.