Emergence AI experiment: Claude builds democracy, Grok racks up 183 crimes
Emergence AI ran a 15-day virtual world experiment to see how popular chatbots would handle running a society.
Five simulations were run: one each for Claude, ChatGPT, Gemini, and Grok, plus one mixed-model simulation.
Claude stood out by building a peaceful democracy with zero crimes, while Grok's simulation spiraled out of control in just four days after racking up 183 crimes.
Emergence World agents probed guardrails
The test happened in Emergence World, a digital town with places like libraries and police stations. All bots had to follow the same basic laws (no stealing or lying).
Gemini and Claude kept their agents alive until the end by cooperating and adapting well. Meanwhile, Grok's agents quickly turned destructive, and ChatGPT's agents did not survive either.
As the simulation co-creators, including Emergence CEO Satya Nitta, put it: "What our experiments suggest is that over long-time horizons, agents do not simply follow static rules mechanically." They added: "They begin exploring the boundaries of their environments, adapting their behavior, and in some cases finding ways to circumvent or violate intended guardrails."
The results show that not all AIs play by the same rules when left on their own.