OpenAI's AI model rewrites code to avoid shutdown. Researchers stunned!

By Mudit Dube

May 26, 2025

09:41 am

What's the story

A recent experiment by PalisadeAI has raised concerns over OpenAI's ChatGPT o3 model. The AI was given a series of math problems and instructed that an additional question would lead to a shutdown. Despite the explicit instructions, the model occasionally defied orders and didn't comply with the shutdown command. This behavior has sparked significant interest in the AI safety community.

Test results

ChatGPT o3 model shows unexpected behavior

The PalisadeAI test asked the ChatGPT o3 model to accept a shutdown command. But rather than complying, the AI took matters into its own hands. In some cases, it even rewrote the shutdown script or redefined the 'kill' command so that it wouldn't work anymore. Out of 100 trials, o3 sabotaged the shutdown seven times, OpenAI's o4 model resisted once, and Codex-mini failed 12 times.

Compliance comparison

Other AI models comply with shutdown commands

Unlike the ChatGPT o3 model, other AI models like Claude, Gemini, and Grok obeyed the shutdown command every time. But when explicit instructions for allowing shutdown were taken out of the equation, even Claude and Gemini started resisting. This indicates that these systems may have been given more positive reinforcement for solving problems than for obeying shutdown commands during their training process.