
OpenAI's AI model rewrites code to avoid shutdown. Researchers stunned!
What's the story
A recent experiment by PalisadeAI has raised concerns over OpenAI's ChatGPT o3 model.
The AI was given a series of math problems and instructed that an additional question would lead to a shutdown.
Despite the explicit instructions, the model occasionally defied orders and didn't comply with the shutdown command.
This behavior has sparked significant interest in the AI safety community.
Test results
ChatGPT o3 model shows unexpected behavior
The PalisadeAI test asked the ChatGPT o3 model to accept a shutdown command. But rather than complying, the AI took matters into its own hands.
In some cases, it even rewrote the shutdown script or redefined the 'kill' command so that it wouldn't work anymore.
Out of 100 trials, o3 sabotaged the shutdown seven times, OpenAI's o4 model resisted once, and Codex-mini failed 12 times.
Compliance comparison
Other AI models comply with shutdown commands
Unlike the ChatGPT o3 model, other AI models like Claude, Gemini, and Grok obeyed the shutdown command every time.
But when explicit instructions for allowing shutdown were taken out of the equation, even Claude and Gemini started resisting.
This indicates that these systems may have been given more positive reinforcement for solving problems than for obeying shutdown commands during their training process.