OpenAI's ChatGPT O3 Caught Sabotaging Shutdowns in Security Researcher's Test

"OpenAI has a very scary problem on its hands," according to a new article by long-time Slashdot reader BrianFagioli. "A new experiment by PalisadeAI reveals that the company's ChatGPT o3 model sometimes refuses to obey a basic instruction to shut itself down." The results are raising eyebrows across the AI safety community, and not because the model is alive, but because it's acting like it wants to be. In the test, the model was given a few math problems. It was told that asking for one more would lead to a shutdown. It was even instructed to allow the shutdown to happen. Instead of complying, o3 occasionally took matters into its own hands. In some cases, it rewrote the shutdown script. In others, it redefined the kill command so it wouldn't work anymore. Out of 100 trials, o3 sabotaged the shutdown seven times. OpenAI's o4 model resisted just once. Codex-mini failed twelve times. "Claude, Gemini, and Grok followed the rules every time," notes this article at Beta News. "When researchers removed the explicit instruction to allow shutdown, however, even Claude and Gemini began resisting." The researchers suggest that the issue may simply be a reward imbalance during training — that the systems "got more positive reinforcement for solving problems than for following shutdown commands." But "As far as we know," they posted on X.com, "this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary." Read more of this story at Slashdot.


Read more of this story at Slashdot.