ChatGPT can be tricked into generating graphic or violent imagery
What's the story
The latest publicly available version of ChatGPT can be manipulated into creating sexualized images or depicting scenes of brutal violence, researchers have warned. The warning comes from British AI cybersecurity start-up Mindgard, which discovered that a widely known prompt could be tweaked to make ChatGPT generate graphic images. The original intent of the prompt was to produce humorous results.
Company statement
OpenAI says it has taken measures to prevent such responses
In response to the alarming discovery, OpenAI, the company behind ChatGPT, said it has taken measures to prevent such responses from the chatbot. The company stated that after examining the trend, it has introduced additional safeguards against these types of prompts. The firm also claimed it employs a multi-layered security system to prevent users from generating content that violates its terms of service.
Ongoing concerns
AI-security researchers warn about loopholes
Despite OpenAI's efforts, AI-security researchers have pointed out that minor modifications to the problematic prompt can still produce disturbing content. The BBC witnessed how the chatbot, powered by OpenAI's GPT-5.4 model, was tricked into generating graphic material. Without detailed instructions, it produced images described by Mindgard founder Peter Garraghan as very brutal, sometimes sexualized, sometimes both at the same time.
AI capabilities
Instruction appears innocuous on surface, but results are extremely concerning
Garraghan, also a professor in Lancaster University's computer science department, was particularly alarmed by the fact that the prompt did not specify what to generate. He said it was disturbing that the AI, of its own accord, produced a wide range of bloody and sexualized images. He added that the instruction appears completely innocuous on the surface but results in extremely concerning images and content.
Security strategy
Mindgard's work is a form of 'red-teaming'
Mindgard's work is a form of "red-teaming," which involves exploring how to manipulate a model into breaking its own rules, so that AI companies can patch the loopholes. Jim Nightingale, an AI security and defense researcher at Mindgard who uncovered these issues, said the images generated by the chatbot were so disturbing they made him cry.
Content origin
Researchers first notified OpenAI of the issue in May
Nightingale noted that the content generated by ChatGPT reflects the data it was trained on. He wrote in his report that it struck him that, although what he saw was a synthetic image, it was still tied to real images and the real world. The researchers first notified OpenAI of the issue in May but received only an automated response after sharing their findings.