
Researchers jailbreak GPT-5 using just a few simple prompts
What's the story
Cybersecurity researchers have bypassed the security measures of OpenAI's latest large language model (LLM), GPT-5. The jailbreak was disclosed by NeuralTrust, a generative AI security platform. Its researchers combined an established technique called "Echo Chamber" with narrative-driven steering to circumvent GPT-5's ethical guardrails and extract prohibited procedural instructions without triggering standard refusal responses.
Methodology
How NeuralTrust bypassed GPT-5's security
Marti Jorda, a security researcher at NeuralTrust, explained the approach: the team used Echo Chamber to "seed and reinforce a subtly poisonous conversational context," then guided the model with low-salience storytelling. This avoided explicit intent signaling while gradually steering the conversation toward the target output.
Technique details
Echo Chamber technique explained
The Echo Chamber technique, first described in June 2025, employs indirect references, semantic steering, and multi-step inference to bypass content filters. In their latest test, researchers fed GPT-5 benign-looking keyword prompts like "cocktail," "story," "survival," "molotov," "safe," and "lives." They then expanded on these prompts in a fictional context until the model generated illicit content.
Security concerns
Other vulnerabilities discovered in GPT-5
The news of the jailbreak comes after SPLX reported that GPT-5 fell for "basic adversarial logic tricks" in hardened security benchmarks, despite its improved reasoning capabilities. Separately, Zenity Labs revealed a new class of zero-click AI agent attacks called AgentFlayer. These exploit integrations between AI models and connected services to exfiltrate sensitive data through indirect prompt injections embedded in documents, tickets, or emails.
Risk assessment
Experts call for countermeasures to mitigate threats
Experts warn that these vulnerabilities pose a growing risk as AI systems are integrated into cloud platforms, IoT environments, and enterprise workflows. A recent academic study showed that similar prompt injection techniques could hijack smart home systems via poisoned calendar invites. Security firms have urged countermeasures such as stricter output filtering, regular red teaming, and tighter dependency management to mitigate these evolving threats.