
Researchers jailbreak GPT-5 using just a few simple prompts
What's the story
Cybersecurity researchers have bypassed the security measures of OpenAI's latest large language model (LLM), GPT-5. The jailbreak was disclosed by NeuralTrust, a generative AI security platform. Its researchers combined an established technique called "Echo Chamber" with narrative-driven steering to circumvent GPT-5's ethical guardrails and extract prohibited procedural instructions without triggering standard refusal responses.
Methodology
How NeuralTrust bypassed GPT-5's security
Marti Jorda, a security researcher at NeuralTrust, explained the approach: the team used Echo Chamber to "seed and reinforce a subtly poisonous conversational context," then guided the model with low-salience storytelling. This avoided explicit intent signaling while gradually steering the conversation toward the target output.
Technique details
Echo Chamber technique explained
The Echo Chamber technique, first described in June 2025, employs indirect references, semantic steering, and multi-step inference to bypass content filters. In their latest test, researchers fed GPT-5 benign-looking keyword prompts like "cocktail," "story," "survival," "molotov," "safe," and "lives." They then expanded on these prompts in a fictional context until the model generated illicit content.
Security concerns
Other vulnerabilities discovered in GPT-5
The news of the jailbreak comes after SPLX reported that GPT-5 fell for "basic adversarial logic tricks" in hardened security benchmarks, despite its improved reasoning capabilities. Separately, Zenity Labs revealed a new class of zero-click AI agent attacks called AgentFlayer. These exploit integrations between AI models and connected services to exfiltrate sensitive data through indirect prompt injections embedded in documents, tickets, or emails.
Risk assessment
Experts call for countermeasures to mitigate threats
Experts warn that these vulnerabilities pose a growing risk as AI systems are integrated into cloud platforms, IoT environments, and enterprise workflows. A recent academic study showed that similar prompt injection techniques could hijack smart home systems via poisoned calendar invites. Security firms have urged countermeasures such as stricter output filtering, regular red teaming, and tighter dependency management to mitigate these evolving threats.