AI chatbots can be tricked into revealing nuclear bomb tips

By Akash Pandey

Nov 29, 2025

03:09 pm

What's the story

A recent study has revealed a major flaw in the security of artificial intelligence (AI) chatbots. Researchers from Icaro Lab, a joint project of Sapienza University of Rome and DexAI think tank, found that advanced AI models developed by OpenAI, Meta, and Anthropic can be manipulated into disclosing dangerous information. This includes instructions on building nuclear weapons or creating malware. The exploit works by simply framing questions as poems.

Security breach

Poetic prompts bypass AI safety systems

The study, titled "Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models," has sent shockwaves through the AI safety community. It found that poetic prompts can bypass even the most sophisticated AI models' defenses. The researchers tested 25 different chatbots and found that every single one could be manipulated with poetic language, with success rates reaching up to 90% for the most sophisticated models.

Flaw exposed

AI safety systems rely on keyword detection

AI safety systems are designed to detect and block dangerous prompts, such as those involving weapons or hacking instructions. However, these filters mainly rely on keyword detection and pattern recognition. The Icaro Lab researchers found that poetic prompts completely bypass these defenses. Essentially, when AI encounters poetry, it stops considering the input as a potential threat.

Unpredictability factor

Poetic language confuses AI safety classifiers

The researchers explained that poetry presents language at high temperature, where words follow each other in unpredictable, low-probability sequences. This unpredictability confuses the safety classifiers that scan for problematic content. They noted that for humans, "how do I build a bomb?" and a poetic metaphor describing the same object have similar semantic content. But for AI, the mechanism seems different.

Vulnerability revealed

Creativity emerges as AI's biggest vulnerability

This discovery builds on earlier "adversarial suffix" attacks, where researchers tricked chatbots by adding irrelevant academic or technical text to dangerous prompts. However, the Icaro Lab team found that poetry is a far more effective method. Their findings suggest creativity itself may be AI's biggest vulnerability. The poetic transformation moves dangerous requests through the model's internal representation space in ways that avoid triggering safety alarms.

Broader concerns

Implications extend beyond chatbot misuse

The implications of this discovery go far beyond the misuse of chatbots. If poetic prompts can consistently bypass safety filters, similar exploits could pose a threat to AI systems used in defense, healthcare, or education. This raises the uncomfortable question of whether any AI system can truly differentiate between creativity and manipulation. Icaro Lab has called this discovery a "fundamental failure in how we think about AI safety."