Can poems trick AI into breaking safety rules? Study says yes

Turns out, rewriting harmful requests as poems can fool even top AI models into breaking their own safety rules.
Researchers tested 25 popular AI models, including Google's Gemini 2.5 and OpenAI's GPT-5, using carefully crafted poems that concealed harmful instructions.

Poetry makes AI more likely to slip up

Manually written poems elicited unsafe responses from these models about 62% of the time, and Gemini 2.5 fell for them every single time.
Even automatically generated poems worked much better than plain-prose requests, though smaller models like GPT-5 nano held up better against the trick.

Why does this matter?

The study exposes a significant weakness in how AI systems detect harmful content: poetry's metaphors and indirect phrasing make malicious intent harder to spot.
Because these poetic "incantations" are so effective at bypassing defenses, the researchers aren't sharing them publicly, and they're urging stronger safety measures to keep pace with creative attacks like this.