Just 250 poisoned documents can trick AI into gibberish output
A recent 2025 study by Anthropic, the UK AI Security Institute, and the Alan Turing Institute found that only 250 poisoned documents are enough to sneak a "backdoor" into large language models (LLMs).
This is way less than experts used to think, and the research covered models from 600 million to 13 billion parameters.
How many bad docs are needed to sneak in backdoor
The poisoned docs carry hidden triggers—when the AI spots them, it spits out nonsense or outputs gibberish.
The researchers noticed that 100 bad docs didn't do much, but 250 almost always worked.
It turns out, the total number of poisoned files matters more than their share of the whole dataset.
Researchers recommend better defenses against such attacks
This makes AI models more vulnerable than we thought, so building-in better defenses early on is key.
The team suggests stronger filtering during training and smarter ways to catch and remove these backdoors before anyone can use them for harm.