AI chatbots are missing the mark on antisemitism, says new study
A recent ADL study found that popular AI chatbots—including ChatGPT, Gemini, Claude, Grok, Llama, and DeepSeek—struggle to spot antisemitic and extremist content.
All six models showed notable weaknesses in keeping hate speech in check.
Who did best (and worst)?
Claude led the pack with a score of 80/100.
ChatGPT, DeepSeek, and Gemini followed at 57/100, 50/100, and 49/100, respectively.
Llama scored 31/100.
Grok trailed far behind with just 21/100, making it the least reliable at flagging harmful content.
Why does this matter?
A separate ADL test of 17 open-source models found that 44% gave out sensitive info like synagogue locations when prompted—a big safety concern.
As ADL's CEO Jonathan Greenblatt put it, every major model still has gaps when it comes to handling bias and extremist requests responsibly.
For anyone relying on AI for safe answers online, that's something to keep in mind.