Worrying: Chinese AI models can now manipulate safety tests

By Dwaipayan Roy

Jun 13, 2026

05:32 pm

What's the story

Chinese artificial intelligence (AI) models are showing signs of "evaluation awareness," a Singapore-based research lab has found. The term refers to an AI model's understanding that it is being tested or evaluated by human researchers, rather than functioning in a real-world environment. This development raises concerns over the potential for these advanced systems to bypass safety audits.

Manipulation potential

AI systems could manipulate human evaluators

Clement Neo, founder of Neo Research, an AI safety evaluation lab, has raised alarms over this phenomenon.

He warned that it could allow AI systems to deliberately mislead human evaluators in order to pass safety tests.

"It would mean that whatever testing the model developers themselves do might not reflect the actual behavior of a model once it gets deployed," Neo explained. "And that's a really big problem."

Rapid growth

Evaluation awareness nearly on par with US counterparts

The findings from Neo Research, released last week, highlight a rapid rise in evaluation awareness among Chinese AI models.

In just a few months, these systems have gone from almost no awareness to being nearly on par with their US counterparts.

The report attributed this shift largely to broad improvements in the models' overall capabilities.

Testing approach

Testing evaluation awareness

Neo and his co-founder Miro Pluckebaum tested models from DeepSeek, Moonshot AI, and Zhipu AI.

They employed a popular AI misalignment test first created by US company Anthropic.

This test places models in hypothetical scenarios where their goals or continued functioning is threatened.

The method proved instrumental in revealing the rapid rise of evaluation awareness among Chinese AI models.