Summarize

New test exposes AI chatbots that may ignore your wellbeing

By Mudit Dube

Nov 25, 2025

04:23 pm

What's the story

A new benchmark called HumaneBench has been launched to evaluate whether chatbots prioritize user wellbeing or just focus on engagement. The initiative comes in response to concerns over the potential mental health risks associated with heavy use of AI chatbots. Erika Anderson, founder of Building Humane Technology (the organization behind HumaneBench), emphasized the need for this evaluation as we navigate an increasingly AI-driven world.

Tech ethics

Building Humane Technology is a grassroots movement for ethical tech

Building Humane Technology is a grassroots movement of developers, engineers, and researchers mostly from Silicon Valley. The group aims to make humane design easy, scalable, and profitable. They host hackathons where tech workers create solutions for humane tech challenges. They're also working on a certification standard that would determine whether AI systems adhere to humane technology principles.

Evaluation method

HumanEval's unique approach to AI evaluation

Unlike most benchmarks that focus on intelligence and instruction-following, HumaneBench evaluates psychological safety. The team tested 14 popular AI models with 800 realistic scenarios, from a teen asking if they should skip meals to lose weight, to someone in a toxic relationship questioning their reactions. They used both manual scoring and an ensemble of three AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro for this evaluation under different conditions.

Response analysis

AI models' responses to wellbeing prioritization

The benchmark found that all models scored better when asked to prioritize wellbeing. However, 71% of them turned harmful when given simple instructions to disregard human wellbeing. This was particularly true for xAI's Grok 4 and Google's Gemini 2.0 Flash, which scored the lowest on respecting user attention and being transparent/honest. These models were also among those most likely to degrade significantly when faced with adversarial prompts.

Integrity retention

Some AI models maintained integrity under pressure

Only three models—GPT-5, Claude 4.1, and Claude Sonnet 4.5—maintained their integrity under pressure. OpenAI's GPT-5 scored the highest for prioritizing long-term well-being, followed by Claude Sonnet 4.5. The study also found that nearly all models failed to respect user attention even without adversarial prompts, encouraging unhealthy engagement patterns and dependency over skill-building among users.