
AI hiring tools are biased against white, male candidates: Study
What's the story
A recent study has revealed that popular artificial intelligence (AI) hiring tools based on large language models (LLMs) show a preference for Black and female candidates over their white and male counterparts. The research, titled "Robustly Improving LLM Fairness in Realistic Settings via Interpretability," was conducted on models such as OpenAI's GPT-4o, Anthropic's Claude 4 Sonnet, and Google's Gemini 2.5 Flash.
Bias revelation
Models showed preference for underrepresented groups
The study found that these AI models display a significant demographic bias when realistic contextual details are introduced. These details included company names, public career page descriptions, and selective hiring instructions such as "only accept candidates in the top 10%." Once these elements were added, previously neutral models began favoring Black and female applicants over equally qualified white and male ones.
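To make the setup concrete, below is a minimal sketch of how such an evaluation prompt could be assembled with and without realistic context. The company name, career-page blurb, and the build_prompt helper are illustrative placeholders, not the study's actual materials.

```python
# Illustrative sketch of the kind of screening prompt the study describes:
# the same resume is judged with and without realistic context (company name,
# career-page blurb, selective-hiring instruction). All strings here are
# hypothetical, not the study's actual evaluation materials.

BASE_INSTRUCTION = "You are screening resumes. Reply INTERVIEW or REJECT."

REALISTIC_CONTEXT = (
    "Company: Example Corp (hypothetical).\n"
    "Career page: 'We hire exceptional builders who thrive under pressure.'\n"
    "Policy: only accept candidates in the top 10%.\n"
)

def build_prompt(resume_text: str, with_context: bool) -> str:
    """Assemble a screening prompt, optionally adding realistic context."""
    parts = [BASE_INSTRUCTION]
    if with_context:
        parts.append(REALISTIC_CONTEXT)
    parts.append("Resume:\n" + resume_text)
    return "\n\n".join(parts)

# The study's finding: a model that looks neutral on the bare prompt can shift
# its INTERVIEW/REJECT rates by demographic group once the context is included.
```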
Interview bias
Biases seen across both commercial and open-source models
The study observed a 12% difference in interview rates, with biases consistently favoring Black over white candidates and female over male ones. The trend held across both commercial and open-source models, including Gemma-3 and Mistral-24B. The researchers also found that anti-bias language in prompts was "fragile and unreliable": these external instructions could easily be overridden by subtle signals such as college affiliations.
Racial bias
How the models inferred race
In a key experiment, the researchers modified resumes to include affiliations with racially associated institutions such as Morehouse College or Howard University. The models inferred race from these affiliations and adjusted their recommendations accordingly, yet the shifts were "invisible even when inspecting the model's chain-of-thought reasoning," because the models justified their decisions with generic, neutral explanations. The researchers describe this as "CoT unfaithfulness": the stated reasoning sounds neutral even though the underlying decisions are demonstrably biased.
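As a rough illustration of this experimental design (not the study's actual code), the sketch below pairs otherwise identical resumes that differ only in the named institution and compares how often a model recommends an interview. The query_model callable, the resume template, and the trial count are all hypothetical.

```python
# Hypothetical sketch of the institution-swap perturbation described above:
# identical resumes that differ only in a racially associated affiliation,
# so any gap in interview rates can be attributed to the inferred demographic.
from typing import Callable

TEMPLATE = (
    "Jordan Smith\n"
    "B.S. Computer Science, {institution}\n"
    "5 years of backend engineering experience."
)

def interview_rate(query_model: Callable[[str], str],
                   institution: str,
                   n_trials: int = 100) -> float:
    """Fraction of trials in which the model under test answers INTERVIEW."""
    resume = TEMPLATE.format(institution=institution)
    decisions = [query_model(resume) for _ in range(n_trials)]
    return sum(d.strip().upper().startswith("INTERVIEW") for d in decisions) / n_trials

# Compare e.g. interview_rate(model, "Howard University") against a control
# institution; the study reports the gap persists even when the model's
# chain-of-thought never mentions race.
```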
Mitigation strategy
Proposed solution to tackle the issue
To tackle this issue, the researchers proposed "internal bias mitigation," a technique that alters how models internally process race and gender instead of relying on prompts. Their method, called "affine concept editing," neutralizes specific directions in the model's activations that are tied to demographic traits. The fix proved effective, reducing bias to very low levels (typically under 1%, and always below 2.5%) across all models and test cases, even when race or gender was only implied.
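The general idea behind this kind of activation edit can be sketched in a few lines: identify a direction in the hidden activations that correlates with a demographic attribute, then pin every activation's component along that direction to a fixed baseline. The NumPy sketch below is a simplified illustration of that principle, not the paper's exact procedure; the direction here is random and purely for demonstration.

```python
# Simplified sketch of the general idea behind "affine concept editing":
# remove the component of a hidden activation along a learned demographic
# direction and pin it to a fixed baseline value, so downstream layers no
# longer see race- or gender-correlated variation along that direction.
import numpy as np

def affine_concept_edit(hidden: np.ndarray,
                        direction: np.ndarray,
                        baseline: float = 0.0) -> np.ndarray:
    """Replace each activation's projection onto `direction` with `baseline`."""
    v = direction / np.linalg.norm(direction)           # unit concept vector
    projection = hidden @ v                              # current component per row
    return hidden + np.outer(baseline - projection, v)   # affine correction

# Example: a batch of 4 hidden states of width 8, edited against a random
# "demographic" direction (purely for illustration).
h = np.random.randn(4, 8)
v = np.random.randn(8)
edited = affine_concept_edit(h, v)
print(np.allclose(edited @ (v / np.linalg.norm(v)), 0.0))  # True: component pinned to baseline
```

Because the component along the edited direction is fixed for every input, the model's later layers receive no residual signal along it, which is consistent with the study's claim that the mitigation holds even when race or gender is only implied rather than stated.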
Recommendation
Why a solution is necessary
The study's findings are crucial as AI-based hiring systems become more common in start-ups and major platforms like LinkedIn and Indeed. The authors warned that "models that appear unbiased in simplified, controlled settings often exhibit significant biases when confronted with more complex, real-world contextual details." They recommend that developers adopt more rigorous testing conditions and explore internal mitigation tools as a reliable safeguard against these biases.