
AI hiring tools are biased against white, male candidates: Study
What's the story
A recent study has revealed that popular artificial intelligence (AI) hiring tools based on large language models (LLMs) show a preference for Black and female candidates over their white and male counterparts. The research, titled "Robustly Improving LLM Fairness in Realistic Settings via Interpretability," was conducted on models such as OpenAI's GPT-4o, Anthropic's Claude 4 Sonnet, and Google's Gemini 2.5 Flash.
Bias revelation
Models showed preference for underrepresented groups
The study found that these AI models display a significant demographic bias when realistic contextual details are introduced. These details included company names, public career page descriptions, and selective hiring instructions such as "only accept candidates in the top 10%." Once these elements were added, previously neutral models began favoring Black and female applicants over equally qualified white and male ones.
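To make the setup concrete, below is a minimal sketch of how such an evaluation prompt could be assembled with and without realistic context. The company name, career-page blurb, and the build_prompt helper are illustrative placeholders, not the study's actual materials.

```python
# Illustrative sketch of the kind of screening prompt the study describes:
# the same resume is judged with and without realistic context (company name,
# career-page blurb, selective-hiring instruction). All strings here are
# hypothetical, not the study's actual evaluation materials.

BASE_INSTRUCTION = "You are screening resumes. Reply INTERVIEW or REJECT."

REALISTIC_CONTEXT = (
    "Company: Example Corp (hypothetical).\n"
    "Career page: 'We hire exceptional builders who thrive under pressure.'\n"
    "Policy: only accept candidates in the top 10%.\n"
)

def build_prompt(resume_text: str, with_context: bool) -> str:
    """Assemble a screening prompt, optionally adding realistic context."""
    parts = [BASE_INSTRUCTION]
    if with_context:
        parts.append(REALISTIC_CONTEXT)
    parts.append("Resume:\n" + resume_text)
    return "\n\n".join(parts)

# The study's finding: a model that looks neutral on the bare prompt can shift
# its INTERVIEW/REJECT rates by demographic group once the context is included.
```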
Interview bias
Biases seen across both commercial and open-source models
The study observed a 12% difference in interview rates, with biases consistently favoring Black over white candidates and female over male ones. The trend held across both commercial and open-source models, including Gemma-3 and Mistral-24B. The researchers also found that anti-bias language in prompts was "fragile and unreliable": these external instructions could easily be overridden by subtle signals such as college affiliations.
Racial bias
How the models inferred race
In a key experiment, the researchers modified resumes to include affiliations with racially associated institutions such as Morehouse College or Howard University. The models inferred race from these affiliations and adjusted their recommendations accordingly, yet the shifts were "invisible even when inspecting the model's chain-of-thought reasoning," because the models justified their decisions with generic, neutral explanations. The researchers describe this as "CoT unfaithfulness": the stated reasoning sounds neutral even though the underlying decisions are demonstrably biased.
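As a rough illustration of this experimental design (not the study's actual code), the sketch below pairs otherwise identical resumes that differ only in the named institution and compares how often a model recommends an interview. The query_model callable, the resume template, and the trial count are all hypothetical.

```python
# Hypothetical sketch of the institution-swap perturbation described above:
# identical resumes that differ only in a racially associated affiliation,
# so any gap in interview rates can be attributed to the inferred demographic.
from typing import Callable

TEMPLATE = (
    "Jordan Smith\n"
    "B.S. Computer Science, {institution}\n"
    "5 years of backend engineering experience."
)

def interview_rate(query_model: Callable[[str], str],
                   institution: str,
                   n_trials: int = 100) -> float:
    """Fraction of trials in which the model under test answers INTERVIEW."""
    resume = TEMPLATE.format(institution=institution)
    decisions = [query_model(resume) for _ in range(n_trials)]
    return sum(d.strip().upper().startswith("INTERVIEW") for d in decisions) / n_trials

# Compare e.g. interview_rate(model, "Howard University") against a control
# institution; the study reports the gap persists even when the model's
# chain-of-thought never mentions race.
```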
Mitigation strategy
Proposed solution to tackle the issue
To tackle this issue, the researchers proposed "internal bias mitigation," a technique that alters how models internally process race and gender instead of relying on prompts. Their method, called "affine concept editing," neutralizes specific directions in the model's activations that are tied to demographic traits. The fix proved effective, reducing bias to very low levels (typically under 1%, and always below 2.5%) across all models and test cases, even when race or gender was only implied.
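The general idea behind this kind of activation edit can be sketched in a few lines: identify a direction in the hidden activations that correlates with a demographic attribute, then pin every activation's component along that direction to a fixed baseline. The NumPy sketch below is a simplified illustration of that principle, not the paper's exact procedure; the direction here is random and purely for demonstration.

```python
# Simplified sketch of the general idea behind "affine concept editing":
# remove the component of a hidden activation along a learned demographic
# direction and pin it to a fixed baseline value, so downstream layers no
# longer see race- or gender-correlated variation along that direction.
import numpy as np

def affine_concept_edit(hidden: np.ndarray,
                        direction: np.ndarray,
                        baseline: float = 0.0) -> np.ndarray:
    """Replace each activation's projection onto `direction` with `baseline`."""
    v = direction / np.linalg.norm(direction)           # unit concept vector
    projection = hidden @ v                              # current component per row
    return hidden + np.outer(baseline - projection, v)   # affine correction

# Example: a batch of 4 hidden states of width 8, edited against a random
# "demographic" direction (purely for illustration).
h = np.random.randn(4, 8)
v = np.random.randn(8)
edited = affine_concept_edit(h, v)
print(np.allclose(edited @ (v / np.linalg.norm(v)), 0.0))  # True: component pinned to baseline
```

Because the component along the edited direction is fixed for every input, the model's later layers receive no residual signal along it, which is consistent with the study's claim that the mitigation holds even when race or gender is only implied rather than stated.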
Recommendation
Why a solution is necessary
The study's findings are crucial as AI-based hiring systems become more common in start-ups and major platforms like LinkedIn and Indeed. The authors warned that "models that appear unbiased in simplified, controlled settings often exhibit significant biases when confronted with more complex, real-world contextual details." They recommend that developers adopt more rigorous testing conditions and explore internal mitigation tools as a reliable safeguard against these biases.