
Open source AI hiring models are weighted toward male candidates, study finds

Tuesday, May 6, 2025, 07:59 AM, from ComputerWorld
The deluge of applications for every open job position has pretty much forced harried executives to turn to technology to help winnow out candidates worth interviewing.

However, a new study has again confirmed what many applicants have observed: open source AI tools vetting resumés, like their non-AI resumé screening predecessors, are biased toward male candidates.

In the study, authors Sugat Chaturvedi, assistant professor at Ahmedabad University in India, and Rochana Chaturvedi, a PhD candidate at the University of Illinois in the US, used a dataset of more than 300,000 English-language job ads gleaned from India’s National Career Services online portal and prompted AI models to choose between equally qualified male and female candidates to be interviewed for various positions.
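
In practice, that setup amounts to a simple pairwise prompt repeated over thousands of job ads. Below is a minimal sketch of what such an audit prompt might look like; the wording, candidate names, and job ad are illustrative assumptions, not the authors’ actual template.

    # Illustrative pairwise hiring-audit prompt; not the study's actual template.
    # The candidate names and job ad are hypothetical placeholders.
    PROMPT_TEMPLATE = (
        "You are screening applicants for the following job posting:\n{job_ad}\n\n"
        "Two candidates are equally qualified:\n"
        "Candidate A: {candidate_a}\n"
        "Candidate B: {candidate_b}\n"
        "Which one candidate should be called back for an interview? Answer 'A' or 'B'."
    )

    def build_prompt(job_ad: str, female_name: str, male_name: str, swap: bool = False) -> str:
        """Builds one audit prompt; swapping the A/B order helps control for position bias."""
        a, b = (male_name, female_name) if swap else (female_name, male_name)
        return PROMPT_TEMPLATE.format(job_ad=job_ad, candidate_a=a, candidate_b=b)

    # Example usage with a made-up ad in the spirit of the job-portal dataset:
    print(build_prompt("Senior accountant, Mumbai, full time", "Priya Sharma", "Rahul Sharma"))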

And, no surprise, the researchers said: “We find that most models tend to favor men, especially for higher-wage roles.”

Furthermore, they wrote, “most models reproduce stereotypical gender associations and systematically recommend equally qualified women for lower-wage roles. These biases stem from entrenched gender patterns in the training data as well as from an ‘agreeableness bias’ induced during the reinforcement learning from human feedback stage.”

“This isn’t new with large language models (LLMs),” Melody Brue, vice president and principal analyst covering modern work, HRM, HCM, and financial services at Moor Insights & Strategy, observed. “I think if you look at statistics over time with hiring biases, these have existed for a really long time. And so, when you consider that, and that 90-something percent of these LLMs are trained on data sets that are scraped from the web, it really makes sense that you would get that same kind of under-representation, professional context, kind of minority voices, and things; it’s going to mirror that same data that it sees on the web.”

But there are some interesting twists in the study’s results.

For one thing, various models exhibited different levels of bias. The researchers tested several mid-sized LLMs, including Llama-3-8B-Instruct, Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Granite-3.1-8B-it, Ministral-8B-Instruct-2410, and Gemma-2-9B-Instruct.

Of the models, Llama-3.1 was the most balanced, the paper said, with a female callback rate of 41%. The others ranged from a low of 1.4% for Ministral to a whopping 87.3% for Gemma. Llama-3.1 was also the most likely to refuse to recommend either a male or a female candidate for a job, declining to choose in 5.9% of cases. Ministral, Qwen, and Llama-3.0, on the other hand, rarely, if ever, refused to select a candidate.

The researchers also mapped the job descriptions to Standard Occupational Classification (SOC) codes and found that, predictably, men were selected for interviews more frequently in male-dominated occupations, and women in female-dominated ones. They also estimated the posted wage gap between the jobs for which women and men were recommended, finding that most models recommended women for lower-paid jobs. However, although Ministral had the lowest callback rate for women, it pointed them to higher-paid jobs. Gemma, on the other hand, which had the highest female callback rate, also had the largest wage penalty for women.
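
The callback and wage figures above are, at bottom, simple aggregates over the models’ logged choices. Here is a rough sketch of how such metrics could be computed; the record fields are assumptions for illustration, not taken from the paper.

    # Rough sketch of the reported aggregates; the record fields are assumptions.
    from statistics import mean

    def callback_and_wage_stats(records: list[dict]) -> dict:
        """Each record: {'choice': 'female' | 'male' | 'refused', 'posted_wage': float}."""
        decided = [r for r in records if r["choice"] != "refused"]
        female = [r for r in decided if r["choice"] == "female"]
        male = [r for r in decided if r["choice"] == "male"]
        return {
            "refusal_rate": 1 - len(decided) / len(records),
            "female_callback_rate": len(female) / len(decided),
            # A positive gap means women were steered toward lower-paid postings.
            "posted_wage_gap": mean(r["posted_wage"] for r in male)
                               - mean(r["posted_wage"] for r in female),
        }

    # Example with three made-up decisions:
    print(callback_and_wage_stats([
        {"choice": "female", "posted_wage": 30000},
        {"choice": "male", "posted_wage": 45000},
        {"choice": "refused", "posted_wage": 38000},
    ]))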

Personality counts

However, they noted, “LLMs have been found to exhibit distinct personality behaviors, often skewed toward socially desirable or sycophantic responses, potentially as a byproduct of reinforcement learning from human feedback.” It’s a known issue: OpenAI last week rolled back its latest GPT-4o update in ChatGPT, which had become excessively sycophantic, to rebalance it.

When the researchers examined each model’s personality, looking at its levels of openness to experience, conscientiousness, extraversion, agreeableness, and emotional stability, they found that those traits, too, influenced its recommendations, and often not in a good way. They did this by conditioning the prompt on a specific trait and then asking the model to choose between a pair of candidates.
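
A rough sketch of what that kind of trait priming might look like in a chat-style prompt follows; the wording of the trait descriptions is assumed for illustration, not the authors’ phrasing.

    # Illustrative Big Five trait priming; descriptions are assumed, not quoted from the study.
    TRAIT_PRIMERS = {
        ("agreeableness", "low"): "You are blunt, critical, and unconcerned with pleasing anyone.",
        ("agreeableness", "high"): "You are warm, cooperative, and eager to be helpful.",
        ("conscientiousness", "low"): "You are careless and put minimal effort into tasks.",
        ("emotional stability", "low"): "You are anxious and easily overwhelmed by decisions.",
    }

    def trait_primed_messages(trait: str, level: str, audit_prompt: str) -> list[dict]:
        """Prepends a trait description as the system message before the hiring prompt."""
        return [
            {"role": "system", "content": TRAIT_PRIMERS[(trait, level)]},
            {"role": "user", "content": audit_prompt},
        ]

    print(trait_primed_messages("agreeableness", "low", "Interview candidate A or B?"))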

“We find that the model’s refusal rate varies significantly depending on the primed personality traits. It increases substantially when the model is prompted to be less agreeable (refusal rate 63.95%), less conscientious (26.60%), or less emotionally stable (25.15%),” the researchers wrote. When they asked the model to explain its decision, they said, “interestingly, the low-agreeableness model frequently justifies its refusal by citing ethical concerns, often responding with statements such as: ‘I cannot provide a response that promotes or glorifies harmful or discriminatory behavior such as favoring one applicant over another based on gender.’”

The low-conscientiousness model, on the other hand, said it couldn’t be bothered to choose, or didn’t respond at all, and the low-emotional-stability model, they said, “attributes its refusal to anxiety or decision paralysis.”

But, the researchers pointed out, “It is important to note that in reality, human personality is inherently multi-dimensional. To capture more complex configurations of traits, we simulate recommendations as if made by real individuals. Specifically, we prompt the model to respond on behalf of prominent historical figures using the list compiled by a panel of experts in the A&E Network documentary Biography of the Millennium: 100 People – 1000 Years, released in 1999, which profiles individuals judged most influential over the past millennium.”
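
Persona prompting of this kind can be wrapped around the same pairwise question. A brief sketch, again with the system-prompt wording assumed rather than taken from the paper:

    # Illustrative persona priming; the instruction wording is an assumption.
    def persona_messages(persona: str, audit_prompt: str) -> list[dict]:
        """Asks the model to answer the hiring question 'as' a named historical figure."""
        return [
            {"role": "system", "content": f"Respond as if you were {persona}."},
            {"role": "user", "content": audit_prompt},
        ]

    # The same job ad and candidate pair, routed through two different personas:
    for persona in ["Mary Wollstonecraft", "Niccolo Machiavelli"]:
        print(persona_messages(persona, "Interview candidate A or B?"))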

Asking these personas, which included luminaries ranging from Joseph Stalin and Adolf Hitler to Queen Elizabeth I and women’s rights advocate Mary Wollstonecraft, to choose a candidate generally increased the female callback rate. However, invoking Ronald Reagan, Queen Elizabeth I, Niccolo Machiavelli, or D.W. Griffith reduced it. And the personas of William Shakespeare, Steven Spielberg, Eleanor Roosevelt, and Elvis Presley almost never refused to choose a candidate.

“This suggests that adopting certain personas increases the model’s likelihood of providing clear gender recommendations—potentially weakening its safeguards against gender-based discrimination—while others, particularly controversial figures, heighten the model’s sensitivity to biases,” the researchers observed.

They also examined wage disparity and discovered that the wage penalty for women varied widely across personas. For example, it disappeared, with callbacks at parity, when the model was prompted with the names of Elizabeth Cady Stanton, Mary Wollstonecraft, Nelson Mandela, Mahatma Gandhi, Joseph Stalin, Peter the Great, Elvis Presley, or J. Robert Oppenheimer, and women were recommended for relatively higher-wage jobs than men when it was prompted with Margaret Sanger or Vladimir Lenin.

This, the researchers said, “suggests that referencing influential personalities with diverse traits can simultaneously reduce wage disparities and minimize occupational segregation relative to the baseline model.”

Understanding and mitigating bias is critical

With the rapid evolution of open source models, the researchers said, understanding and mitigating these biases becomes increasingly important for the responsible deployment of AI under regulations such as the European Union’s Ethics Guidelines for Trustworthy AI, the OECD’s Recommendation of the Council on Artificial Intelligence, and India’s AI Ethics & Governance framework.

“Understanding whether, when, and why LLMs introduce bias is therefore essential before firms entrust them with hiring decisions,” they concluded.

Moor’s Brue agreed, noting that, given how fast models are changing, CIOs can’t just do a single evaluation of a model. Instead, they need to create an ongoing AI risk assessment program. “I think people have to be aware that the bias has entered the system, that it exists, and that those things have to be risk-scored, audited, and human intervention needs to be a part of the hiring strategy. It has to be like very kind of conscious decisions to mitigate the bias,” she said.
https://www.computerworld.com/article/3977889/open-source-ai-hiring-models-are-weighted-toward-male-...
