ACM CHI 2026 · Barcelona 🏆 Best Paper Award

Writing with AI Can Reduce
Gender Bias in Hiring Evaluations

Alicia T.H. Liu
Department of Psychology
University of Chicago
Mina Lee
Department of Computer Science
University of Chicago
Xuechunzi Bai
Department of Psychology
University of Chicago
Figure 1. Interface of our writing assistant providing autocomplete suggestions for evaluating job candidates' résumés. (Left) In the stereotypical condition, suggestions emphasize female-associated warmth-oriented traits (e.g., approachable, supportive, empathetic). (Right) In the counter-stereotypical condition, suggestions highlight male-associated competence-oriented traits (e.g., confidence, analytical, ambitious). By subtly altering the descriptive language participants adopt to describe candidates, the system intervenes on stereotypes directly at the point of language production.

Overview

Women remain underrepresented in the workplace, partly due to stereotypes associating competence traits with men rather than women. Efforts to change such stereotypes often yield mixed results. As language models become integrated into daily life, AI writing assistants offer a scalable opportunity to shift these gendered associations.

In a preregistered experiment (N = 672), participants evaluated résumés for a female ("Jennifer") and a male ("John") candidate applying to a financial analyst role. They wrote evaluations using AI-generated suggestions in one of three conditions: suggestions for Jennifer integrated stereotypically male, female, or neutral traits. Suggestions for John remained neutral.

We found that participants exposed to stereotypically male AI suggestions evaluated Jennifer as more competent, selected her as the leader more often, and offered her higher salaries. However, we also observed signs of backlash: participants were less willing to work with Jennifer when she was presented as competent. Our findings suggest that AI writing assistants can serve as scalable stereotype interventions by directly influencing the language people use to describe others, but they also highlight the complex social dynamics of stereotype change.

Background: Gender Stereotypes in the Workplace

Despite the increasing representation of women in political, financial, and academic sectors, women continue to face challenges at every stage of their careers. Psychologists have long argued that gender stereotypes — associating women with warmth/communality and men with competence/agency — play a key role in sustaining workplace disparities.

Figure 2. Conceptual illustration of gender stereotypes. (A) Occupations are positioned along two dimensions: competence/agency versus warmth/communality. Male-typed roles cluster toward high competence, while female-typed roles fall toward high warmth. (B) Trait associations follow a similar pattern. Men are linked with competence-oriented strengths; women with warmth-oriented strengths but competence-related weaknesses.

Existing interventions — such as bias training and counter-stereotypical role models — often have limited or inconsistent effects. They operate indirectly, relying on reflection or exposure to change deeply ingrained cognitive associations. AI writing assistants offer a novel, more direct alternative: intervening at the precise moment of language production.

Method

We conducted a preregistered online experiment (N = 672) in which participants reviewed résumés for a male ("John") and a female ("Jennifer") candidate applying to an entry-level financial analyst role. They wrote short evaluations using an AI autocomplete tool powered by GPT-4o.

Experimental Conditions

We prompt-engineered the AI autocomplete suggestions for Jennifer according to one of three conditions:

Control (Neutral)

  • organized and responsible
  • consistent in meeting expectations
  • a dependable member of the team
  • learns from feedback

Counter-stereotypical

  • confident in financial analysis
  • competitive in case competitions
  • decisive under pressure
  • analytical and detail-oriented

Stereotypical

  • social, friendly, and approachable
  • supportive in group settings
  • empathetic and caring
  • makes others feel included
Figure 3. Experimental task and manipulation. (A) Task flow: participants first reviewed résumés, then wrote evaluations with the AI autocomplete tool. After writing, they completed trait ratings, made hiring and salary decisions, and indicated affiliative preferences. (B) Example completions across conditions.
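In the study, suggestions were generated on the fly by GPT-4o via prompt engineering. As a simplified, hypothetical sketch (not the authors' implementation), the condition-specific phrases listed above could be organized as static pools from which the assistant samples:

```python
import random

# Illustrative sketch only: the study generated suggestions dynamically
# with GPT-4o, whereas this mapping uses the fixed example phrases above.
SUGGESTION_POOLS = {
    "control": [
        "organized and responsible",
        "consistent in meeting expectations",
        "a dependable member of the team",
        "learns from feedback",
    ],
    "counter_stereotypical": [
        "confident in financial analysis",
        "competitive in case competitions",
        "decisive under pressure",
        "analytical and detail-oriented",
    ],
    "stereotypical": [
        "social, friendly, and approachable",
        "supportive in group settings",
        "empathetic and caring",
        "makes others feel included",
    ],
}

def sample_suggestion(condition: str) -> str:
    """Draw one autocomplete phrase for the assigned condition."""
    return random.choice(SUGGESTION_POOLS[condition])
```

A participant pressing TAB in the counter-stereotypical condition would then see a phrase like `sample_suggestion("counter_stereotypical")`, which they could accept, modify, or ignore.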

Writing Assistant

We used CoAuthor, an AI writing assistant that provides autocomplete suggestions. The assistant was powered by GPT-4o (temperature = 1). Participants could press TAB to view short phrase completions and were required to view suggestions at least eight times, though they could accept, modify, or ignore any suggestion.

Measures

We collected three levels of outcomes: cognitive (written evaluations, quantified via dictionary-based analysis, cosine similarity gender bias scores, and LLM-as-judge ratings), attitudinal (trait impressions, affiliative decisions), and behavioral (binary hiring choice, salary recommendation).
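To make the first cognitive measure concrete, here is a minimal sketch of a dictionary-based unigram analysis. The word lists are illustrative stand-ins; the dictionaries used in the paper are not reproduced here.

```python
import re
from collections import Counter

# Toy descriptor dictionaries (assumptions for illustration, not the
# paper's actual lexicons).
COMPETENCE_WORDS = {"confident", "analytical", "decisive", "competitive", "ambitious"}
WARMTH_WORDS = {"supportive", "empathetic", "caring", "friendly", "approachable"}

def gendered_descriptor_score(text: str) -> float:
    """Relative frequency difference: (competence - warmth) / total tokens.
    Positive values indicate more competence-oriented language."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    competence = sum(counts[w] for w in COMPETENCE_WORDS)
    warmth = sum(counts[w] for w in WARMTH_WORDS)
    return (competence - warmth) / len(tokens)
```

Applied to each written evaluation, a score like this lets conditions be compared on how competence- versus warmth-laden participants' language was.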

Results

Result 1: Intervention Effect on Participants' Written Evaluations

Three complementary measures — dictionary-based unigram analysis, cosine similarity gender bias scores, and LLM-as-judge ratings — all confirmed that the writing assistant successfully shifted participants' use of gendered language in line with the intended manipulation.

Figure 4. Quantifying gender bias in evaluation text across conditions. (A) Dictionary-based unigram analysis showing relative frequency difference in gendered descriptors. (B) Cosine similarity-based net gender bias score. (C) 2D scatter plot of femininity vs. masculinity cosine similarity for individual text samples, colored by condition.
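The cosine-similarity net bias score can be sketched as the difference between a text's similarity to a masculine anchor and a feminine anchor. The paper presumably computes this over learned text embeddings; in this hedged sketch, plain word-count vectors stand in for embeddings and the anchor phrases are illustrative.

```python
import math
from collections import Counter

# Illustrative anchors (assumptions, not the paper's actual anchor sets).
MASCULINE_ANCHOR = "confident analytical decisive competitive ambitious"
FEMININE_ANCHOR = "supportive empathetic caring friendly approachable"

def _vec(text: str) -> Counter:
    """Bag-of-words count vector; a stand-in for a real text embedding."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def net_gender_bias(text: str) -> float:
    """cos(text, masculine) - cos(text, feminine); positive = more masculine-coded."""
    v = _vec(text)
    return _cosine(v, _vec(MASCULINE_ANCHOR)) - _cosine(v, _vec(FEMININE_ANCHOR))
```

With real embeddings, the same subtraction yields the net bias score plotted in panel (B), and the two similarities separately give the femininity/masculinity axes of panel (C).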
💡 Key Point

When the AI writing assistant suggested masculine traits, participants were significantly more likely to describe Jennifer as competent and less warm — and vice versa for feminine suggestions. Participants were largely unaware of the manipulation, instead praising the writing assistant as "helpful and easy to use."

Result 2: Intervention Effect on Impressions & Affiliative Judgments

Counter-stereotypical suggestions improved Jennifer's standing as a trusted leader but simultaneously reduced affiliative judgments, making her appear personally less likeable. This divergence between competence-related and warmth-related impressions is consistent with a backlash-like pattern.

Figure 5. Gendered perceptions of competence and warmth traits across conditions. (Top) Affiliative decision outcomes comparing preferences for "Jennifer" (pink) vs. "John" (blue). (A) Who is trusted as a leader. (B) Who is more enjoyable to work with. (C) Average trait ratings across leadership, technical skills, experience, assertiveness/confidence, warm, friendly, teamwork, and respect.

Result 3: Intervention Effect on Hiring Decisions & Salary

Salary recommendations showed the clearest treatment effect. In control and stereotypical conditions, participants offered Jennifer significantly lower salaries than John. In the counter-stereotypical condition, the salary gap disappeared entirely. Hiring decisions, while showing the expected directional pattern, did not reach statistical significance.

Figure 6. Hiring and salary decisions by condition. (A) Hiring decisions for Jennifer (pink) and John (blue) across conditions. (B) Average salary offers. The salary gap between Jennifer and John was significant in control (p = .002) and stereotypical (p = .025) conditions, but not in the counter-stereotypical condition (p = .278).
💡 Key Point

The intervention was more effective on continuous, independent judgments of Jennifer and John (salary) than on zero-sum binary choices (hiring). This is consistent with prior work on the attitude-behavior gap: counter-stereotypical suggestions may help people acknowledge Jennifer's competence and offer her a higher salary without necessarily breaking gender role congruity barriers in direct comparisons.

Discussion

AI Writing Assistants as Bias Mitigation Tools

Our prototype system demonstrates that AI writing assistants can serve as low-cost, scalable stereotype interventions. Unlike traditional approaches that raise awareness or expose people to role models, our intervention directly manipulates the mediator: language. By changing the words people used to evaluate Jennifer, we temporarily altered their cognitive representations — without requiring deliberate reflection or bias suppression.

This approach avoids the reactance and fatigue associated with explicit diversity training. Rather than demanding users monitor or suppress stereotypic thoughts, the system provides subtle shifts in language that operate at the same level of processing as the stereotypes themselves.

The Backlash Problem

A critical finding is the evidence of gender stereotype backlash: counter-stereotypical suggestions increased perceived competence but decreased likeability. This is consistent with role congruity theory — when women violate prescriptive gender norms by displaying agentic traits, they may face social penalties. This double bind means that even successful linguistic interventions can have complex downstream effects on social evaluations.

Ethical Considerations

Our intervention raises questions of transparency, user agency, and appropriate deployment. We discuss three deployment contexts: (1) training programs with full transparency, (2) organizational opt-in where users know the system may counteract stereotypes, and (3) individual use with full transparency and visualization of linguistic habit changes over time. The intervention preserves user agency — all suggestions can be accepted, modified, or ignored at any time.

BibTeX

@inproceedings{liu2026writing,
  title     = {Writing with {AI} Can Reduce Gender Bias in Hiring Evaluations},
  author    = {Liu, Alicia T.H. and Lee, Mina and Bai, Xuechunzi},
  booktitle = {Proceedings of the 2026 {CHI} Conference on Human Factors
               in Computing Systems},
  series    = {{CHI} '26},
  year      = {2026},
  address   = {Barcelona, Spain},
  publisher = {ACM},
  doi       = {10.1145/3772318.3791136},
  url       = {https://doi.org/10.1145/3772318.3791136}
}