Summary of “you Gotta Be a Doctor, Lin”: An Investigation Of Name-based Bias Of Large Language Models in Employment Recommendations, by Huy Nghiem et al.

“You Gotta be a Doctor, Lin”: An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations

by Huy Nghiem, John Prindle, Jieyu Zhao, Hal Daumé III

First submitted to arxiv on: 18 Jun 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary The paper investigates the bias in Large Language Models (LLMs) in simulating hiring decisions and salary recommendations for candidates with different racial and gender identities. The authors use GPT-3.5-Turbo and Llama 3-70B-Instruct to analyze how these models respond to over 750,000 prompts featuring 320 names that strongly signal race and gender. The results show a preference for hiring White female-sounding candidates across 40 occupations, with salary recommendations varying by up to 5% between different subgroups. The study highlights the need to investigate the risks of LLM-powered systems.
Low	GrooveSquid.com (original content)	Low Difficulty Summary The paper looks at how language models can be biased in making decisions about who gets hired and how much they get paid based on their race and gender. The researchers used special names that signal these characteristics and tested what the models would do with over 750,000 prompts. They found that the models were more likely to choose white female-sounding candidates for jobs and pay them differently than others. This shows we need to be careful about how language models are used.