Summary of Towards Fairer Health Recommendations: Finding Informative Unbiased Samples via Word Sense Disambiguation, by Gavin Butts et al.
Towards Fairer Health Recommendations: Finding Informative Unbiased Samples via Word Sense Disambiguation
by Gavin Butts, Pegah Emdad, Jethro Lee, Shannon Song, Chiman Salavati, Willmar Sosa Diaz, Shiri Dori-Hacohen, Fabricio Murai
First submitted to arXiv on: 11 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Computers and Society (cs.CY); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from whichever version suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The researchers tackle the problem of biased medical data in high-stakes applications by developing a framework to detect bias in medical curricula using natural language processing (NLP) models. They build on previous work that uses LLMs, focusing on debiasing the data rather than correcting model biases. The team evaluates a range of NLP models, including fine-tuned BERT variants and GPT models used with zero- and few-shot prompting. Their findings show that, although LLMs are suitable for many NLP tasks, they are not ideal for bias detection; fine-tuned BERT models instead perform well across multiple metrics. (An illustrative sketch of such a fine-tuned classifier follows the table.) |
| Low | GrooveSquid.com (original content) | To detect bias in medical curricula, the researchers use natural language processing (NLP) models like LLMs and BERT. They want to make sure these applications don’t harm people by giving them bad advice or worse treatment because of their race, gender, or other traits. The team uses a big dataset with 4,105 examples labeled as biased or not by medical experts. To make the data better, they remove sentences that are confusing because a key word could have more than one meaning (a small Word Sense Disambiguation example also follows the table). Then they test different versions of BERT and GPT models to see which ones work best at detecting bias. |
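To make the fine-tuning step described in the medium-difficulty summary more concrete, here is a minimal sketch of a BERT sequence classifier trained to label sentences as biased or not, assuming a Hugging Face `transformers`/`datasets` setup. The checkpoint name, example sentences, labels, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: fine-tuning a BERT classifier for binary bias detection.
# The checkpoint, example data, and hyperparameters are illustrative assumptions,
# not values reported in the paper.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical expert-labeled sentences: 1 = biased, 0 = not biased.
data = Dataset.from_dict({
    "text": [
        "Sentence from a medical curriculum ...",
        "Another sentence from a medical curriculum ...",
    ],
    "label": [1, 0],
}).train_test_split(test_size=0.5)

checkpoint = "bert-base-uncased"  # assumed base model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    # Convert raw sentences into BERT input IDs and attention masks.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bias-detector",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
print(trainer.evaluate())
```

In practice, the training and evaluation splits would come from the expert-labeled corpus the summaries describe, and model selection would use the metrics reported in the paper.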
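The low-difficulty summary also notes that ambiguous sentences are filtered out with Word Sense Disambiguation. The paper’s exact procedure is not reproduced here; the sketch below only illustrates the general idea using NLTK’s Lesk algorithm on a hypothetical target word ("race"), whose resolved sense could be used to flag sentences for review.

```python
# Minimal sketch: using Word Sense Disambiguation (NLTK's Lesk algorithm) to check
# which sense of a sensitive term a sentence uses, so ambiguous usages can be flagged.
# The target word and example sentences are illustrative assumptions.
import nltk
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)  # WordNet is needed for synsets and glosses

sentences = [
    "Differences across race were not fully explained by clinical factors.",
    "The patient completed the race without cardiac symptoms.",
]

for sentence in sentences:
    # lesk() picks the WordNet noun sense of "race" whose gloss best overlaps the context.
    sense = lesk(sentence.lower().split(), "race", pos="n")
    definition = sense.definition() if sense else "no sense resolved"
    print(f"{sentence}\n  -> {sense}: {definition}\n")
```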
Keywords
» Artificial intelligence » BERT » Few-shot » GPT » Natural language processing » NLP » Prompting