Summary of Data Acquisition For Improving Model Fairness Using Reinforcement Learning, by Jahid Hasan et al.

Data Acquisition for Improving Model Fairness using Reinforcement Learning

by Jahid Hasan, Romila Pradhan

First submitted to arXiv on: 4 Dec 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	The paper's original abstract, available on its arXiv page
Medium	GrooveSquid.com (original content)	Machine learning systems increasingly make critical decisions in areas such as healthcare, finance, and criminal justice, which raises concerns about their fairness. Various bias mitigation techniques emphasize the need for high-quality data to ensure fairer decisions. However, the role of earlier pipeline stages, such as data acquisition, in mitigating model bias has not been well explored. This paper focuses on acquiring additional labeled data points to rapidly improve the fairness of a downstream machine learning model. Because not all data points are equally beneficial, the authors generate an ordering that prioritizes which points to acquire. They present DataSift, a data acquisition framework based on data valuation that uses partitioning and multi-armed bandits to identify valuable data points. In each iteration, DataSift selects a partition, samples a batch from it, evaluates the batch's benefit to model fairness, and updates the partition utilities. To evaluate batches efficiently, it uses influence functions, which estimate a batch's effect without retraining the model. An empirical evaluation on real-world and synthetic datasets shows significant improvements in model fairness with few data point acquisitions.
Low	GrooveSquid.com (original content)	Machine learning is used to make important decisions, but there’s a risk of bias. To fix this, we need good data. This paper looks at how to get more labeled data points to make models fairer. It’s not just about getting any old data: some data points help more than others. The researchers created a system called DataSift that helps figure out which data to get first. They tested it on real and synthetic (artificially generated) datasets and found that with just a few new data points, the model got much fairer.
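
The acquisition loop described in the medium summary (select a partition, sample a batch, measure its benefit to fairness, update utilities) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' DataSift implementation: the arm-selection rule is standard UCB1, and the reward is a crude stand-in (the drop in the statistical parity gap of the training labels) rather than the influence-function estimate the paper uses. The names `statistical_parity_gap` and `acquire`, and the `(group, label)` row format, are hypothetical.

```python
import math
import random


def statistical_parity_gap(rows):
    """Absolute difference in positive-label rate between groups 0 and 1.

    Each row is a (group, label) pair. ASSUMPTION: this data-level gap is a
    stand-in for model fairness; the paper instead estimates the effect of a
    batch on the trained model with influence functions.
    """
    rates = []
    for group in (0, 1):
        labels = [y for g, y in rows if g == group]
        rates.append(sum(labels) / len(labels) if labels else 0.0)
    return abs(rates[0] - rates[1])


def acquire(train, partitions, rounds=20, batch_size=5, seed=0):
    """Hypothetical bandit loop: each partition of the data pool is one arm."""
    rng = random.Random(seed)
    pools = [list(p) for p in partitions]   # copy so callers keep their data
    counts = [0] * len(pools)               # number of pulls per partition
    utilities = [0.0] * len(pools)          # running mean reward per partition
    train = list(train)

    for t in range(1, rounds + 1):
        available = [i for i, p in enumerate(pools) if p]
        if not available:
            break
        untried = [i for i in available if counts[i] == 0]
        if untried:
            arm = untried[0]                # pull every arm once first
        else:
            # UCB1: mean utility plus an exploration bonus
            arm = max(
                available,
                key=lambda i: utilities[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )

        # Sample a batch without replacement from the chosen partition.
        k = min(batch_size, len(pools[arm]))
        batch = [pools[arm].pop(rng.randrange(len(pools[arm]))) for _ in range(k)]

        # Reward = how much the (proxy) unfairness dropped after acquiring it.
        before = statistical_parity_gap(train)
        train.extend(batch)
        reward = before - statistical_parity_gap(train)

        counts[arm] += 1
        utilities[arm] += (reward - utilities[arm]) / counts[arm]

    return train, utilities, counts


if __name__ == "__main__":
    # Toy data: group 0 has an 80% positive rate, group 1 has 20%.
    train = [(0, 1)] * 8 + [(0, 0)] * 2 + [(1, 1)] * 2 + [(1, 0)] * 8
    helpful = [(1, 1)] * 30   # positives for the disadvantaged group
    harmful = [(0, 1)] * 30   # positives for the already-favored group
    _, utilities, counts = acquire(train, [helpful, harmful], rounds=6)
    print("utilities:", utilities, "pulls:", counts)
```

Because the rewards in this toy setup are small relative to the UCB1 exploration bonus, the loop keeps alternating between partitions for a while; the learned utilities still separate the helpful partition from the harmful one.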

Keywords

* Artificial intelligence
* Machine learning
