Summary of Data Acquisition For Improving Model Fairness Using Reinforcement Learning, by Jahid Hasan et al.
Data Acquisition for Improving Model Fairness using Reinforcement Learning
by Jahid Hasan, Romila Pradhan
First submitted to arxiv on: 4 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computers and Society (cs.CY)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Original abstract (available on the paper's arXiv page) |
Medium | GrooveSquid.com (original content) | Machine learning systems increasingly make critical decisions in domains such as healthcare, finance, and criminal justice, which raises concerns about their fairness. Many bias mitigation techniques emphasize the need for high-quality data, yet the role that earlier stages of the pipeline, such as data acquisition, can play in reducing model bias has not been well explored. This paper focuses on acquiring additional labeled data points to improve the fairness of a downstream machine learning model as quickly as possible. Because not all data points are equally beneficial, the authors generate an ordering that prioritizes which points to acquire first. They present DataSift, a data acquisition framework that combines data valuation with data partitioning and multi-armed bandits to identify valuable data points. In each iteration, DataSift selects a partition, samples a batch of data points from it, evaluates the batch's benefit to model fairness, and updates the partition's estimated utility. To evaluate batches efficiently, it uses influence functions, which estimate how adding data would change the model without retraining it. Empirical evaluations on real-world and synthetic datasets show that DataSift significantly improves model fairness after acquiring only a few data points. |
Low | GrooveSquid.com (original content) | Machine learning is used to make important decisions, but those decisions can be biased. To fix this, we need good data. This paper looks at how to get more labeled data points to make models fairer. It's not just about getting any old data: some data points help more than others. The researchers created a system called DataSift that helps figure out which data to get first. They tested it on real and made-up (synthetic) datasets and found that with just a few new data points, the models became much fairer. |
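The select, sample, evaluate, and update loop described in the medium-difficulty summary follows the shape of a multi-armed bandit whose "arms" are data partitions. The Python sketch below illustrates that loop under stated assumptions. It uses UCB1 as the bandit policy, a common choice swapped in here because the summary does not say which policy DataSift uses. The names `acquire`, `ucb1_pick`, and the `reward_fn` callback are hypothetical. In DataSift the reward would be the estimated fairness benefit of a batch, but here it is a toy stand-in.

```python
import math
import random

def ucb1_pick(active, counts, values, t, c=1.0):
    """Return the non-empty partition with the highest UCB1 score."""
    for arm in active:
        if counts[arm] == 0:  # try every partition once before trusting averages
            return arm
    return max(active, key=lambda a: values[a] + c * math.sqrt(2 * math.log(t) / counts[a]))

def acquire(partitions, batch_size, budget, reward_fn, seed=0):
    """Bandit-driven acquisition: pick a partition, take a batch, score it, update."""
    rng = random.Random(seed)
    pools = [list(p) for p in partitions]
    for pool in pools:
        rng.shuffle(pool)  # sampling without replacement = slicing a shuffled pool
    counts = [0] * len(pools)    # how often each partition was chosen
    values = [0.0] * len(pools)  # running mean reward (estimated utility) per partition
    acquired, t = [], 0
    while len(acquired) < budget:
        active = [i for i, pool in enumerate(pools) if pool]
        if not active:
            break
        t += 1
        arm = ucb1_pick(active, counts, values, t)
        k = min(batch_size, budget - len(acquired))
        batch, pools[arm] = pools[arm][:k], pools[arm][k:]
        reward = reward_fn(batch)  # hypothetical: fairness gain from acquiring this batch
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        acquired.extend(batch)
    return acquired, counts, values

# Toy example: batches from group "B" are assumed (hypothetically) to help fairness more.
partitions = [[("A", i) for i in range(10)], [("B", i) for i in range(10)]]
def gain(batch):
    return sum(g == "B" for g, _ in batch) / len(batch)

acquired, counts, values = acquire(partitions, batch_size=2, budget=10, reward_fn=gain)
print(counts, values)  # → [1, 4] [0.0, 1.0]
```

UCB1 balances exploiting the partition with the best average reward so far against exploring partitions that have been tried only a few times. In this toy run, partition A is tried once, and the partition whose batches earn the higher (hypothetical) reward then supplies the remaining batches.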
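Influence functions are what make the "evaluate" step cheap. Instead of retraining, they approximate how upweighting a training point z by a small amount ε shifts the fitted parameters, Δθ ≈ -ε H⁻¹ ∇ℓ(z, θ̂), where H is the Hessian of the training objective. That shift is then propagated to a fairness metric through the metric's gradient. Adding one point to n training points corresponds roughly to ε = 1/n. The following is a minimal sketch that assumes an L2-regularized, one-feature logistic regression and uses a smooth demographic-parity gap (the difference in mean predicted positive rate between two groups) as the fairness metric. The summary does not specify DataSift's model, fairness metric, or exact estimator, and all function names here are hypothetical.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def fit_logreg(xs, ys, coefs=None, lam=0.1, lr=0.5, steps=3000):
    """Minimise sum_i c_i * logloss_i + (lam / 2) * (w**2 + b**2) by gradient descent.

    With coefs=None every point gets weight 1/n (the usual mean training loss).
    """
    n = len(xs)
    coefs = coefs or [1.0 / n] * n
    w = b = 0.0
    for _ in range(steps):
        gw, gb = lam * w, lam * b
        for x, y, c in zip(xs, ys, coefs):
            r = c * (sigmoid(w * x + b) - y)
            gw += r * x
            gb += r
        w, b = w - lr * gw, b - lr * gb
    return w, b

def hessian(xs, w, b, lam=0.1):
    """Hessian of the mean log-loss plus L2 term, returned as (h_ww, h_wb, h_bb)."""
    n = len(xs)
    hww = hwb = hbb = 0.0
    for x in xs:
        p = sigmoid(w * x + b)
        s = p * (1.0 - p) / n
        hww += s * x * x
        hwb += s * x
        hbb += s
    return hww + lam, hwb, hbb + lam

def solve2(h, g):
    """Solve H v = g for the symmetric 2x2 matrix H = [[h_ww, h_wb], [h_wb, h_bb]]."""
    hww, hwb, hbb = h
    det = hww * hbb - hwb * hwb
    return (hbb * g[0] - hwb * g[1]) / det, (hww * g[1] - hwb * g[0]) / det

def parity_gap(w, b, group_a, group_b):
    """Smooth demographic-parity gap: mean P(y=1) in group A minus mean in group B."""
    def rate(xs):
        return sum(sigmoid(w * x + b) for x in xs) / len(xs)
    return rate(group_a) - rate(group_b)

def parity_gap_grad(w, b, group_a, group_b):
    """Gradient of parity_gap with respect to (w, b)."""
    def grad(xs):
        gw = gb = 0.0
        for x in xs:
            p = sigmoid(w * x + b)
            s = p * (1.0 - p) / len(xs)
            gw += s * x
            gb += s
        return gw, gb
    (aw, ab), (bw, bb) = grad(group_a), grad(group_b)
    return aw - bw, ab - bb

def influence_on_gap(x_new, y_new, w, b, xs, group_a, group_b, eps, lam=0.1):
    """First-order change in the gap if (x_new, y_new) is upweighted by eps.

    d_theta ~= -eps * H^{-1} grad_loss(z)   and   d_gap ~= grad_gap . d_theta
    """
    r = sigmoid(w * x_new + b) - y_new
    vw, vb = solve2(hessian(xs, w, b, lam), (r * x_new, r))
    gw, gb = parity_gap_grad(w, b, group_a, group_b)
    return -eps * (gw * vw + gb * vb)

# Toy data: one feature; group A sits at negative x, group B at positive x.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
group_a, group_b = xs[:4], xs[4:]
w, b = fit_logreg(xs, ys)

# Score a candidate point (x=-1.0, y=1) without retraining; eps = 1/n ~ "add one point".
print("estimated gap change:", influence_on_gap(-1.0, 1, w, b, xs, group_a, group_b, 1.0 / len(xs)))

# Sanity check, affordable only for toy models: retrain with the point upweighted by a small eps.
eps = 1e-4
n = len(xs)
w2, b2 = fit_logreg(xs + [-1.0], ys + [1], coefs=[1.0 / n] * n + [eps])
actual = parity_gap(w2, b2, group_a, group_b) - parity_gap(w, b, group_a, group_b)
estimate = influence_on_gap(-1.0, 1, w, b, xs, group_a, group_b, eps)
print(f"estimate {estimate:.3e} vs retrained {actual:.3e}")
```

On this toy data the gap is negative, because group A has the lower predicted positive rate. A positive estimate for the candidate point therefore means that acquiring it is expected to move the gap toward zero. Each candidate costs one small linear solve rather than a full retraining, which is what makes scoring many batches affordable.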
Keywords
Artificial intelligence, Machine learning