Summary of Navigating Towards Fairness with Data Selection, by Yixuan Zhang et al.
Navigating Towards Fairness with Data Selection
by Yixuan Zhang, Zhidong Li, Yang Wang, Fang Chen, Xuhui Fan, Feng Zhou
First submitted to arXiv on: 15 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computers and Society (cs.CY)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper introduces a data selection method for mitigating label bias in machine learning, a key requirement for fairness. Existing techniques for addressing label bias modify the model or intervene in the training process, which limits their flexibility on large-scale datasets. To overcome this, the authors use a zero-shot predictor as a proxy model to simulate training on a clean holdout set; this keeps the proxy model fair and removes the need for an additional holdout set, a common requirement of previous methods. The approach is modality-agnostic, and experimental evaluations show that it efficiently and effectively handles label bias and improves fairness across diverse datasets (see the illustrative sketch after this table). |
Low | GrooveSquid.com (original content) | Machine learning algorithms often struggle to eliminate biases in their data, especially biases that come from unreliable labels. This paper introduces a new way to fix that. Current methods modify models or change how they are trained, but they do not scale well to big datasets. Instead, the authors use a special kind of model as a "test" model that simulates training on clean data. This keeps the model fair and does not require an extra test set. The method is efficient and works with different types of data. |
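The summaries above describe the method only at a high level, and the paper's exact selection rule is not given here. The sketch below is one plausible reading, assuming the zero-shot proxy's confidence in each observed label is used as a cleanliness score and only the highest-scoring examples are kept for training; the function and parameter names (`select_clean_subset`, `proxy_predict_proba`, `keep_fraction`) are hypothetical, not from the paper.

```python
import numpy as np

def select_clean_subset(features, observed_labels, proxy_predict_proba, keep_fraction=0.8):
    """Keep the training examples whose observed (possibly biased) labels
    a zero-shot proxy model agrees with most strongly.

    proxy_predict_proba: callable mapping features -> an (n, n_classes)
    array of class probabilities; stands in for the zero-shot predictor.
    """
    probs = proxy_predict_proba(features)  # shape (n, n_classes)
    # Cleanliness score: proxy probability assigned to each observed label.
    scores = probs[np.arange(len(observed_labels)), observed_labels]
    n_keep = int(keep_fraction * len(scores))
    # Indices of the examples the proxy trusts most, highest score first.
    return np.argsort(scores)[::-1][:n_keep]

# Hypothetical usage with any zero-shot classifier exposing probabilities:
# keep_idx = select_clean_subset(X_train, y_train, zero_shot_proba)
# model.fit(X_train[keep_idx], y_train[keep_idx])
```

Since a zero-shot proxy is never trained on the biased labels, its scores are not contaminated by them, which is plausibly how the method avoids needing a separate clean holdout set.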
Keywords
» Artificial intelligence » Machine learning » Zero shot