Summary of Optimal Transport For Fairness: Archival Data Repair Using Small Research Data Sets, by Abigail Langbridge and Anthony Quinn and Robert Shorten
First submitted to arXiv on: 20 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computers and Society (cs.CY); Statistics Theory (math.ST)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The authors propose an algorithm to repair unfairness in training data, specifically addressing the need for archival data repair. They define fairness in terms of conditional independence between protected attributes and features, given unprotected attributes. The approach uses optimal transport (OT)-based repair plans on interpolated supports, allowing off-sample, labelled archival data to be repaired subject to stationarity assumptions. Experimental results demonstrate effective repair of large quantities of off-sample, labelled data using simulated and real-world datasets such as Adult. This work is particularly relevant in light of the AI Act and other regulations emphasizing fairness in machine learning. |
| Low | GrooveSquid.com (original content) | The paper aims to fix unfairness in training data using a method called optimal transport (OT). It's like fixing mistakes in a big library where some books are unfairly labelled. The authors define what fairness means mathematically and then create a way to repair the mistakes using only a small part of the data that is already labelled. This makes it faster and cheaper to fix many more books (data) without collecting new ones. The results show that this method works well for repairing large amounts of data, including real-world datasets like Adult. |
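To make the idea of OT-based repair concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of one common one-dimensional instance: each protected group's feature distribution is mapped onto the Wasserstein barycenter of the groups via quantile averaging, so that after repair the feature no longer depends on the protected attribute. The group data and all names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: one feature whose distribution differs
# across two protected groups (this is simulated, not the Adult data).
group_a = rng.normal(loc=40.0, scale=5.0, size=500)
group_b = rng.normal(loc=60.0, scale=8.0, size=500)

def repair_to_barycenter(x_a, x_b):
    """Map each group's samples onto the 1-D Wasserstein barycenter
    of the two group distributions, via quantile averaging."""
    # Empirical CDF value (rank) of each sample within its own group.
    u_a = (np.argsort(np.argsort(x_a)) + 0.5) / len(x_a)
    u_b = (np.argsort(np.argsort(x_b)) + 0.5) / len(x_b)
    # Barycenter quantile function: the average of the two groups'
    # quantile functions on a common grid.
    grid = np.linspace(0.0, 1.0, 1001)
    q_bar = 0.5 * (np.quantile(x_a, grid) + np.quantile(x_b, grid))
    # Transport each sample to the barycenter value at its rank.
    rep_a = np.interp(u_a, grid, q_bar)
    rep_b = np.interp(u_b, grid, q_bar)
    return rep_a, rep_b

rep_a, rep_b = repair_to_barycenter(group_a, group_b)

# After repair the two groups share (approximately) one distribution,
# removing the dependence between the feature and the protected attribute.
print(abs(group_a.mean() - group_b.mean()))  # large gap before repair
print(abs(rep_a.mean() - rep_b.mean()))      # near zero after repair
```

The same quantile-averaging map can be evaluated on new, off-sample points, which loosely mirrors the paper's point that a repair plan fitted on a small research set can then be applied to large archival data, provided the distributions are stationary.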
Keywords
* Artificial intelligence
* Machine learning