Summary of Data vs. Model Machine Learning Fairness Testing: An Empirical Study, by Arumoy Shome, Luis Cruz, and Arie van Deursen
Data vs. Model Machine Learning Fairness Testing: An Empirical Study
by Arumoy Shome, Luis Cruz, Arie van Deursen
First submitted to arXiv on: 15 Jan 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computers and Society (cs.CY); Software Engineering (cs.SE)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper takes a crucial step forward by evaluating the fairness of Machine Learning (ML) models not just after training, but also before training, on the data itself. The researchers test their approach using four ML algorithms, five real-world datasets, and 1,600 fairness evaluation cycles. They find a linear relationship between data and model fairness metrics when the distribution or size of the training data changes. This means that detecting biases early, during data collection, can be an efficient way to prevent biased models from being trained (a code sketch of this idea follows the table). |
| Low | GrooveSquid.com (original content) | Machine learning is like teaching a computer new skills! But sometimes, these computers can be unfair or biased. In this paper, scientists are trying to figure out how to make sure the computers don't learn bad habits. They're looking at two ways to measure fairness: one for when the data is collected and another for after the training is done. They used different types of computer algorithms, real-life datasets, and many tests (1,600!) to see if they could spot problems early on. What they found was that it's possible to catch biases in the data collection process before even starting to train the computer! This can help reduce development time and costs. |
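To make the relationship described in the medium summary concrete, here is a minimal sketch; it is not the paper's code. It assumes a synthetic binary-classification dataset with one protected group attribute, uses scikit-learn's LogisticRegression as a stand-in for the paper's four algorithms, and uses simple group-difference measures (positive-label-rate difference in the training data, demographic parity difference in predictions) as stand-ins for the paper's fairness metrics. It varies the training-set size and checks whether the data-level and model-level metrics move together.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def data_fairness(y, group):
    """Data-level metric: difference in positive-label rate between groups (illustrative choice)."""
    return abs(y[group == 1].mean() - y[group == 0].mean())

def model_fairness(model, X, group):
    """Model-level metric: demographic parity difference of predictions (illustrative choice)."""
    pred = model.predict(X)
    return abs(pred[group == 1].mean() - pred[group == 0].mean())

# Synthetic data: a protected attribute that correlates with the label (hypothetical setup).
n = 5000
group = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5)) + group[:, None] * 0.5
y = (X[:, 0] + 0.8 * group + rng.normal(scale=1.0, size=n) > 1.0).astype(int)

data_scores, model_scores = [], []
for frac in np.linspace(0.2, 1.0, 20):          # vary the training-set size
    idx = rng.choice(n, size=int(frac * n), replace=False)
    Xtr, ytr, gtr = X[idx], y[idx], group[idx]
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    data_scores.append(data_fairness(ytr, gtr))
    model_scores.append(model_fairness(clf, X, group))

# A strong positive correlation here mirrors the linear relationship the paper reports.
r, p = pearsonr(data_scores, model_scores)
print(f"Pearson correlation between data and model fairness: r={r:.2f} (p={p:.3f})")
```

In this sketch, a high correlation would suggest that a bias measurable in the collected data already predicts bias in the trained model, which is the rationale for testing fairness before training.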
Keywords
* Artificial intelligence
* Machine learning