
Summary of Data vs. Model Machine Learning Fairness Testing: An Empirical Study, by Arumoy Shome, Luis Cruz, and Arie van Deursen


Data vs. Model Machine Learning Fairness Testing: An Empirical Study

by Arumoy Shome, Luis Cruz, Arie van Deursen

First submitted to arXiv on: 15 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computers and Society (cs.CY); Software Engineering (cs.SE)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper evaluates the fairness of Machine Learning (ML) models not only after training but also before it, directly on the data. The researchers test their approach using four ML algorithms, five real-world datasets, and 1,600 fairness evaluation cycles. They find a linear relationship between data and model fairness metrics when the distribution or size of the training data changes. This means that detecting biases during data collection can be an efficient way to prevent biased models from being trained later.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Machine learning is like teaching a computer new skills! But sometimes these computers can be unfair or biased. In this paper, scientists try to figure out how to make sure the computers don't learn bad habits. They look at two ways to measure fairness: one for when the data is collected and another for after the training is done. They used different types of computer algorithms and real-life datasets, and ran many tests (1,600!) to see if they could spot problems early on. They found that it's possible to catch biases in the data collection process before even starting to train the computer. This can help reduce development time and costs.

Keywords

  • Artificial intelligence
  • Machine learning