
Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

by Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer

First submitted to arXiv on: 18 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

This paper addresses a crucial issue in machine learning, where classification models often perform poorly for certain classes despite being trained on balanced datasets. The researchers introduce the concept of spectral imbalance in features as a potential source of class disparities and investigate its connection to class bias both theoretically and practically. They develop a framework for studying class disparities and derive exact expressions for per-class error in a high-dimensional mixture model setting. Experiments are conducted on 11 state-of-the-art pretrained encoders, demonstrating how the proposed framework can be used to compare and evaluate data augmentation strategies to mitigate class bias. This work sheds light on the class-dependent effects of learning and provides new insights into the biases present in state-of-the-art features.

Low Difficulty Summary (written by GrooveSquid.com, original content)

This paper is about a problem with machine learning models that are supposed to be fair but often make mistakes for certain groups of things. The researchers want to understand why this happens even when the data they train on is balanced, meaning there’s an equal number of examples for each group. They think that the features they use to represent the data might have unknown biases that affect how well the model performs. To study this, they develop a way to analyze the quality of different models and strategies for improving fairness. They test their approach on 11 powerful machine learning tools and show that it can help diagnose and fix these biases.
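To make the idea of "spectral imbalance in features" concrete: one simple way to probe it is to compare the eigenvalue spectra of each class's feature covariance matrix, since classes whose variance concentrates in a few directions can behave differently under a classifier even when sample counts are balanced. The sketch below is a minimal illustration of this kind of diagnostic, not the authors' exact procedure; the helper `class_spectra` and the toy data are hypothetical.

```python
import numpy as np

def class_spectra(features, labels):
    """Eigenvalue spectrum of the feature covariance matrix, per class."""
    spectra = {}
    for c in np.unique(labels):
        X = features[labels == c]
        cov = np.cov(X, rowvar=False)
        # eigvalsh is appropriate for symmetric matrices; sort descending
        spectra[c] = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return spectra

# Toy example: two classes with EQUAL sample counts but different spectra.
rng = np.random.default_rng(0)
n, d = 500, 10
class0 = rng.normal(size=(n, d))                           # isotropic features
class1 = rng.normal(size=(n, d)) * np.linspace(3, 0.1, d)  # anisotropic features
features = np.vstack([class0, class1])
labels = np.array([0] * n + [1] * n)

spectra = class_spectra(features, labels)

# Fraction of variance captured by the top eigenvalue for each class:
# a large gap between classes signals spectral imbalance despite balanced data.
ratio0 = spectra[0][0] / spectra[0].sum()
ratio1 = spectra[1][0] / spectra[1].sum()
print(f"class 0 top-eigenvalue share: {ratio0:.2f}")
print(f"class 1 top-eigenvalue share: {ratio1:.2f}")
```

Here class 1 concentrates far more of its variance in its leading direction than class 0, even though both classes contribute the same number of training examples; this is the kind of disparity the paper argues can drive per-class error differences.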

Keywords

* Artificial intelligence  * Classification  * Data augmentation  * Machine learning  * Mixture model