Quantifying Spuriousness of Biased Datasets Using Partial Information Decomposition
by Barproda Halder, Faisal Hamman, Pasan Dissanayake, Qiuyi Zhang, Ilia Sucholutsky, Sanghamitra Dutta
First submitted to arXiv on: 29 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Information Theory (cs.IT)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here. |
Medium | GrooveSquid.com (original content) | In this paper, the researchers formalize the concept of spurious patterns in datasets using Partial Information Decomposition (PID). They quantify a dataset’s spuriousness through unique information, a measure rooted in Blackwell sufficiency. They demonstrate that higher unique information in spurious features can lead models to prefer those features over core features for inference, resulting in low worst-group accuracy. To compute unique information on high-dimensional image data, they also propose an autoencoder-based estimator and show its effectiveness (see the toy PID sketch after this table). |
Low | GrooveSquid.com (original content) | Spurious patterns are misleading connections between variables in a dataset that aren’t really related. This paper gives these patterns a precise mathematical definition. The authors create a new way to measure how much of this pattern is present in the data, called unique information. They show that when there is more of this spurious signal, models may pick the wrong features for prediction, which hurts accuracy on rare groups. To estimate this measure on real data, they use autoencoders to compress the features first (see the estimator sketch after this table). |
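To make the PID terminology above concrete, here is a minimal, self-contained toy sketch in Python. Note the hedge: it uses the simpler Williams–Beer I_min redundancy as a stand-in for the Blackwell-sufficiency-based (BROJA-style) unique information the paper actually uses, and the joint distribution over a label Y, a spurious feature S, and a core feature C is invented for illustration.

```python
# Toy PID sketch: unique information of a spurious feature S and a core
# feature C about a label Y. Uses Williams-Beer I_min redundancy for
# simplicity; the paper's measure is instead rooted in Blackwell sufficiency
# and requires solving a convex program. The joint p(y, s, c) is made up.
import numpy as np

p = np.zeros((2, 2, 2))   # indexed as p[y, s, c], all variables binary
p[0, 0, 0] = 0.40         # Y=0 mostly co-occurs with S=0
p[0, 0, 1] = 0.05
p[0, 1, 0] = 0.04
p[0, 1, 1] = 0.01
p[1, 1, 1] = 0.40         # Y=1 mostly co-occurs with S=1
p[1, 1, 0] = 0.05
p[1, 0, 1] = 0.04
p[1, 0, 0] = 0.01

def mi(joint):
    """Mutual information I(A; B) in bits for a 2-D joint distribution."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def specific_info(p_yz, y):
    """Specific information I(Y=y; Z) = sum_z p(z|y) log2 p(y|z)/p(y)."""
    py, pz, out = p_yz.sum(axis=1), p_yz.sum(axis=0), 0.0
    for z in range(p_yz.shape[1]):
        if p_yz[y, z] > 0:
            out += (p_yz[y, z] / py[y]) * np.log2((p_yz[y, z] / pz[z]) / py[y])
    return out

p_ys = p.sum(axis=2)      # marginal joint of (Y, S)
p_yc = p.sum(axis=1)      # marginal joint of (Y, C)
py = p.sum(axis=(1, 2))

# Williams-Beer redundancy: expected min specific information over sources.
red = sum(py[y] * min(specific_info(p_ys, y), specific_info(p_yc, y))
          for y in range(2))
uni_s = mi(p_ys) - red    # unique information in the spurious feature
uni_c = mi(p_yc) - red    # unique information in the core feature
print(f"Uni(S) = {uni_s:.3f} bits, Uni(C) = {uni_c:.3f} bits, Red = {red:.3f} bits")
```

In this toy distribution S is slightly more predictive of Y than C, so Uni(S) comes out larger: exactly the regime the paper flags, where a model may latch onto the spurious feature.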
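The autoencoder-based estimation idea can be sketched as well. Everything below is an illustrative assumption rather than the paper’s exact architecture: the tiny encoder/decoder, the 8-dimensional latents, k-means with 10 clusters as the discretizer, and the synthetic data standing in for image features.

```python
# Hedged sketch of an autoencoder-based PID estimation pipeline: compress
# high-dimensional features to low-dimensional latents, discretize them,
# then build the discrete joint over (label, spurious latent, core latent)
# that a discrete PID estimator (e.g., the sketch above) would consume.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class TinyAutoencoder(nn.Module):
    """Deliberately small encoder/decoder pair, for illustration only."""
    def __init__(self, in_dim, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train_autoencoder(model, x, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):          # full-batch reconstruction training
        opt.zero_grad()
        recon, _ = model(x)
        loss_fn(recon, x).backward()
        opt.step()
    return model

def discretize(latents, n_clusters=10):
    """Map continuous latents to cluster ids so PID can be estimated on a
    discrete joint distribution (choice of k is a tuning decision)."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(latents)

# Synthetic placeholders standing in for spurious / core image features.
x_spurious = torch.randn(512, 100)
x_core = torch.randn(512, 100)
y = np.random.randint(0, 2, size=512)

ae_s = train_autoencoder(TinyAutoencoder(100), x_spurious)
ae_c = train_autoencoder(TinyAutoencoder(100), x_core)
with torch.no_grad():
    s_disc = discretize(ae_s.encoder(x_spurious).numpy())
    c_disc = discretize(ae_c.encoder(x_core).numpy())

# Empirical joint p(y, s, c) to feed into a discrete PID estimator.
joint = np.zeros((2, 10, 10))
for yi, si, ci in zip(y, s_disc, c_disc):
    joint[yi, si, ci] += 1
joint /= joint.sum()
```

The design point the compression step addresses is that PID estimators operate on discrete joint distributions, which are intractable to estimate directly over raw pixels; shrinking and clustering the features first makes the joint small enough to estimate from data.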
Keywords
» Artificial intelligence » Autoencoder » Inference