
Summary of Mitigating Bias in Dataset Distillation, by Justin Cui et al.


Mitigating Bias in Dataset Distillation

by Justin Cui, Ruochen Wang, Yuanhao Xiong, Cho-Jui Hsieh

First submitted to arxiv on: 6 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This study investigates how biases in the original dataset affect dataset distillation, a technique for compressing large datasets into much smaller synthetic counterparts. The authors find that some types of bias (color and background) are amplified by the distillation process, degrading the performance of models trained on the distilled data, while other types (corruption) are suppressed. To mitigate this, they propose a simple yet effective sample reweighting scheme based on kernel density estimation (a minimal sketch of such a scheme is given after the summaries below). The method is shown to be highly effective on multiple real-world and synthetic datasets, including CMNIST (Colored MNIST), with significant gains in test accuracy over vanilla distribution matching (DM) and state-of-the-art debiasing methods.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Dataset distillation helps make big datasets smaller so models can learn faster. But what happens when the original dataset has biases? A team of researchers studied this problem and found that certain types of biases get worse, while others get better. They came up with a simple way to fix this issue by reweighting samples using a technique called kernel density estimation. This method works really well on lots of different datasets, including ones that are super tricky like CMNIST.
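
For readers who want a concrete picture of the reweighting idea, the sketch below shows one way a kernel-density-estimation-based sample reweighting could look in Python. It is a minimal illustration under assumptions, not the authors' implementation: the feature embeddings, the Gaussian kernel, and the bandwidth value are placeholders, and the resulting weights simply rescale each sample's contribution to a distillation loss such as distribution matching.

```python
# Minimal sketch (illustrative only): down-weight samples in dense regions of
# feature space and up-weight rare ones via kernel density estimation.
import numpy as np
from sklearn.neighbors import KernelDensity

def kde_sample_weights(features, bandwidth=1.0, eps=1e-12):
    """Return inverse-density weights for each sample.

    features: (n_samples, n_dims) array of per-sample embeddings, e.g. from a
              pretrained encoder (the embedding choice and bandwidth are
              assumptions made for this sketch).
    """
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(features)
    log_density = kde.score_samples(features)      # log p(x_i) for each sample
    density = np.exp(log_density)
    weights = 1.0 / (density + eps)                # rare samples get larger weight
    weights *= len(weights) / weights.sum()        # normalize to mean 1
    return weights

# Hypothetical usage: rescale each sample's contribution to the distillation loss.
# feats = encoder(images)                          # placeholder feature extractor
# w = kde_sample_weights(feats, bandwidth=0.5)
# loss = (w * per_sample_loss).mean()
```

The intuition, per the summaries above, is that bias-aligned majority samples dominate dense regions of feature space, so down-weighting them makes the synthetic dataset less likely to inherit spurious cues such as color or background.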

Keywords

» Artificial intelligence  » Density estimation  » Distillation