
Summary of Mitigating Bias in Dataset Distillation, by Justin Cui et al.


Mitigating Bias in Dataset Distillation

by Justin Cui, Ruochen Wang, Yuanhao Xiong, Cho-Jui Hsieh

First submitted to arxiv on: 6 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This study investigates how biases in the original dataset affect dataset distillation, a technique for compressing large datasets into much smaller synthetic counterparts. The authors find that some types of bias (color and background) are amplified by the distillation process, degrading the performance of models trained on the distilled data, while other types (corruption) are suppressed. To mitigate this, they propose a simple yet effective sample reweighting scheme based on kernel density estimation (a minimal sketch of such a scheme is given after the summaries below). The method is shown to be highly effective on multiple real-world and synthetic datasets, including CMNIST (Colored MNIST), with significant gains in test accuracy over vanilla distribution matching (DM) and state-of-the-art debiasing methods.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Dataset distillation helps make big datasets smaller so models can learn faster. But what happens when the original dataset has biases? A team of researchers studied this problem and found that certain types of biases get worse, while others get better. They came up with a simple way to fix this issue by reweighting samples using a technique called kernel density estimation. This method works really well on lots of different datasets, including ones that are super tricky like CMNIST.
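
For readers who want a concrete picture of the reweighting idea, the sketch below shows one way a kernel-density-estimation-based sample reweighting could look in Python. It is a minimal illustration under assumptions, not the authors' implementation: the feature embeddings, the Gaussian kernel, and the bandwidth value are placeholders, and the resulting weights simply rescale each sample's contribution to a distillation loss such as distribution matching.

```python
# Minimal sketch (illustrative only): down-weight samples in dense regions of
# feature space and up-weight rare ones via kernel density estimation.
import numpy as np
from sklearn.neighbors import KernelDensity

def kde_sample_weights(features, bandwidth=1.0, eps=1e-12):
    """Return inverse-density weights for each sample.

    features: (n_samples, n_dims) array of per-sample embeddings, e.g. from a
              pretrained encoder (the embedding choice and bandwidth are
              assumptions made for this sketch).
    """
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(features)
    log_density = kde.score_samples(features)      # log p(x_i) for each sample
    density = np.exp(log_density)
    weights = 1.0 / (density + eps)                # rare samples get larger weight
    weights *= len(weights) / weights.sum()        # normalize to mean 1
    return weights

# Hypothetical usage: rescale each sample's contribution to the distillation loss.
# feats = encoder(images)                          # placeholder feature extractor
# w = kde_sample_weights(feats, bandwidth=0.5)
# loss = (w * per_sample_loss).mean()
```

The intuition, per the summaries above, is that bias-aligned majority samples dominate dense regions of feature space, so down-weighting them makes the synthetic dataset less likely to inherit spurious cues such as color or background.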

Keywords

» Artificial intelligence  » Density estimation  » Distillation