


Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

by Shaobo Wang, Yantai Yang, Qilong Wang, Kaixin Li, Linfeng Zhang, Junchi Yan

First submitted to arXiv on: 22 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

Dataset Distillation (DD) aims to condense large datasets into much smaller ones while maintaining comparable performance. Despite the success of various DD methods, theoretical understanding of the area remains limited. This paper takes an initial step toward understanding matching-based DD methods from the perspective of sample difficulty. The authors empirically measure sample difficulty using gradient norms and observe that different matching-based methods favor samples with specific difficulty trends. Building on neural scaling laws for data pruning, they provide a theoretical explanation for these methods. Their findings suggest that prioritizing easier samples improves the quality of distilled datasets, especially in low IPC (images-per-class) settings. They introduce Sample Difficulty Correction (SDC), an approach that can be integrated into existing methods with minimal adjustments, and demonstrate its effectiveness across 7 distillation methods and 6 datasets.
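
The summary above describes measuring sample difficulty with gradient norms and steering matching-based distillation toward easier samples. The PyTorch sketch below illustrates one plausible way to do that; it is not the authors' SDC implementation, and the function names (per_sample_grad_norms, difficulty_weights), the gradient-norm proxy, the softmax weighting, and the temperature tau are assumptions introduced here for illustration.

```python
# Minimal sketch (not the paper's code): estimate per-sample difficulty
# via gradient norms and down-weight hard samples before a matching loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def per_sample_grad_norms(model, x, y):
    """Proxy for sample difficulty: L2 norm of the loss gradient
    w.r.t. model parameters, computed one sample at a time."""
    norms = []
    params = [p for p in model.parameters() if p.requires_grad]
    for xi, yi in zip(x, y):
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    return torch.stack(norms)

def difficulty_weights(grad_norms, tau=1.0):
    """Larger gradient norm -> harder sample -> smaller weight.
    Softmax over negative normalized norms; tau controls sharpness."""
    z = (grad_norms - grad_norms.mean()) / (grad_norms.std() + 1e-8)
    return F.softmax(-z / tau, dim=0)

# Illustrative usage: weight the per-sample real-data loss so that easier
# samples dominate the signal that synthetic data is matched against.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x_real = torch.randn(16, 3, 32, 32)
y_real = torch.randint(0, 10, (16,))

w = difficulty_weights(per_sample_grad_norms(model, x_real, y_real))
weighted_loss = (w * F.cross_entropy(model(x_real), y_real, reduction="none")).sum()
```

Because the weighting only rescales an existing per-sample loss, a correction of this kind could in principle be dropped into existing matching-based pipelines with few changes, which is the spirit of the plug-in integration the paper reports.
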
Low Difficulty Summary (written by GrooveSquid.com, original content)

This paper is about making big datasets much smaller while keeping them just as useful for certain tasks. The researchers wanted to understand how different ways of shrinking datasets work, from the point of view of how hard each piece of data is for a model to learn. They found that different methods prefer easier or harder pieces of data. They then used math formulas to explain why these methods behave the way they do. They discovered that building the smaller dataset mostly from easy pieces makes it better, especially when there aren't many pictures per class. Finally, the paper introduces an add-on called Sample Difficulty Correction (SDC) that can be plugged into existing methods, and shows that it works well across many different methods and datasets.

Keywords

» Artificial intelligence  » Distillation  » Pruning  » Scaling laws