
Summary of Distilling the Knowledge in Data Pruning, by Emanuel Ben-Baruch et al.


Distilling the Knowledge in Data Pruning

by Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni

First submitted to arxiv on: 12 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
This version is the paper's original abstract; read it on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper studies data pruning for neural networks when knowledge distillation (KD) is incorporated into training on the pruned subset. The authors demonstrate significant improvements across datasets and pruning methods, and establish a theoretical motivation for employing self-distillation to improve training on pruned data. They show that, with KD, simple random pruning is comparable or even superior to sophisticated pruning methods across all pruning regimes; for example, it achieves superior accuracy on ImageNet despite training on a random subset of only 50% of the data. The authors also uncover a crucial connection between the pruning factor and the optimal knowledge distillation weight, which helps mitigate the impact of noisy labels and low-quality images retained by typical pruning algorithms.
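
To make the described recipe concrete, below is a minimal, hypothetical sketch (in PyTorch) of knowledge-distillation training on a randomly pruned subset. The loss is the standard KD mix of hard-label cross-entropy and a temperature-scaled KL term against the teacher's outputs, weighted by alpha; the function names, the random-subset step, and the specific hyperparameter values are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: self-distillation on a randomly pruned training subset.
# Loss: (1 - alpha) * CE(student, labels) + alpha * T^2 * KL(student_T || teacher_T).
# Names, the random-pruning step, and the hyperparameter values are
# illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset


def kd_loss(student_logits, teacher_logits, labels, alpha=0.7, T=4.0):
    """Weighted sum of hard-label cross-entropy and soft-label KL distillation."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kl


def train_on_pruned_subset(student, teacher, dataset, keep_fraction=0.5,
                           alpha=0.7, epochs=1, device="cpu"):
    # Random pruning: keep only `keep_fraction` of the training examples.
    n_keep = int(len(dataset) * keep_fraction)
    indices = torch.randperm(len(dataset))[:n_keep]
    loader = DataLoader(Subset(dataset, indices.tolist()),
                        batch_size=128, shuffle=True)

    teacher.eval()
    optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                teacher_logits = teacher(images)  # soft targets from the (self-)teacher
            student_logits = student(images)
            loss = kd_loss(student_logits, teacher_logits, labels, alpha=alpha)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Consistent with the summary's point about the connection between the pruning factor and the optimal KD weight, one would tune alpha together with keep_fraction rather than fixing it; the values above are only placeholders.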
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper is about training neural networks with less data while keeping them accurate. It’s like shrinking a big book without losing the important information. The authors use something called knowledge distillation, which is like a student learning from an expert teacher, to improve how well the network learns when it only sees part of the original data. They show that this method can make the network work just as well with less data, and even better in some cases! This could be important for things like image recognition or self-driving cars, where training fast and accurate networks is crucial.

Keywords

  • Artificial intelligence
  • Distillation
  • Knowledge distillation
  • Pruning