Summary of Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks, by Siddharth Joshi et al.
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
by Siddharth Joshi, Jiayi Ni, Baharan Mirzasoleiman
First submitted to arXiv on: 3 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary | 
|---|---|---|
| High | Paper authors | High Difficulty Summary Read the original abstract here | 
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Dataset distillation (DD) generates small synthetic datasets that can train deep networks efficiently with limited memory and compute. DD was initially developed for supervised learning, and its application to self-supervised learning (SSL) pre-training of deep models has remained unexplored. The authors introduce the first effective DD method for SSL pre-training, which matters because pre-training on unlabeled data is crucial for generalizing to downstream tasks with limited labeled data. They show that naive application of supervised DD methods to SSL fails due to the high variance of SSL gradients, and they address this by drawing on insights from knowledge distillation (KD). A small student model is trained to match the representations of a larger teacher model trained with SSL, and a small synthetic dataset is then generated by matching the training trajectories of the student models. Because the KD objective has considerably lower variance than the SSL objective, the resulting synthetic datasets can successfully pre-train high-quality encoders. Extensive experiments demonstrate that the distilled sets achieve up to 13% higher accuracy than prior work on a variety of downstream tasks in the presence of limited labeled data. | 
| Low | GrooveSquid.com (original content) | Low Difficulty Summary Dataset distillation (DD) is a way to create small synthetic datasets for training deep learning models. DD helps train deep networks using only a little memory and computing power. So far, DD has mostly been used for supervised learning, and nobody had made it work for self-supervised pre-training of deep models. Self-supervised pre-training is important because it helps deep models generalize to new tasks with limited labeled data. The authors propose the first effective way to use DD for self-supervised pre-training. They show that simply applying supervised DD methods to self-supervised learning (SSL) fails, and then they develop a new approach using insights from knowledge distillation. This new method creates synthetic datasets by matching the training trajectories of student models. Models pre-trained on these small datasets reach higher accuracy on downstream tasks than with prior work. | 
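
The two ingredients named in the summaries, a student matching a teacher's representations and a synthetic set tuned by matching student training trajectories, can be illustrated with a toy sketch. It is not the authors' implementation: the function names `kd_loss` and `trajectory_matching_loss`, the mean-squared-error form of the KD objective, the normalized distance used for trajectory matching, and all numeric values are assumptions made for illustration.

```python
# Illustrative sketch only. Function names, loss forms, and toy values are
# hypothetical and are not taken from the paper's code.


def kd_loss(student_reps, teacher_reps):
    """Mean squared error between student and teacher representations.

    Each argument is a list of representation vectors (lists of floats),
    one vector per example.
    """
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student_reps, teacher_reps):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


def trajectory_matching_loss(theta_synth, theta_start, theta_target):
    """Normalized squared distance between two parameter vectors.

    theta_synth:  parameters reached after training on the synthetic set,
                  starting from theta_start.
    theta_target: a later checkpoint on a student trajectory trained on
                  real data with the KD objective.
    The normalization by the start-to-target distance is an assumption
    borrowed from common trajectory-matching formulations.
    """
    num = sum((a - b) ** 2 for a, b in zip(theta_synth, theta_target))
    den = sum((a - b) ** 2 for a, b in zip(theta_start, theta_target))
    return num / den


# Toy representations: the student is off by 0.5 in one coordinate.
teacher_reps = [[1.0, 0.0], [0.0, 1.0]]
student_reps = [[0.5, 0.0], [0.0, 1.0]]
print(kd_loss(student_reps, teacher_reps))  # 0.0625

# Toy parameters: training on the synthetic set gets most of the way
# from the start checkpoint to the target checkpoint.
theta_start = [0.0, 0.0]
theta_target = [1.0, 1.0]
theta_synth = [1.0, 0.5]
print(trajectory_matching_loss(theta_synth, theta_start, theta_target))  # 0.125
```

In a real pipeline the synthetic images would be updated by gradient descent to reduce the second loss; the sketch only evaluates both losses on fixed toy values.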
Keywords
* Artificial intelligence
* Deep learning
* Distillation
* Knowledge distillation
* Self supervised
* Student model
* Supervised
* Teacher model




