Summary of Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks, by Siddharth Joshi et al.
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
by Siddharth Joshi, Jiayi Ni, Baharan Mirzasoleiman
First submitted to arXiv on: 3 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Dataset distillation (DD) creates small synthetic datasets that let deep networks be trained with limited memory and compute. DD was originally developed for supervised learning, and its application to self-supervised (SSL) pre-training of deep models had remained unexplored. The authors introduce the first effective DD method for SSL pre-training, which matters because SSL-pre-trained encoders generalize well to downstream tasks with limited labeled data. They show that naively applying supervised DD methods to SSL fails because SSL gradients have high variance, and they address this with an insight from knowledge distillation (KD): a small student model is trained to match the representations of a larger teacher model pre-trained with SSL, and the synthetic dataset is generated by matching the student models' training trajectories. Because the KD objective has much lower variance than SSL objectives, the distilled sets can successfully pre-train high-quality encoders. Extensive experiments show that the distilled sets yield up to 13% higher accuracy than prior work across various downstream tasks when labeled data is limited. (A minimal code sketch of this recipe follows the table.) |
Low | GrooveSquid.com (original content) | A new way to create small datasets for training deep learning models has been developed. The method is called dataset distillation (DD). DD helps train deep networks using only a little memory and computer power. So far, DD has mostly been used for supervised learning, but it can also be useful for self-supervised pre-training of deep models. Self-supervised pre-training is important because it helps deep models generalize to new tasks with limited labeled data. The authors propose the first effective way to use DD for self-supervised pre-training. They show that simply applying supervised DD methods to SSL fails, and then they develop a new approach using ideas from knowledge distillation. The new method creates synthetic datasets by matching the training trajectories of small student models that imitate a larger teacher. It is more efficient than previous methods and achieves better results. |
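To make the two-stage recipe described in the medium summary concrete, here is a minimal PyTorch-style sketch. It is not the authors' released code: it assumes a frozen `teacher` encoder already pre-trained with SSL, a small student architecture without BatchNorm buffers, and that each synthetic image is paired with a fixed target representation (`syn_r`); the function names, hyperparameters, and that pairing are illustrative assumptions rather than details taken from the paper.

```python
import copy
import torch
import torch.nn.functional as F
from torch.func import functional_call


def kd_trajectories(teacher, student, loader, epochs=20, lr=0.1, device="cpu"):
    """Stage 1: train a small student to regress the frozen SSL teacher's
    representations on real data; return its checkpoints (an 'expert trajectory')."""
    teacher.eval()
    student.to(device).train()
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    checkpoints = [copy.deepcopy(student.state_dict())]
    for _ in range(epochs):
        for x, _ in loader:                        # labels are ignored; the teacher is the target
            x = x.to(device)
            with torch.no_grad():
                target = teacher(x)                # representations from the SSL-pre-trained teacher
            loss = F.mse_loss(student(x), target)  # knowledge-distillation (representation-matching) loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        checkpoints.append(copy.deepcopy(student.state_dict()))
    return checkpoints


def distill(make_student, checkpoints, syn_x, syn_r, syn_lr=0.02,
            inner_steps=10, segment=2, outer_steps=500, img_lr=0.01, device="cpu"):
    """Stage 2: optimize synthetic images `syn_x` so that a short KD training run
    on them moves a student from checkpoint t toward checkpoint t + segment."""
    syn_x = syn_x.clone().to(device).requires_grad_(True)
    syn_r = syn_r.to(device)                       # fixed target representations (an assumption here)
    opt = torch.optim.Adam([syn_x], lr=img_lr)
    student = make_student().to(device)            # stateless shell; parameters are supplied below
    for _ in range(outer_steps):
        t = torch.randint(0, len(checkpoints) - segment, (1,)).item()
        start = {k: v.to(device) for k, v in checkpoints[t].items()}
        target = {k: v.to(device) for k, v in checkpoints[t + segment].items()}
        params = {k: v.clone().requires_grad_(True) for k, v in start.items()}
        # Unrolled inner loop: a few KD training steps on the synthetic set.
        for _ in range(inner_steps):
            pred = functional_call(student, params, (syn_x,))
            inner_loss = F.mse_loss(pred, syn_r)
            grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
            params = {k: p - syn_lr * g for (k, p), g in zip(params.items(), grads)}
        # Outer loss: how close the run on synthetic data lands to the expert checkpoint,
        # normalized by how far the expert itself travelled over the same segment.
        p_end = torch.cat([p.flatten() for p in params.values()])
        p_tgt = torch.cat([target[k].flatten() for k in params])
        p_start = torch.cat([start[k].flatten() for k in params])
        outer_loss = F.mse_loss(p_end, p_tgt) / (F.mse_loss(p_start, p_tgt) + 1e-8)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()
    return syn_x.detach()


# Illustrative wiring: seed the synthetic set with a few real images and fix their
# target representations to the teacher's outputs on those images.
#   seed_x, _ = next(iter(loader))
#   syn_r = teacher(seed_x).detach()
#   syn_images = distill(make_student, checkpoints, seed_x, syn_r)
```

The point of the sketch is the variance argument from the summary: the inner loop trains on a KD (representation-regression) loss rather than an SSL loss, so the expert trajectories being matched are much less noisy, which is what makes the trajectory-matching step workable for pre-training.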
Keywords
» Artificial intelligence » Deep learning » Distillation » Knowledge distillation » Self supervised » Student model » Supervised » Teacher model