Summary of "Retro: Reusing Teacher Projection Head for Efficient Embedding Distillation on Lightweight Models via Self-Supervised Learning"
by Khanh-Binh Nguyen and Chae Jung Park
First submitted to arXiv on: 24 May 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper explores self-supervised learning (SSL), which learns effective representations from large amounts of unlabeled data. Lightweight models can be distilled from larger pre-trained models using contrastive and consistency constraints, but the mismatch between teacher and student projection-head sizes makes it hard for the student to accurately mimic the teacher's embeddings. To address this, the authors propose Retro, which reuses the teacher's projection head for the student (see the sketch after this table). Experimental results show significant improvements over the state of the art for all lightweight models: for instance, training EfficientNet-B0 with ResNet-50/101/152 teachers improves linear evaluation accuracy on ImageNet to 66.9%, 69.3%, and 69.8%, respectively, while using significantly fewer parameters. |
Low | GrooveSquid.com (original content) | This paper is about a new way to learn from lots of data without labeling it first, called self-supervised learning. The goal is to get good results even when the models are small or simple. To do this, the researchers came up with an idea called Retro, which helps smaller models become more like bigger ones that have already been trained. They tested Retro and found that it works really well! For example, a small model called EfficientNet-B0 trained with Retro got much better results on a big dataset called ImageNet. |
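The core idea described in the medium summary lends itself to a short illustration. Below is a minimal, hypothetical PyTorch sketch of distillation with a reused teacher projection head: the teacher's frozen projection head is placed on top of the student (through a small adapter layer, since backbone widths differ), and the student is trained to match the teacher's embeddings on unlabeled images. The module names, dimensions, adapter, and plain cosine loss are illustrative assumptions, not the paper's exact architecture or objective.

```python
# Minimal, hypothetical sketch of the Retro idea: the student reuses the
# teacher's (frozen) projection head so that student features are compared
# to teacher features in the same projection space. Names and sizes below
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_feat_dim, student_feat_dim, proj_dim = 2048, 1280, 128

# Placeholder encoders standing in for ResNet-50 (teacher) and
# EfficientNet-B0 (student); any backbone producing a feature vector works.
teacher_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, teacher_feat_dim))
student_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, student_feat_dim))

# Teacher projection head, kept frozen and reused by the student.
teacher_proj_head = nn.Sequential(
    nn.Linear(teacher_feat_dim, teacher_feat_dim), nn.ReLU(),
    nn.Linear(teacher_feat_dim, proj_dim),
)
for p in teacher_proj_head.parameters():
    p.requires_grad = False

# Small adapter so student features match the head's expected input width.
student_adapter = nn.Linear(student_feat_dim, teacher_feat_dim)

def embed_teacher(x):
    with torch.no_grad():
        return F.normalize(teacher_proj_head(teacher_encoder(x)), dim=-1)

def embed_student(x):
    # Student features pass through the *reused* teacher projection head.
    return F.normalize(teacher_proj_head(student_adapter(student_encoder(x))), dim=-1)

# One distillation step on a dummy unlabeled batch: pull student embeddings
# toward the teacher's embeddings of the same images (cosine similarity loss).
images = torch.randn(8, 3, 32, 32)
loss = (1 - (embed_student(images) * embed_teacher(images)).sum(dim=-1)).mean()
optimizer = torch.optim.SGD(
    list(student_encoder.parameters()) + list(student_adapter.parameters()), lr=0.05
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

In practice the teacher encoder and its projection head would come from a pre-trained SSL checkpoint (for example, a ResNet-50 trained with a contrastive method), and the training objective would follow the paper's contrastive and consistency constraints rather than this plain cosine term.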
Keywords
* Artificial intelligence * ResNet * Self-supervised learning