Summary of "Retro: Reusing Teacher Projection Head for Efficient Embedding Distillation on Lightweight Models via Self-Supervised Learning"
by Khanh-Binh Nguyen and Chae Jung Park
First submitted to arXiv on: 24 May 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper explores self-supervised learning (SSL), which learns effective representations from large amounts of unlabeled data. Lightweight models can be distilled from larger pre-trained models using contrastive and consistency constraints, but the mismatch between teacher and student projection-head sizes makes it hard for the student to accurately mimic the teacher's embeddings. To address this, the authors propose Retro, which reuses the teacher's projection head for the student (see the sketch after this table). Experimental results show significant improvements over the state of the art for all lightweight models: for instance, training EfficientNet-B0 with ResNet-50/101/152 teachers improves linear evaluation accuracy on ImageNet to 66.9%, 69.3%, and 69.8%, respectively, while using significantly fewer parameters. |
Low | GrooveSquid.com (original content) | This paper is about a new way to learn from lots of data without labeling it first, called self-supervised learning. The goal is to get good results even when the models are small or simple. To do this, the researchers came up with an idea called Retro, which helps smaller models become more like bigger ones that have already been trained. They tested Retro and found that it works really well! For example, a small model called EfficientNet-B0 trained with Retro got much better results on a big dataset called ImageNet. |
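The core idea described in the medium summary lends itself to a short illustration. Below is a minimal, hypothetical PyTorch sketch of distillation with a reused teacher projection head: the teacher's frozen projection head is placed on top of the student (through a small adapter layer, since backbone widths differ), and the student is trained to match the teacher's embeddings on unlabeled images. The module names, dimensions, adapter, and plain cosine loss are illustrative assumptions, not the paper's exact architecture or objective.

```python
# Minimal, hypothetical sketch of the Retro idea: the student reuses the
# teacher's (frozen) projection head so that student features are compared
# to teacher features in the same projection space. Names and sizes below
# are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher_feat_dim, student_feat_dim, proj_dim = 2048, 1280, 128

# Placeholder encoders standing in for ResNet-50 (teacher) and
# EfficientNet-B0 (student); any backbone producing a feature vector works.
teacher_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, teacher_feat_dim))
student_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, student_feat_dim))

# Teacher projection head, kept frozen and reused by the student.
teacher_proj_head = nn.Sequential(
    nn.Linear(teacher_feat_dim, teacher_feat_dim), nn.ReLU(),
    nn.Linear(teacher_feat_dim, proj_dim),
)
for p in teacher_proj_head.parameters():
    p.requires_grad = False

# Small adapter so student features match the head's expected input width.
student_adapter = nn.Linear(student_feat_dim, teacher_feat_dim)

def embed_teacher(x):
    with torch.no_grad():
        return F.normalize(teacher_proj_head(teacher_encoder(x)), dim=-1)

def embed_student(x):
    # Student features pass through the *reused* teacher projection head.
    return F.normalize(teacher_proj_head(student_adapter(student_encoder(x))), dim=-1)

# One distillation step on a dummy unlabeled batch: pull student embeddings
# toward the teacher's embeddings of the same images (cosine similarity loss).
images = torch.randn(8, 3, 32, 32)
loss = (1 - (embed_student(images) * embed_teacher(images)).sum(dim=-1)).mean()
optimizer = torch.optim.SGD(
    list(student_encoder.parameters()) + list(student_adapter.parameters()), lr=0.05
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

In practice the teacher encoder and its projection head would come from a pre-trained SSL checkpoint (for example, a ResNet-50 trained with a contrastive method), and the training objective would follow the paper's contrastive and consistency constraints rather than this plain cosine term.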
Keywords
* Artificial intelligence * ResNet * Self-supervised learning