


Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

by Wenqi Niu, Yingchao Wang, Guohui Cai, Hanpo Hou

First submitted to arXiv on: 9 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A novel Knowledge Distillation (KD) method called Correlation Matching Knowledge Distillation (CMKD) is proposed to address the diminishing marginal returns observed when KD is used for neural network compression and performance enhancement. CMKD combines the Kullback-Leibler (KL) divergence loss with losses based on the Pearson and Spearman correlation coefficients, so the student learns not only the teacher's probability values but also its relative ranking of the classes. The method dynamically adjusts the loss weights according to sample difficulty, achieves state-of-the-art performance on CIFAR-100 and ImageNet, and adapts well to various teacher architectures and other KD methods (a rough code sketch of the combined objective follows these summaries).

Low Difficulty Summary (written by GrooveSquid.com, original content)
Knowledge Distillation (KD) is a way to make neural networks smaller while keeping them good at their job. Right now, most people use KL divergence loss to transfer knowledge from big teachers to small students. But making the teacher stronger doesn't always make the student better. This study shows that distilling from a stronger teacher can actually make the student's decision boundary more complex and hurt its accuracy. So the authors came up with a new method called CMKD that teaches the student not just which classes are most likely, but also how the classes rank against each other. They tested it on big datasets like CIFAR-100 and ImageNet and showed it works better than the old way.
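
To make the combined objective concrete, here is a minimal PyTorch sketch of the general idea described in the summaries: a temperature-scaled KL term plus Pearson- and Spearman-style correlation terms over the class dimension. The function names, temperature, alpha/beta weights, and the hard-rank approximation of the Spearman term are illustrative assumptions rather than the authors' implementation, and the paper's dynamic, difficulty-based weighting of the loss terms is omitted.

```python
import torch
import torch.nn.functional as F


def pearson_corr(x, y, eps=1e-8):
    """Row-wise Pearson correlation between two [batch, num_classes] tensors."""
    x = x - x.mean(dim=1, keepdim=True)
    y = y - y.mean(dim=1, keepdim=True)
    return (x * y).sum(dim=1) / (x.norm(dim=1) * y.norm(dim=1) + eps)


def correlation_matching_kd_loss(student_logits, teacher_logits,
                                 temperature=4.0, alpha=1.0, beta=1.0):
    """Sketch of a combined KD objective: softened KL divergence plus
    correlation terms that push the student to reproduce the teacher's
    relative ordering of classes. Hyperparameters are placeholders."""
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)

    # Standard KD term: KL divergence between softened distributions.
    kl = F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2

    # Pearson term: correlate the per-sample class-probability vectors.
    pearson = (1.0 - pearson_corr(log_p_s.exp(), p_t)).mean()

    # Spearman-style term: Pearson correlation computed on class ranks.
    # Hard ranks from argsort are not differentiable, so real training
    # would need a soft-rank surrogate; shown here only for intent.
    rank_s = torch.argsort(torch.argsort(student_logits, dim=1), dim=1).float()
    rank_t = torch.argsort(torch.argsort(teacher_logits, dim=1), dim=1).float()
    spearman = (1.0 - pearson_corr(rank_s, rank_t)).mean()

    return alpha * kl + beta * (pearson + spearman)


# Toy usage with random logits for a batch of 8 samples and 100 classes.
if __name__ == "__main__":
    s = torch.randn(8, 100, requires_grad=True)
    t = torch.randn(8, 100)
    print(correlation_matching_kd_loss(s, t))
```

In practice the Spearman term would need a differentiable ranking surrogate, and the weights on the individual terms would be set per sample from its difficulty, as the paper describes.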

Keywords

» Artificial intelligence  » Knowledge distillation  » Neural network  » Probability  » Teacher model