


Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

by Wenqi Niu, Yingchao Wang, Guohui Cai, Hanpo Hou

First submitted to arXiv on: 9 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A novel Knowledge Distillation (KD) method called Correlation Matching Knowledge Distillation (CMKD) is proposed to address the diminishing marginal returns observed when KD is used for neural network compression and performance enhancement. CMKD combines the Kullback-Leibler (KL) divergence loss with losses based on the Pearson and Spearman correlation coefficients, so the student learns not only the teacher's probability values but also its relative ranking of the classes. The method dynamically adjusts the loss weights according to sample difficulty, achieves state-of-the-art performance on CIFAR-100 and ImageNet, and adapts well to various teacher architectures and other KD methods (a rough code sketch of the combined objective follows these summaries).

Low Difficulty Summary (written by GrooveSquid.com, original content)
Knowledge Distillation (KD) is a way to make neural networks smaller while keeping them good at their job. Right now, most people use KL divergence loss to transfer knowledge from big teachers to small students. But making the teacher stronger doesn't always make the student better. This study shows that distilling from a stronger teacher can actually make the student's decision boundary more complex and hurt its accuracy. So the authors came up with a new method called CMKD that teaches the student not just which classes are most likely, but also how the classes rank against each other. They tested it on big datasets like CIFAR-100 and ImageNet and showed it works better than the old way.
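
To make the combined objective concrete, here is a minimal PyTorch sketch of the general idea described in the summaries: a temperature-scaled KL term plus Pearson- and Spearman-style correlation terms over the class dimension. The function names, temperature, alpha/beta weights, and the hard-rank approximation of the Spearman term are illustrative assumptions rather than the authors' implementation, and the paper's dynamic, difficulty-based weighting of the loss terms is omitted.

```python
import torch
import torch.nn.functional as F


def pearson_corr(x, y, eps=1e-8):
    """Row-wise Pearson correlation between two [batch, num_classes] tensors."""
    x = x - x.mean(dim=1, keepdim=True)
    y = y - y.mean(dim=1, keepdim=True)
    return (x * y).sum(dim=1) / (x.norm(dim=1) * y.norm(dim=1) + eps)


def correlation_matching_kd_loss(student_logits, teacher_logits,
                                 temperature=4.0, alpha=1.0, beta=1.0):
    """Sketch of a combined KD objective: softened KL divergence plus
    correlation terms that push the student to reproduce the teacher's
    relative ordering of classes. Hyperparameters are placeholders."""
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)

    # Standard KD term: KL divergence between softened distributions.
    kl = F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2

    # Pearson term: correlate the per-sample class-probability vectors.
    pearson = (1.0 - pearson_corr(log_p_s.exp(), p_t)).mean()

    # Spearman-style term: Pearson correlation computed on class ranks.
    # Hard ranks from argsort are not differentiable, so real training
    # would need a soft-rank surrogate; shown here only for intent.
    rank_s = torch.argsort(torch.argsort(student_logits, dim=1), dim=1).float()
    rank_t = torch.argsort(torch.argsort(teacher_logits, dim=1), dim=1).float()
    spearman = (1.0 - pearson_corr(rank_s, rank_t)).mean()

    return alpha * kl + beta * (pearson + spearman)


# Toy usage with random logits for a batch of 8 samples and 100 classes.
if __name__ == "__main__":
    s = torch.randn(8, 100, requires_grad=True)
    t = torch.randn(8, 100)
    print(correlation_matching_kd_loss(s, t))
```

In practice the Spearman term would need a differentiable ranking surrogate, and the weights on the individual terms would be set per sample from its difficulty, as the paper describes.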

Keywords

» Artificial intelligence  » Knowledge distillation  » Neural network  » Probability  » Teacher model