
Summary of An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation, by Md Arafat Sultan et al.


An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation

by Md Arafat Sultan, Aashka Trivedi, Parul Awasthy, Avirup Sil

First submitted to arXiv on: 12 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None

  • Abstract of paper
  • PDF of paper


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract; read it via the "Abstract of paper" link above.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents a large-scale empirical study of how different configuration choices affect performance in knowledge distillation (KD). Specifically, it investigates the impact of different distance metrics between teacher and student predictions, such as mean squared error (MSE) and KL-divergence. Although previous studies have touched on this topic, there is still no systematic understanding of how these choices generally affect student performance. The study takes an empirical approach, evaluating the options across 13 datasets from four NLP tasks and three student sizes. It quantifies the cost of making sub-optimal choices and identifies a single configuration that performs well across a wide range of settings.
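As a concrete illustration of the kind of configuration choice studied, here is a minimal, hypothetical PyTorch sketch (not taken from the paper) of a distillation loss that can use either of the two teacher-student distance metrics mentioned above, MSE on logits or KL-divergence on temperature-softened distributions; the function name, temperature, and loss weighting are illustrative assumptions.

```python
# Hypothetical knowledge-distillation loss sketch (not from the paper).
# It contrasts the two teacher-student distance metrics mentioned above:
# MSE on raw logits vs. KL-divergence on temperature-softened distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      metric="kl", temperature=2.0, alpha=0.5):
    """Combine a hard-label loss with a teacher-matching loss.

    metric:      "kl" or "mse" distance between teacher and student predictions
    temperature: softens the distributions when using KL-divergence
    alpha:       weight on the distillation term (illustrative default)
    """
    # Standard cross-entropy against the gold labels.
    ce = F.cross_entropy(student_logits, labels)

    if metric == "kl":
        # KL-divergence between temperature-softened teacher and student
        # distributions, scaled by T^2 as is common in the KD literature.
        t = temperature
        kd = F.kl_div(
            F.log_softmax(student_logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t * t)
    elif metric == "mse":
        # Mean squared error directly on the logits.
        kd = F.mse_loss(student_logits, teacher_logits)
    else:
        raise ValueError(f"unknown metric: {metric}")

    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random tensors (batch of 8, 10 classes).
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels, metric="mse"))
```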
Low Difficulty Summary (written by GrooveSquid.com, original content)
This research looks at how different settings affect how well machines learn from one another. When we want a smaller machine to learn from a more experienced one, we use something called knowledge distillation. One important choice is how to measure the difference between what the teacher and the student predict. The study tests many combinations across 13 datasets from four language tasks and three student model sizes to figure out which options work best. It finds that some choices are better than others and can make a big difference in how well the student learns.

Keywords

  • Artificial intelligence
  • Knowledge distillation
  • Machine learning
  • MSE
  • NLP