Summary of Good Teachers Explain: Explanation-Enhanced Knowledge Distillation, by Amin Parchami-Araghi et al.
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
by Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao, Bernt Schiele
First submitted to arXiv on: 5 Feb 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper proposes a novel approach to knowledge distillation (KD), called explanation-enhanced KD (e^2KD). It combines the classic KD loss with an additional objective that encourages the student to produce explanations similar to the teacher's. The authors find that e^2KD achieves consistent gains in accuracy and student-teacher agreement, while also ensuring that students learn from teachers for the right reasons and provide similar explanations. The approach is robust across model architectures and training-data sizes, and even works with pre-computed explanations. A minimal code sketch of the combined loss follows this table. |
| Low | GrooveSquid.com (original content) | The paper helps us understand how machines can learn from each other better. It's like when a teacher teaches a student: the teacher wants the student not just to do things right, but also to understand why they're done that way. The authors created a new way of teaching called e^2KD, which makes sure the student and teacher agree on what's important. This helps the student learn for the right reasons and give answers similar to its teacher's. It works well with different kinds of machines and even with pre-made explanations. |
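To make the idea in the medium-difficulty summary concrete, here is a minimal sketch of such a combined loss in PyTorch. The temperature, the weighting factor `lam`, the cosine-similarity measure, and the assumption that explanations arrive as precomputed attribution maps are all illustrative choices for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def e2kd_loss(student_logits, teacher_logits,
              student_expl, teacher_expl,
              tau=2.0, lam=1.0):
    """Sketch of an explanation-enhanced KD loss: classic soft-label
    distillation plus a term that encourages the student's explanation
    to match the teacher's. Hyperparameters here are illustrative."""
    # Classic KD term: KL divergence between temperature-softened
    # teacher and student output distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau ** 2

    # Explanation-matching term: cosine distance between flattened
    # attribution maps (the explanation method and similarity measure
    # are assumptions for this sketch).
    expl = 1.0 - F.cosine_similarity(
        student_expl.flatten(1), teacher_expl.flatten(1), dim=1
    ).mean()

    return kd + lam * expl
```

In practice the explanation tensors would come from an attribution method applied to both models on the same inputs; as the summary notes, the paper reports that even fixed, pre-computed teacher explanations suffice.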
Keywords
* Artificial intelligence
* Knowledge distillation