
Direct Distillation between Different Domains

by Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

First submitted to arxiv on: 12 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, which can be read on the paper's arXiv page.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This research proposes a novel one-stage method called “Direct Distillation between Different Domains” (4Ds) to address the challenge of learning compact student networks for target domains that differ significantly from the source domain. The traditional two-stage approach, which combines Knowledge Distillation (KD) with domain adaptation techniques, suffers from high computational cost and introduces additional errors. To overcome these limitations, 4Ds uses a learnable adapter based on the Fourier transform to separate domain-invariant knowledge from domain-specific knowledge, together with a fusion-activation mechanism that transfers the valuable domain-invariant knowledge to the student network while the teacher network learns the domain-specific knowledge. The proposed method outperforms state-of-the-art approaches on various benchmark datasets.
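The summary above describes the adapter only at a high level, so the following is a minimal, illustrative sketch (in PyTorch) of the general idea rather than the authors' actual architecture: a learnable gate applied in the frequency domain that splits a teacher feature map into an assumed "domain-invariant" part and a "domain-specific" part. The class name FourierAdapter, the sigmoid gating scheme, and all tensor shapes are assumptions made for illustration.

```python
# Illustrative sketch only: a learnable Fourier-domain adapter that splits a
# feature map into an assumed "domain-invariant" part and a "domain-specific"
# part. The gating scheme and shapes are assumptions, not the paper's design.
import torch
import torch.nn as nn


class FourierAdapter(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable gate per frequency bin; rfft2 keeps width // 2 + 1 columns.
        self.gate_logits = nn.Parameter(torch.zeros(channels, height, width // 2 + 1))

    def forward(self, feat: torch.Tensor):
        # feat: (batch, channels, height, width) teacher feature map.
        spec = torch.fft.rfft2(feat, norm="ortho")        # complex spectrum
        gate = torch.sigmoid(self.gate_logits)            # values in (0, 1)
        invariant_spec = spec * gate                      # frequencies kept for transfer
        specific_spec = spec * (1.0 - gate)               # remaining, domain-specific part
        size = feat.shape[-2:]
        invariant = torch.fft.irfft2(invariant_spec, s=size, norm="ortho")
        specific = torch.fft.irfft2(specific_spec, s=size, norm="ortho")
        return invariant, specific


if __name__ == "__main__":
    # Example usage with a dummy teacher feature map.
    adapter = FourierAdapter(channels=64, height=8, width=8)
    teacher_feat = torch.randn(4, 64, 8, 8)
    inv, spec = adapter(teacher_feat)
    print(inv.shape, spec.shape)  # both torch.Size([4, 64, 8, 8])
```

By construction, the two outputs of this sketch sum back to the original feature map, so no information is discarded; only the split between the two streams is learned.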
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces a new way to train compact models that work well even when the data they are used on is very different from the data their teacher was trained on. Existing two-stage methods first adapt a big model to the new kind of data and then use it to teach a smaller one, but this is slow and can pass extra mistakes on to the small model. The new method, called 4Ds, transfers knowledge directly from the big teacher model to the small student model without needing the extra stage. It does this by using the Fourier transform to separate the knowledge that carries over to the new data from the knowledge that does not, and then passing the useful part on to the student in a clever way.
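Since the summaries do not spell out the training procedure or the fusion-activation mechanism, the sketch below only shows, under stated assumptions, what a single one-stage training step could look like when reusing the FourierAdapter sketch above: the student is trained to match the assumed domain-invariant features while also fitting the target labels. The function name distillation_step, the loss choices, and the assumption that the student returns both features and logits are placeholders; the teacher-side learning of domain-specific knowledge described in the paper is omitted here.

```python
# Assumed, simplified one-stage distillation step; not the paper's exact losses.
import torch
import torch.nn.functional as F


def distillation_step(teacher, student, adapter, images, labels, alpha=0.5):
    """One hypothetical training step on a target-domain batch."""
    with torch.no_grad():
        teacher_feat = teacher(images)            # teacher features on target-domain images
    invariant, _specific = adapter(teacher_feat)  # Fourier-domain split (see sketch above)

    student_feat, logits = student(images)        # assumed: student returns (features, logits)

    # Transfer only the (assumed) domain-invariant features to the student,
    # alongside a standard supervised loss on the target labels.
    kd_loss = F.mse_loss(student_feat, invariant.detach())
    task_loss = F.cross_entropy(logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * task_loss
```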

Keywords

  • Artificial intelligence
  • Distillation
  • Domain adaptation
  • Knowledge distillation
  • Student model
  • Teacher model