Summary of Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge, by John Violos et al.
Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge
by John Violos, Symeon Papadopoulos, Ioannis Kompatsiaris
First submitted to arXiv on: 25 Jun 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract; see the arXiv listing. |
Medium | GrooveSquid.com (original content) | The paper examines the Knowledge Distillation (KD) process for Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures when executed on edge devices. A comparative analysis assesses the feasibility, efficacy, performance, and efficiency of the two architecture families. The paper also explores how varying the student model size affects accuracy and inference speed, examines the effect of higher-resolution input images on accuracy, memory footprint, and computational workload, and evaluates the performance gains obtained by fine-tuning the student model on specific downstream tasks after KD. (A sketch of a typical KD loss follows this table.) |
Low | GrooveSquid.com (original content) | This paper looks at how Knowledge Distillation works for different types of artificial intelligence models when they're used on devices with limited power. The authors compare two kinds of models, CNNs and ViTs, to see which one works better. Then they try smaller or bigger versions of the student model to see if size makes a difference. They also look at what happens when they use higher-resolution images. Finally, they test how well the models do after they're fine-tuned for specific tasks. |
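
The summaries refer to the standard KD procedure without spelling it out. Below is a minimal sketch of a typical Hinton-style distillation loss in PyTorch; the temperature `T`, the weight `alpha`, and the random tensors in the usage example are illustrative assumptions, not values or code from the paper.

```python
# A minimal, generic sketch of a Hinton-style knowledge distillation loss.
# NOT the paper's exact recipe; T and alpha are illustrative hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between the teacher's and student's
    # temperature-softened distributions, scaled by T^2 to keep gradient
    # magnitudes comparable (as in Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Illustrative usage with random tensors (the teacher is frozen during KD;
# only the compact student receives gradients):
teacher_logits = torch.randn(8, 100)                       # large CNN/ViT teacher
student_logits = torch.randn(8, 100, requires_grad=True)   # compact edge student
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In this generic formulation, the same loss applies whether the student is a CNN or a ViT; what the paper varies is the student architecture, its size, and the input resolution around such a distillation objective.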
Keywords
» Artificial intelligence » Fine-tuning » Inference » Knowledge distillation » Student model » Vision transformer » ViT