Summary of ReDistill: Residual Encoded Distillation for Peak Memory Reduction, by Fang Chen et al.
ReDistill: Residual Encoded Distillation for Peak Memory Reduction
by Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang
First submitted to arXiv on: 6 Jun 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes Residual Encoded Distillation (ReDistill), a method for reducing the peak memory consumption of neural networks so they can be deployed on edge devices with limited memory. Using a teacher-student framework, ReDistill derives a lower-memory student network from a teacher network by applying aggressive pooling to shrink feature map sizes (a generic sketch of this kind of distillation appears after the table). The method is evaluated on computer vision tasks including image classification and diffusion-based image generation. For image classification, ReDistill achieves 2x-3.2x peak memory reduction on an edge GPU while maintaining accuracy for most CNN architectures, and it improves test accuracy for tiny Vision Transformer (ViT) models distilled from large CNN teacher architectures. For diffusion-based image generation, ReDistill yields a denoising network with 4x lower theoretical peak memory while preserving diversity and fidelity. |
Low | GrooveSquid.com (original content) | The paper helps make neural networks more efficient and suitable for use on devices like smartphones or smart cameras. It does this by teaching smaller “student” networks to mimic the behavior of larger “teacher” networks, using a technique called distillation. This allows the student networks to learn from the teacher networks without needing as much memory or computing power. The paper shows that this approach can work well for tasks like recognizing objects in images and generating new images. |
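To make the teacher-student idea in the medium-difficulty summary concrete, below is a minimal, hypothetical PyTorch sketch of distilling into a student that downsamples aggressively early on, which shrinks the large activation maps that dominate peak memory. The tiny `Teacher`/`Student` networks, the stride-4 first convolution, the 1x1 `align` projection, the `distill_step` function, and the loss weight `alpha` are all illustrative assumptions, not the paper's actual ReDistill formulation or its residual encoding.

```python
# Illustrative sketch only: a generic teacher-student distillation setup where the
# student applies aggressive early downsampling to shrink feature maps (and hence
# peak activation memory). Architectures, strides, and losses are hypothetical,
# not the paper's ReDistill method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Teacher(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32 -> 16x16
        )
        self.head = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x):
        f = self.features(x)                               # intermediate feature map
        return self.head(f.flatten(1)), f

class Student(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Aggressive downsampling up front: 32x32 -> 8x8, so later
            # activations are much smaller than the teacher's.
            nn.Conv2d(3, 32, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * 8 * 8, num_classes)
        self.align = nn.Conv2d(32, 64, 1)                  # project to teacher channel count

    def forward(self, x):
        f = self.features(x)
        return self.head(f.flatten(1)), f

def distill_step(teacher, student, x, y, alpha=0.5):
    """One training step: task loss plus a feature-matching distillation loss."""
    with torch.no_grad():
        t_logits, t_feat = teacher(x)
    s_logits, s_feat = student(x)
    task_loss = F.cross_entropy(s_logits, y)
    # Resize and project student features before matching them to the teacher's.
    s_proj = F.interpolate(student.align(s_feat), size=t_feat.shape[-2:])
    feat_loss = F.mse_loss(s_proj, t_feat)
    return alpha * task_loss + (1 - alpha) * feat_loss

if __name__ == "__main__":
    teacher, student = Teacher(), Student()
    x = torch.randn(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    loss = distill_step(teacher, student, x, y)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```

Peak activation memory is usually dominated by the earliest, highest-resolution feature maps, which is why moving downsampling earlier in the student reduces it; the distillation term then helps recover the accuracy that the smaller student would otherwise lose.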
Keywords
» Artificial intelligence » CNN » Diffusion » Distillation » Feature map » Image classification » Image generation » Vision transformer » ViT