
Summary of Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks, by Beatrice Alessandra Motetti et al.


Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

by Beatrice Alessandra Motetti, Matteo Risso, Alessio Burrello, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

First submitted to arXiv on: 1 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The proposed methodology jointly applies pruning and channel-wise mixed-precision quantization to deep neural networks (DNNs) via a lightweight gradient-based search, producing optimized DNNs with improved accuracy-cost trade-offs. The approach targets edge devices, whose tight memory and compute budgets make DNN deployment challenging. By making the optimization hardware-aware, the methodology reduces the time needed to generate a set of Pareto-optimal DNNs. Tested on three benchmarks (CIFAR-10, Google Speech Commands, and Tiny ImageNet), the method achieves a 47.50% size reduction at iso-accuracy with all weights quantized at 8-bit, and 69.54% at 2-bit, surpassing previous state-of-the-art approaches in both size reduction and training time.

Low Difficulty Summary (written by GrooveSquid.com; original content)
The paper proposes a new way to make deep neural networks (DNNs) run efficiently on devices like smartphones or smart speakers. Because these devices have limited resources, the authors developed a method that combines two techniques, pruning and mixed-precision quantization, so that DNNs become smaller and faster while staying accurate. The method was tested on three different tasks and showed strong results, making it a more efficient way to deploy DNNs.
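The core idea behind such a gradient-based joint search can be sketched in a few lines. The following is an illustrative sketch only, not the authors' implementation: each channel's weights are fake-quantized at several candidate bit-widths (with a 0-bit candidate standing in for pruning the channel), and a softmax over trainable architecture parameters blends the candidates so the precision choice stays differentiable during training. The candidate set, the `alpha` parameters, and all function names here are assumptions for illustration.

```python
# Illustrative sketch (not the paper's code) of a differentiable
# channel-wise mixed-precision + pruning search: each channel is
# fake-quantized at several candidate bit-widths, and a softmax over
# trainable parameters `alpha` mixes the candidates differentiably.
import math

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization of a weight list to `bits` bits.

    A 0-bit candidate zeroes the channel, i.e. it represents pruning."""
    if bits == 0:
        return [0.0 for _ in w]
    scale = max(abs(x) for x in w) / (2 ** (bits - 1) - 1) or 1.0
    return [round(x / scale) * scale for x in w]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def effective_channel(w, alpha, candidates=(0, 2, 4, 8)):
    """Softmax-weighted mix of the channel quantized at each candidate
    bit-width; gradients w.r.t. `alpha` would drive the search."""
    probs = softmax(alpha)
    quantized = [fake_quantize(w, b) for b in candidates]
    return [sum(p * q[i] for p, q in zip(probs, quantized))
            for i in range(len(w))]

# One channel of weights and hypothetical architecture parameters.
w = [0.9, -0.3, 0.45, -0.75]
alpha = [-2.0, 0.0, 1.0, 3.0]   # here the search strongly favours 8-bit
mixed = effective_channel(w, alpha)

# After training, each channel keeps only its argmax candidate:
chosen_bits = (0, 2, 4, 8)[max(range(4), key=lambda i: alpha[i])]
```

In an actual DNAS-style framework this forward mix would be combined with a hardware-aware cost term (model size or latency as a function of the chosen bit-widths) added to the task loss, so that training trades accuracy against cost per channel.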

Keywords

* Artificial intelligence  * Optimization  * Precision  * Pruning  * Quantization