Summary of Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint, by Xinglong Sun et al.

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

by Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

First submitted to arXiv on: 17 Jun 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	The paper's original abstract
Medium	GrooveSquid.com (original content)	This paper proposes a multi-dimensional pruning framework for efficient inference and deployment on edge devices. Models grow larger as researchers push for higher performance, but existing pruning approaches focus mainly on channel pruning and struggle with aggressive parameter reductions. The proposed framework jointly optimizes pruning across channels, layers, and blocks while adhering to a latency constraint. A latency modeling technique captures model-wide latency variations during pruning, which enables good latency-accuracy trade-offs at high pruning ratios. The pruning problem is formulated as a Mixed-Integer Nonlinear Program (MINLP), which determines the pruned structure in a single pass. The results show substantial improvements over previous methods, particularly at large pruning ratios. In classification, the method outperforms the prior method HALP, reaching a Top-1 accuracy of 70.0 at 5262 im/s. In 3D object detection, pruning StreamPETR at a 45% pruning ratio sets a new state of the art, with higher FPS (37.3) and mAP (0.451) than the dense baseline.
Low	GrooveSquid.com (original content)	This paper helps us make AI models smaller and faster without losing their ability to recognize objects. Right now, as AI gets better, models get bigger, which makes them harder to run on devices like smartphones or smart cameras. To fix this, the researchers developed a new way to remove the parts of a model that aren't necessary for its job. They created an algorithm that makes models smaller and faster while keeping them accurate enough. This matters because it allows AI to be used in more places and on more devices.
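
To make the idea of pruning under a latency constraint more concrete, here is a toy sketch in Python. It is not the paper's MINLP formulation or solver: the importance scores, the latency lookup table, and the brute-force search are all assumptions made up for illustration. Each of three hypothetical blocks can keep between 0 and 4 channels, where keeping 0 channels stands for removing the whole block. The search picks the configuration with the highest retained importance whose total latency fits the budget.

```python
from itertools import product

# Hypothetical per-channel importance scores for three residual blocks.
importance = [
    [0.9, 0.7, 0.4, 0.1],
    [0.3, 0.2, 0.1, 0.05],
    [0.8, 0.6, 0.5, 0.3],
]

# Hypothetical latency lookup (ms) by number of kept channels.
# Keeping 0 channels means the whole block is removed.
# The values are deliberately nonlinear in the channel count.
latency_ms = {0: 0.0, 1: 1.0, 2: 1.2, 3: 2.0, 4: 2.1}


def kept_importance(scores, k):
    """Importance retained when keeping the k highest-scoring channels."""
    return sum(sorted(scores, reverse=True)[:k])


def best_structure(importance, latency_ms, budget_ms):
    """Enumerate every per-block channel count and keep the best feasible one."""
    best = None
    choices = [range(len(scores) + 1) for scores in importance]
    for config in product(*choices):
        total_latency = sum(latency_ms[k] for k in config)
        if total_latency > budget_ms:
            continue  # violates the latency constraint
        total_importance = sum(
            kept_importance(scores, k) for scores, k in zip(importance, config)
        )
        if best is None or total_importance > best[0]:
            best = (total_importance, config, total_latency)
    return best


score, config, latency = best_structure(importance, latency_ms, budget_ms=4.0)
print(config, round(score, 3), latency)
```

With these made-up numbers and a 4.0 ms budget, the search keeps three channels in the first and third blocks and removes the second block entirely, which shows how channel-level and block-level decisions can fall out of a single optimization. Brute force only works at this toy scale; a real network would need a proper integer-programming solver.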

Keywords

Artificial intelligence, Classification, Inference, Object detection, Pruning
