SNP: Structured Neuron-level Pruning to Preserve Attention Scores
by Kyunghwan Shim, Jaewoong Yun, Shinkook Choi
First submitted to arXiv on: 18 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract; read it on arXiv |
Medium | GrooveSquid.com (original content) | This paper proposes a method for pruning Transformer-based models, specifically Vision Transformers (ViTs), to reduce their computational cost and memory footprint so they can be deployed on resource-constrained devices without sacrificing performance. The proposed Structured Neuron-level Pruning (SNP) method prunes neurons with less informative attention scores and eliminates redundancy among attention heads. The query and key layers, which are graphically connected through the attention operation, are pruned jointly to preserve informative attention scores, while the value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy (a rough code sketch of this joint query/key pruning follows the table). The paper demonstrates that SNP compresses and accelerates Transformer-based models on both edge devices and server processors; for example, DeiT-Small pruned with SNP runs 3.1 times faster than the original model while being 21.94% faster and 1.12% more accurate than DeiT-Tiny. |
Low | GrooveSquid.com (original content) | This paper helps Transformer-based models work better on devices with limited resources. It’s like finding a way for your phone or computer to understand pictures and videos more quickly and efficiently. The researchers came up with a new method called SNP that removes the parts of the model that aren’t important, so it uses less memory and takes less time to process things. They tested it on several models and showed that it works well; for example, one small model ran about three times faster than before while also becoming slightly more accurate. |
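
As a rough illustration of the query/key coupling described in the medium summary, the sketch below jointly prunes the output channels of a single attention head’s query and key projections in PyTorch. The function name `prune_qk_channels`, the `keep_ratio` parameter, and the weight-norm importance score are all assumptions made for illustration; the paper’s actual criterion preserves attention scores, which this sketch does not reproduce.

```python
# Minimal sketch (not the authors' code) of jointly pruning the query/key
# channels of one attention head. Query and key are pruned together because
# they share the inner dimension of Q @ K^T, while value channels can be
# pruned on their own; the importance score below is a simple weight-norm
# proxy, not the paper's attention-score-based criterion.
import torch
import torch.nn as nn

def prune_qk_channels(q_proj: nn.Linear, k_proj: nn.Linear, keep_ratio: float = 0.5):
    """Return smaller query/key projections that keep the same channel subset."""
    q_w, k_w = q_proj.weight.data, k_proj.weight.data      # both [d_k, d_model]
    importance = q_w.norm(dim=1) * k_w.norm(dim=1)         # per-channel proxy importance
    n_keep = max(1, int(keep_ratio * q_w.size(0)))
    keep = torch.topk(importance, n_keep).indices.sort().values

    def shrink(layer: nn.Linear) -> nn.Linear:
        new = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
        new.weight.data = layer.weight.data[keep].clone()
        if layer.bias is not None:
            new.bias.data = layer.bias.data[keep].clone()
        return new

    # Query and key must keep the SAME channels so Q @ K^T stays well defined.
    return shrink(q_proj), shrink(k_proj)

# Toy usage on ViT-sized tensors (196 patch tokens + 1 class token).
d_model, d_k = 192, 64
q, k = nn.Linear(d_model, d_k), nn.Linear(d_model, d_k)
q_small, k_small = prune_qk_channels(q, k, keep_ratio=0.5)
x = torch.randn(1, 197, d_model)
attn_logits = q_small(x) @ k_small(x).transpose(-2, -1)    # still a [1, 197, 197] score map
print(attn_logits.shape)
```

Because the pruned query and key projections keep an identical channel subset, the attention score matrix keeps its shape and only loses the contribution of the discarded channels, which is the structural property the joint pruning is meant to exploit.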
Keywords
» Artificial intelligence » Attention » Pruning » Transformer