SNP: Structured Neuron-level Pruning to Preserve Attention Scores
by Kyunghwan Shim, Jaewoong Yun, Shinkook Choi
First submitted to arXiv on: 18 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract; read it on arXiv |
Medium | GrooveSquid.com (original content) | This paper proposes a method for pruning Transformer-based models, specifically Vision Transformers (ViTs), to reduce their computational cost and memory footprint so they can be deployed on resource-constrained devices without sacrificing performance. The proposed Structured Neuron-level Pruning (SNP) method prunes neurons with less informative attention scores and eliminates redundancy among attention heads. The query and key layers, which are graphically connected through the attention operation, are pruned jointly to preserve informative attention scores, while the value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy (a rough code sketch of this joint query/key pruning follows the table). The paper demonstrates that SNP compresses and accelerates Transformer-based models on both edge devices and server processors; for example, DeiT-Small pruned with SNP runs 3.1 times faster than the original model while being 21.94% faster and 1.12% more accurate than DeiT-Tiny. |
Low | GrooveSquid.com (original content) | This paper helps Transformer-based models work better on devices with limited resources. It’s like finding a way for your phone or computer to understand pictures and videos more quickly and efficiently. The researchers came up with a new method called SNP that removes the parts of the model that aren’t important, so it uses less memory and takes less time to process things. They tested it on several models and showed that it works well; for example, one small model ran about three times faster than before while also becoming slightly more accurate. |
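
As a rough illustration of the query/key coupling described in the medium summary, the sketch below jointly prunes the output channels of a single attention head’s query and key projections in PyTorch. The function name `prune_qk_channels`, the `keep_ratio` parameter, and the weight-norm importance score are all assumptions made for illustration; the paper’s actual criterion preserves attention scores, which this sketch does not reproduce.

```python
# Minimal sketch (not the authors' code) of jointly pruning the query/key
# channels of one attention head. Query and key are pruned together because
# they share the inner dimension of Q @ K^T, while value channels can be
# pruned on their own; the importance score below is a simple weight-norm
# proxy, not the paper's attention-score-based criterion.
import torch
import torch.nn as nn

def prune_qk_channels(q_proj: nn.Linear, k_proj: nn.Linear, keep_ratio: float = 0.5):
    """Return smaller query/key projections that keep the same channel subset."""
    q_w, k_w = q_proj.weight.data, k_proj.weight.data      # both [d_k, d_model]
    importance = q_w.norm(dim=1) * k_w.norm(dim=1)         # per-channel proxy importance
    n_keep = max(1, int(keep_ratio * q_w.size(0)))
    keep = torch.topk(importance, n_keep).indices.sort().values

    def shrink(layer: nn.Linear) -> nn.Linear:
        new = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
        new.weight.data = layer.weight.data[keep].clone()
        if layer.bias is not None:
            new.bias.data = layer.bias.data[keep].clone()
        return new

    # Query and key must keep the SAME channels so Q @ K^T stays well defined.
    return shrink(q_proj), shrink(k_proj)

# Toy usage on ViT-sized tensors (196 patch tokens + 1 class token).
d_model, d_k = 192, 64
q, k = nn.Linear(d_model, d_k), nn.Linear(d_model, d_k)
q_small, k_small = prune_qk_channels(q, k, keep_ratio=0.5)
x = torch.randn(1, 197, d_model)
attn_logits = q_small(x) @ k_small(x).transpose(-2, -1)    # still a [1, 197, 197] score map
print(attn_logits.shape)
```

Because the pruned query and key projections keep an identical channel subset, the attention score matrix keeps its shape and only loses the contribution of the discarded channels, which is the structural property the joint pruning is meant to exploit.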
Keywords
» Artificial intelligence » Attention » Pruning » Transformer