Summary of Enhancing One-shot Pruned Pre-trained Language Models Through Sparse-Dense-Sparse Mechanism, by Guanchen Li et al.
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
by Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum
First submitted to arXiv on: 20 Aug 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The proposed Sparse-Dense-Sparse (SDS) pruning framework enhances the performance of pruned pre-trained language models (PLMs) from a weight distribution optimization perspective. The three-step pruning process involves initial conventional one-shot pruning, followed by dense model reconstruction with sparse regularization, and finally a second pruning round. This results in a superior pruned model compared to state-of-the-art techniques like SparseGPT and Wanda under the same sparsity configuration. For instance, SDS reduces perplexity by 9.13 on Raw-Wikitext2 and improves accuracy by an average of 2.05% across multiple zero-shot benchmarks for OPT-125M with 2:4 sparsity. A rough code sketch of this three-step pipeline follows the table. |
| Low | GrooveSquid.com (original content) | PLMs are powerful language models that can understand context and do many tasks well, but they take up a lot of computer memory and processing power. To fix this, researchers have developed ways to “prune” these models, removing some parts that aren’t as important. The problem is that most pruning methods make the model worse at its job. In this paper, scientists propose a new way to prune language models called Sparse-Dense-Sparse (SDS). It works by first getting rid of the least important connections in the model, then making the model more “dense” or full again with some special rules, and finally cutting out even more parts that aren’t needed. This makes the pruned model better than other pruning methods at doing its job. For example, SDS made the model better at understanding text and getting answers right. |
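The medium-difficulty summary describes a three-step pipeline: one-shot pruning, dense reconstruction under a sparse regularizer, and a second pruning round. The PyTorch snippet below is a minimal toy sketch of that flow on a single weight matrix, not the authors' implementation: the function names are hypothetical, simple magnitude-based 2:4 pruning stands in for the one-shot pruners mentioned in the summary (SparseGPT, Wanda), and a plain weight-space reconstruction loss with an L1 penalty stands in for the paper's regularized dense reconstruction.

```python
# Illustrative sketch of the Sparse-Dense-Sparse (SDS) idea on one weight matrix.
# All names here are hypothetical; the real method operates on full PLM layers.
import torch


def two_four_mask(weight: torch.Tensor) -> torch.Tensor:
    """Build a 2:4 sparsity mask: keep the 2 largest-magnitude weights
    in every group of 4 consecutive elements."""
    assert weight.numel() % 4 == 0, "toy example assumes size divisible by 4"
    w = weight.abs().reshape(-1, 4)
    # Zero out the 2 smallest entries in each group of 4.
    _, drop = torch.topk(w, k=2, dim=1, largest=False)
    mask = torch.ones_like(w)
    mask.scatter_(1, drop, 0.0)
    return mask.reshape(weight.shape)


def sds_prune(weight: torch.Tensor, lam: float = 1e-3, steps: int = 100) -> torch.Tensor:
    """Toy version of the three SDS steps: sparse -> dense -> sparse."""
    # Step 1: conventional one-shot pruning (magnitude-based 2:4 here).
    pruned = weight * two_four_mask(weight)

    # Step 2: dense reconstruction with a sparse (L1) regularizer, nudging
    # the dense weights toward a distribution that prunes well again.
    dense = pruned.clone().requires_grad_(True)
    opt = torch.optim.Adam([dense], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = (dense - weight).pow(2).mean() + lam * dense.abs().mean()
        loss.backward()
        opt.step()

    # Step 3: a second pruning round on the regularized dense weights.
    dense = dense.detach()
    return dense * two_four_mask(dense)


if __name__ == "__main__":
    w = torch.randn(8, 16)
    w_sparse = sds_prune(w)
    print("kept fraction:", (w_sparse != 0).float().mean().item())  # ~0.5 for 2:4
```

This sketch only illustrates the order of operations; in the paper the reconstruction and pruning steps are applied to the layers of a full pre-trained model rather than to a lone random matrix, and the one-shot pruning is performed by methods such as SparseGPT or Wanda rather than plain magnitude pruning.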
Keywords
» Artificial intelligence » One shot » Optimization » Perplexity » Pruning » Regularization » Zero shot