Summary of Pruning Foundation Models for High Accuracy without Retraining, by Pu Zhao et al.
Pruning Foundation Models for High Accuracy without Retraining
by Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin
First submitted to arXiv on: 21 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Large language models (LLMs) deliver strong performance, but deploying them is challenging because of their massive parameter counts and computation costs. Pruning can shrink these models, but traditional methods require retraining on full datasets for multiple epochs, which consumes substantial resources. Post-training pruning methods prune LLMs in one shot without retraining, but they can suffer from accuracy degradation. To address this, the paper formulates layer-wise LLM compression as a post-training problem and derives an optimal solution for both unstructured and semi-structured sparsity. The proposed algorithm outperforms state-of-the-art baselines across several LLM families, including transformer-based and Mamba-based models (an illustrative sketch of such sparsity masks follows this table). |
| Low | GrooveSquid.com (original content) | Researchers have developed powerful language models that do very well on tasks like understanding text. However, these models are big and take a lot of computing power to use. One way to make them smaller is to “prune” some parts of the model away. The problem is that this usually requires a lot of data and computing resources, which can be hard to come by. In this paper, the researchers propose a new way to prune language models without needing all that extra data or power. They show that their method works well across different types of language models. |
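The medium summary refers to layer-wise, one-shot pruning under unstructured and semi-structured (n:m) sparsity. As a rough illustration only, not the paper's actual formulation or solution, the sketch below applies both kinds of sparsity masks to a single layer's weight matrix; the magnitude-based scoring and the 2:4 grouping are simplifying assumptions made here.

```python
import torch


def prune_unstructured(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of one layer's weight matrix.

    Magnitude scoring is an illustrative assumption; the paper derives its own
    layer-wise solution rather than relying on raw magnitudes.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask


def prune_semistructured(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in every group of m along each row (n:m sparsity)."""
    rows, cols = weight.shape
    assert cols % m == 0, "columns must be divisible by the group size m"
    groups = weight.abs().reshape(rows, cols // m, m)
    keep = groups.topk(n, dim=-1).indices  # indices of the n largest entries per group
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, torch.ones_like(keep, dtype=torch.bool))
    return weight * mask.reshape(rows, cols)


# Example: prune one 8x16 layer in one shot, without any retraining.
W = torch.randn(8, 16)
W_unstructured = prune_unstructured(W, sparsity=0.5)
W_2_4 = prune_semistructured(W, n=2, m=4)
```

In practice, post-training pruning methods typically also use a small calibration set to score weights and reconstruct each layer's outputs, rather than pruning by magnitude alone as in this sketch.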
Keywords
» Artificial intelligence » One shot » Pruning » Transformer