Summary of Pruning Foundation Models for High Accuracy without Retraining, by Pu Zhao et al.
Pruning Foundation Models for High Accuracy without Retraining
by Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin
First submitted to arXiv on: 21 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Large language models (LLMs) deliver strong performance, but deploying them is challenging because of their massive parameter counts and computation costs. Pruning can shrink these models, but traditional methods require retraining on full datasets for multiple epochs, which consumes substantial resources. Post-training pruning methods prune LLMs in one shot without retraining, but they can suffer from accuracy degradation. To address this, the paper formulates layer-wise LLM compression as a post-training problem and derives an optimal solution for both unstructured and semi-structured sparsity. The proposed algorithm outperforms state-of-the-art baselines across several LLM families, including transformer-based and Mamba-based models (an illustrative sketch of such sparsity masks follows this table). |
| Low | GrooveSquid.com (original content) | Researchers have developed powerful language models that do very well on tasks like understanding text. However, these models are big and take a lot of computing power to use. One way to make them smaller is to “prune” some parts of the model away. The problem is that this usually requires a lot of data and computing resources, which can be hard to come by. In this paper, the researchers propose a new way to prune language models without needing all that extra data or power. They show that their method works well across different types of language models. |
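The medium summary refers to layer-wise, one-shot pruning under unstructured and semi-structured (n:m) sparsity. As a rough illustration only, not the paper's actual formulation or solution, the sketch below applies both kinds of sparsity masks to a single layer's weight matrix; the magnitude-based scoring and the 2:4 grouping are simplifying assumptions made here.

```python
import torch


def prune_unstructured(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of one layer's weight matrix.

    Magnitude scoring is an illustrative assumption; the paper derives its own
    layer-wise solution rather than relying on raw magnitudes.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask


def prune_semistructured(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in every group of m along each row (n:m sparsity)."""
    rows, cols = weight.shape
    assert cols % m == 0, "columns must be divisible by the group size m"
    groups = weight.abs().reshape(rows, cols // m, m)
    keep = groups.topk(n, dim=-1).indices  # indices of the n largest entries per group
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, torch.ones_like(keep, dtype=torch.bool))
    return weight * mask.reshape(rows, cols)


# Example: prune one 8x16 layer in one shot, without any retraining.
W = torch.randn(8, 16)
W_unstructured = prune_unstructured(W, sparsity=0.5)
W_2_4 = prune_semistructured(W, n=2, m=4)
```

In practice, post-training pruning methods typically also use a small calibration set to score weights and reconstruct each layer's outputs, rather than pruning by magnitude alone as in this sketch.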
Keywords
» Artificial intelligence » One shot » Pruning » Transformer