Summary of Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization, by Sungbin Shin et al.
Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
by Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee
First submitted to arXiv on: 21 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This research paper rethinks the standard practice for pruning large language models (LLMs): a divide-and-conquer strategy that splits the model into submodels, prunes them sequentially, and reconstructs each one's predictions on a small calibration set, since the full model is too large to prune at once. While this practice enables pruning under memory constraints, it produces high reconstruction errors. The paper first presents an array of reconstruction techniques that reduce this error by more than 90%. It then finds, however, that minimizing reconstruction error is not always ideal: doing so can overfit the calibration data, leading to poor performance at downstream tasks. A strategy of self-generating calibration data is introduced to mitigate this trade-off between reconstruction and generalization, offering new directions for pruning LLMs.
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper looks at how large language models (LLMs) are pruned to make them smaller. Because whole models are too big to prune in one go, the usual way is to break the model into smaller pieces, prune each piece so it still mimics the original's behavior on a small sample of data, and then put everything back together. This keeps memory use manageable, but the reassembled model makes noticeably worse predictions. To fix this, the paper proposes techniques that greatly reduce these errors. Surprisingly, it turns out that focusing too hard on reducing them can actually make things worse, because the pruned model overfits the small data sample. The solution is to let the model generate its own calibration data, which balances the need for accurate predictions against the risk of overfitting.
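The divide-and-conquer procedure described in the summaries can be sketched in a few lines. This is a toy illustration, not the paper's implementation: layers are plain weight matrices, magnitude pruning stands in for whatever pruning criterion is used, and the function names (`prune_and_reconstruct`, `sequential_prune`) are made up for this sketch.

```python
import numpy as np

def prune_and_reconstruct(W, X, sparsity=0.5):
    """Prune one layer by weight magnitude, then refit the surviving
    weights so the sparse layer reproduces the dense layer's outputs
    on the calibration inputs X (reconstruction error minimization)."""
    mask = np.abs(W) >= np.quantile(np.abs(W), sparsity)  # keep the largest weights
    target = X @ W.T                       # dense outputs to be matched
    W_sparse = np.zeros_like(W)
    for i in range(W.shape[0]):            # one least-squares refit per output unit
        keep = mask[i]
        if keep.any():
            sol, *_ = np.linalg.lstsq(X[:, keep], target[:, i], rcond=None)
            W_sparse[i, keep] = sol
    return W_sparse, mask

def sequential_prune(layers, X, sparsity=0.5):
    """Divide and conquer: prune submodels one at a time, feeding the
    sparse activations forward as calibration inputs for the next layer."""
    pruned = []
    for W in layers:
        W_s, _ = prune_and_reconstruct(W, X, sparsity)
        pruned.append(W_s)
        X = np.maximum(X @ W_s.T, 0.0)     # ReLU stand-in for a real transformer block
    return pruned
```

Because each refit solves a least-squares problem over the surviving weights, the reconstruction error on the calibration inputs can only be as good as or better than plain magnitude pruning; the paper's point is that driving this error down too aggressively can overfit the small calibration set.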
Keywords
» Artificial intelligence » Generalization » Overfitting » Pruning