
Summary of MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations, by Zixiao Wang et al.


MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations

by Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu

First submitted to arXiv on: 11 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com; original content)
MoreauPruner is a novel few-shot gradient-based pruning method that addresses the instability of existing pruning algorithms under weight perturbations. Current methods treat model weights as static values and neglect the effect of small weight perturbations; as a result, one-shot gradient pruning of LLMs with billions of parameters can produce unstable results under minor errors, such as switching the data format between bfloat16 and float16. Drawing on optimization analysis, MoreauPruner estimates weight importance from the neural network's Moreau envelope and combines this estimate with ℓ1-norm regularization to induce sparsity. The algorithm is evaluated on several well-known LLMs, including LLaMA-7B, LLaMA-13B, LLaMA3-8B, and Vicuna-7B. Numerical results demonstrate MoreauPruner's robustness against weight perturbations and its competitive accuracy-based scores compared with existing pruning methods.
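To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how a Moreau-envelope-based importance score could be computed in PyTorch. It assumes a scalar-valued `loss_fn`; the function name `moreau_importance` and the values of `lam`, `steps`, and `lr` are illustrative choices, not taken from the paper. The Moreau envelope env(w) = min_z loss_fn(z) + (lam/2)·||z − w||² is minimized approximately by a few inner gradient steps, and its gradient lam·(w − z*) replaces the raw loss gradient in a Taylor-style importance score:

```python
import torch

def moreau_importance(w, loss_fn, lam=1e-3, steps=5, lr=1e-2):
    """Sketch: perturbation-smoothed weight importance via the Moreau envelope.

    env(w) = min_z loss_fn(z) + (lam / 2) * ||z - w||^2
    grad(env)(w) = lam * (w - z*), where z* is the proximal point.
    Importance is scored as |w * grad(env)(w)|, a smoothed Taylor score.
    All hyperparameter values here are illustrative assumptions.
    """
    z = w.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):  # inner minimization for the proximal point z*
        opt.zero_grad()
        obj = loss_fn(z) + 0.5 * lam * (z - w.detach()).pow(2).sum()
        obj.backward()
        opt.step()
    env_grad = lam * (w.detach() - z.detach())  # gradient of the envelope
    return (w.detach() * env_grad).abs()        # per-weight importance score
```

A pruning pass would then zero out the weights with the smallest scores; the paper's ℓ1-regularized variant would additionally add a shrinkage term to the inner objective, which is omitted here for brevity.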
Low Difficulty Summary (written by GrooveSquid.com; original content)
Large language models have billions of parameters, which can make them fragile: tiny changes to the weights can change the model's behavior. This is a problem because current pruning methods don't account for such changes when deciding which weights to remove. A new method called MoreauPruner addresses this by looking at a smoothed "envelope" of the neural network's loss to figure out which parts of the model are most important, and then uses that information to prune the model in a way that stays stable under small weight perturbations.
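As a small, self-contained illustration of the fragility both summaries describe (not code from the paper), casting the same weights to bfloat16 versus float16 rounds them differently, and the resulting tiny perturbation is the kind of "minor error" that can reorder near-tied gradient-based importance scores in one-shot pruning:

```python
import torch

w = torch.randn(5, dtype=torch.float32)
w_bf16 = w.to(torch.bfloat16).to(torch.float32)  # round-trip through bfloat16
w_fp16 = w.to(torch.float16).to(torch.float32)   # round-trip through float16

# The two formats allocate mantissa bits differently, so the recovered
# weights disagree by a small perturbation.
print((w_bf16 - w_fp16).abs().max())
```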

Keywords

» Artificial intelligence  » Few shot  » Llama  » Neural network  » One shot  » Optimization  » Pruning  » Regularization