


LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation

by Haihang Wu

First submitted to arXiv on: 9 Dec 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here
Medium Difficulty Summary (GrooveSquid.com, original content)
The proposed LLM-BIP method (structured pruning with block-wise forward importance propagation) introduces sparsity into pre-trained large language models by removing redundant connections, evaluating each connection's importance with block-wise importance score propagation. The approach leverages Lipschitz continuity to approximate, in a single forward pass, the influence of each connection on the output of its transformer block. Evaluated on common zero-shot tasks, LLM-BIP improves average accuracy by 3.26% over the best prior baselines and reduces perplexity by 14.09 and 68.76 on the WikiText2 and PTB datasets, respectively. A rough illustrative sketch of the block-wise scoring idea appears after the summaries.
Low Difficulty Summary (GrooveSquid.com, original content)
Large language models have come a long way in understanding human language, but they are big and use a lot of computing power. To make them smaller and faster, scientists remove connections that are not important. Existing methods usually judge importance by looking at the whole model or just one layer at a time, and those estimates are often inaccurate. The method proposed in this paper instead looks at each transformer block separately and figures out which connections matter most for that block's output. This makes the models smaller and faster without losing much of their ability to understand language.
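
The medium-difficulty summary describes scoring groups of connections by their effect on a transformer block's output using only a single forward pass. The sketch below illustrates one way such block-wise scores could be computed in PyTorch; the helper name block_importance_scores, the weight-times-activation scoring rule, and the assumption that the block accepts a plain tensor input are all illustrative assumptions, not the paper's actual algorithm.

```python
import torch
import torch.nn as nn


def block_importance_scores(block: nn.Module, calib_batch: torch.Tensor) -> dict:
    """Score the output channels of every linear layer inside one transformer block.

    Rough proxy for block-wise importance propagation: the influence of a group
    of connections on the block output is approximated in a single forward pass
    by combining weight magnitudes with the magnitude of the activations they
    receive (a Lipschitz-style bound on how much removing the group can change
    the block output). Illustrative sketch, not the authors' implementation.
    """
    scores = {}
    input_norms = {}

    def save_input_norm(name):
        def hook(module, inputs, output):
            # Per-input-feature activation norm over the calibration batch.
            x = inputs[0].detach().float()
            input_norms[name] = x.reshape(-1, x.shape[-1]).norm(dim=0)
        return hook

    handles = [
        module.register_forward_hook(save_input_norm(name))
        for name, module in block.named_modules()
        if isinstance(module, nn.Linear)
    ]

    with torch.no_grad():
        block(calib_batch)  # one forward pass collects all activation norms

    for name, module in block.named_modules():
        if isinstance(module, nn.Linear):
            # |W| scaled by input-activation magnitude, summed per output
            # channel: channels with low scores are candidates for pruning.
            w = module.weight.detach().abs()
            scores[name] = (w * input_norms[name]).sum(dim=1)

    for handle in handles:
        handle.remove()
    return scores
```

Given such scores, a structured pruning step would drop the lowest-scoring output channels (or attention heads) in each block, repeating the scoring block by block so that each block's importance estimates reflect only its own forward pass.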

Keywords

» Artificial intelligence  » Large language model  » Perplexity  » Transformer  » Zero shot