ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

by Yash Akhauri, Ahmed F AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M Rush, Safeen Huda, Mohamed S Abdelfattah

First submitted to arXiv on: 24 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
ShadowLLM is a novel predictor that assesses the importance of attention heads and neurons in large language models (LLMs), going beyond magnitude-based pruning criteria. This approach yields over 15% improvement in end-to-end accuracy compared to prior methods and up to a 20% speed-up over the state-of-the-art DejaVu framework. The improvements are validated on Llama-2 and OPT models with up to 30 billion parameters.

Low Difficulty Summary (original content by GrooveSquid.com)
Large language models (LLMs) consume a lot of power and are often deployed in latency-sensitive settings, so they need efficiency techniques such as quantization and sparsity. The researchers built a predictor called ShadowLLM that judges how important each attention head and neuron is, improving accuracy by over 15% compared to prior methods. It also delivers up to a 20% speed-up over the state-of-the-art DejaVu framework.
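
Neither summary gives implementation details, but the core idea behind predictor-based contextual sparsity, a small learned model that decides per input which attention heads and neurons are worth computing, can be sketched. The PyTorch snippet below is a minimal, hypothetical illustration of such a gating predictor; the class name SparsityPredictor, the MLP shape, and the keep_ratio threshold are assumptions for illustration only and do not come from the paper.

```python
import torch
import torch.nn as nn

class SparsityPredictor(nn.Module):
    """Hypothetical gating module: scores each unit (attention head or
    FFN neuron) from the current hidden state and keeps only the
    highest-scoring fraction. Layer sizes and the keep ratio are
    illustrative assumptions, not values from the paper."""

    def __init__(self, hidden_dim: int, num_units: int, keep_ratio: float = 0.5):
        super().__init__()
        # Small MLP mapping a hidden state to one importance score per unit.
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, num_units),
        )
        self.keep_ratio = keep_ratio

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) -> binary mask: (batch, num_units)
        scores = self.scorer(hidden)
        k = max(1, int(self.keep_ratio * scores.size(-1)))
        top_idx = scores.topk(k, dim=-1).indices
        mask = torch.zeros_like(scores)
        mask.scatter_(-1, top_idx, 1.0)  # 1.0 = compute this unit, 0.0 = skip it
        return mask

# Example: decide which of 32 attention heads to compute for a batch of 2.
predictor = SparsityPredictor(hidden_dim=4096, num_units=32)
mask = predictor(torch.randn(2, 4096))
print(mask.sum(dim=-1))  # tensor([16., 16.]) -- half the heads kept per example
```

In an actual system, the mask would be used to skip the pruned heads' and neurons' computation entirely; that skipped work is the source of the reported speed-up over dense inference.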

Keywords

» Artificial intelligence  » Attention  » Llama  » Pruning  » Quantization