Summary of ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models, by Yash Akhauri et al.
ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
by Yash Akhauri, Ahmed F AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M Rush, Safeen Huda, Mohamed S Abdelfattah
First submitted to arXiv on: 24 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | ShadowLLM is a novel predictor developed to assess attention head and neuron importance in large language models (LLMs), going beyond magnitude-based pruning criteria (a rough sketch of this idea follows the table). The approach yields over a 15% improvement in end-to-end accuracy compared to prior methods and up to a 20% speed-up over the state-of-the-art DejaVu framework. These gains are validated on Llama-2 and OPT models with up to 30 billion parameters. |
| Low | GrooveSquid.com (original content) | Large language models (LLMs) need efficiency techniques such as quantization and sparsity because of their high power consumption and latency-sensitive deployments. Researchers developed a novel predictor called ShadowLLM that assesses attention head and neuron importance in LLMs, achieving over a 15% improvement in accuracy compared to prior methods. The approach also provides up to a 20% speed-up over the state-of-the-art DejaVu framework. |
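The core idea described in these summaries is a learned predictor that scores attention heads and FFN neurons per input, so that low-importance units can be skipped at inference time (contextual sparsity). The paper's actual predictor architecture and training criterion are not reproduced here; the snippet below is only a minimal, hypothetical PyTorch sketch of that general pattern, with names such as `SparsityPredictor`, `head_scorer`, and `neuron_scorer` invented for illustration.

```python
# Hypothetical sketch of predictor-based contextual sparsity (NOT the paper's code).
# A small predictor looks at the current hidden state and scores each attention
# head / FFN neuron; only the top-scoring units are kept for this input.
import torch
import torch.nn as nn


class SparsityPredictor(nn.Module):
    """Scores attention heads and FFN neurons from the current hidden state."""

    def __init__(self, hidden_dim: int, num_heads: int, ffn_dim: int):
        super().__init__()
        self.head_scorer = nn.Linear(hidden_dim, num_heads)
        self.neuron_scorer = nn.Linear(hidden_dim, ffn_dim)

    def forward(self, hidden_state: torch.Tensor, head_keep: int, neuron_keep: int):
        # Score each unit, then keep only the top-k per input (contextual sparsity).
        head_scores = self.head_scorer(hidden_state)      # (batch, num_heads)
        neuron_scores = self.neuron_scorer(hidden_state)  # (batch, ffn_dim)
        head_mask = torch.zeros_like(head_scores)
        head_mask.scatter_(1, head_scores.topk(head_keep, dim=1).indices, 1.0)
        neuron_mask = torch.zeros_like(neuron_scores)
        neuron_mask.scatter_(1, neuron_scores.topk(neuron_keep, dim=1).indices, 1.0)
        return head_mask, neuron_mask


# Usage: decide which heads/neurons to keep before running a transformer block.
predictor = SparsityPredictor(hidden_dim=512, num_heads=8, ffn_dim=2048)
h = torch.randn(4, 512)  # hidden states for 4 tokens
head_mask, neuron_mask = predictor(h, head_keep=4, neuron_keep=1024)
```

The speed-ups reported in the paper come from skipping the pruned heads and neurons entirely rather than merely zeroing their outputs, so a real implementation would use these masks to gate computation in the attention and FFN kernels.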
Keywords
» Artificial intelligence » Attention » Llama » Pruning » Quantization