Summary of Compact Language Models via Pruning and Knowledge Distillation, by Saurav Muralidharan et al.
Compact Language Models via Pruning and Knowledge Distillation
by Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov
First submitted to arXiv on: 19 Jul 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty: the medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | The paper’s original abstract; read it on arXiv.
Medium | GrooveSquid.com (original content) | This research paper investigates how to produce families of large language models (LLMs) that target different deployment scales and sizes. Currently, each variant is trained from scratch, which is extremely compute-intensive. The authors instead propose pruning an existing LLM to reduce its size by up to 4x and then re-training it with a small fraction (<3%) of the original training data. They develop practical compression best practices for LLMs that combine depth, width, attention, and MLP pruning with knowledge-distillation-based retraining. Using this guide, they compress the Nemotron-4 family of LLMs and compare the results to similarly-sized models on various language modeling tasks. Deriving smaller models from a larger pre-trained model in this way requires up to 40x fewer training tokens, yielding significant compute cost savings. Moreover, the resulting Minitron models exhibit improved performance compared to training from scratch and outperform state-of-the-art compression techniques. A minimal code sketch of this prune-and-distill recipe follows the table.
Low | GrooveSquid.com (original content) | This paper explores ways to make large language models (LLMs) smaller and more efficient without losing their abilities. Right now, every LLM variant is trained separately, which takes a lot of computing power. The researchers instead propose shrinking an existing LLM by up to 4x and then re-training it with only a small amount of data. They develop guidelines for compressing LLMs with different techniques and, following these guidelines, shrink the Nemotron-4 family of LLMs and test how well the smaller models perform on language tasks. The results show that making smaller models from a larger pre-trained model takes far less computing power, and the resulting Minitron models perform better than versions trained from scratch.
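The medium-difficulty summary describes a two-step recipe: structurally prune an existing model (depth, width, attention heads, MLP neurons), then retrain the pruned model with knowledge distillation from the original. The snippet below is a minimal, illustrative PyTorch sketch of two of those ingredients, an activation-based importance score for width (MLP-neuron) pruning and a forward-KL distillation loss. The function names and the specific scoring rule are assumptions chosen for illustration, not the authors' actual implementation.

```python
# Illustrative sketch only: activation-based width pruning + a KD loss.
# Function names and the exact importance metric are assumptions,
# not taken from the paper's code.
import torch
import torch.nn.functional as F


def mlp_neuron_importance(hidden_acts: torch.Tensor) -> torch.Tensor:
    """Score each MLP hidden neuron by its mean absolute activation
    over a small calibration batch; hidden_acts: [batch, seq, hidden]."""
    return hidden_acts.abs().mean(dim=(0, 1))


def prune_linear_rows(layer: torch.nn.Linear, keep: torch.Tensor) -> torch.nn.Linear:
    """Return a narrower Linear that keeps only the selected output rows."""
    pruned = torch.nn.Linear(layer.in_features, keep.numel(),
                             bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[keep].clone()
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[keep].clone()
    return pruned


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Forward KL between the teacher's and student's token distributions,
    used to retrain the pruned (student) model against the original (teacher)."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * (t * t)


if __name__ == "__main__":
    # Toy example: keep the top half of a 16-neuron layer, then compute a KD loss.
    fc = torch.nn.Linear(8, 16)
    calib_acts = torch.relu(fc(torch.randn(4, 10, 8)))        # [batch, seq, 16]
    keep = mlp_neuron_importance(calib_acts).topk(8).indices  # top 8 neurons
    fc_small = prune_linear_rows(fc, keep)

    student_logits = torch.randn(2, 5, 32000)  # pruned model outputs (dummy)
    teacher_logits = torch.randn(2, 5, 32000)  # original model outputs (dummy)
    print(distillation_loss(student_logits, teacher_logits, temperature=2.0))
```

In the recipe summarized above, the same pruning idea is applied across width, attention, MLP, and depth, and a distillation loss of this kind drives the short retraining run on the small (<3%) data budget.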
Keywords
* Artificial intelligence
* Attention
* Knowledge distillation
* Pruning