Summary of Self-Data Distillation for Recovering Quality in Pruned Large Language Models, by Vithursan Thangarasa et al.
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
by Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
First submitted to arXiv on: 13 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty: the medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from whichever version suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract (read it on the arXiv listing) |
Medium | GrooveSquid.com (original content) | This paper tackles the challenge of deploying large language models on devices with limited compute and memory. The authors use structured pruning to reduce model size while preserving quality, and introduce self-data distilled fine-tuning: the original, unpruned model generates a distilled fine-tuning dataset that maintains semantic richness and mitigates catastrophic forgetting in the pruned model (a minimal code sketch follows this table). This approach outperforms standard supervised fine-tuning, achieving up to 8% higher average accuracy on the HuggingFace OpenLLM Leaderboard v1, and the authors demonstrate further quality retention by combining self-data distillation with model merging and speculative decoding. |
Low | GrooveSquid.com (original content) | This paper helps make big language models small enough to work well on everyday devices like phones and laptops. Right now, these models are often too big to run on most devices. To fix this, the authors use a technique called structured pruning, which removes the parts of the model that matter least. They also introduce a new way to fine-tune (adjust) the model after pruning, which keeps the model’s quality high. This approach beats the usual methods and can even make models run faster. |
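To make the fine-tuning step concrete, below is a minimal sketch of the self-data distillation idea described in the medium-difficulty summary: the unpruned model regenerates the training responses in its own words, and the pruned model is then fine-tuned on that regenerated data. This is a simplified illustration assuming the Hugging Face `transformers` library; the model name, prompt handling, and generation settings are placeholder assumptions, not the authors' exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical teacher checkpoint: any instruction-tuned model can stand in
# for the original, unpruned model; the paper's exact models may differ.
TEACHER = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(
    TEACHER, torch_dtype=torch.bfloat16, device_map="auto"
)

def self_distill(instruction: str) -> str:
    """Let the unpruned teacher generate the response itself, so the
    fine-tuning targets stay on the model's own output distribution.
    (The paper's variant can also show the teacher the original
    ground-truth answer to rewrite; that is omitted here.)"""
    inputs = tokenizer(instruction, return_tensors="pt").to(teacher.device)
    with torch.no_grad():
        out = teacher.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Drop the prompt tokens; keep only the newly generated continuation.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Placeholder seed prompts; in practice this runs over the full
# fine-tuning dataset to build the distilled dataset.
seed_prompts = ["Explain the trade-offs of structured pruning in one paragraph."]
distilled = [{"prompt": p, "response": self_distill(p)} for p in seed_prompts]

# The *pruned* model is then fine-tuned on `distilled` with an ordinary
# supervised next-token objective; that training loop is standard and
# omitted here for brevity.
```

The key design point, as the summary describes it, is that the supervision comes from the unpruned model's own generations rather than from an external dataset, which is what helps the pruned model avoid catastrophic forgetting during fine-tuning.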
Keywords
» Artificial intelligence » Distillation » Fine tuning » Pruning » Supervised