Summary of LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, by Rui Pan et al.
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
by Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
First submitted to arXiv on: 26 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | In this paper, the authors investigate how to fine-tune large language models (LLMs) more efficiently without sacrificing performance. Massive memory consumption is currently a major bottleneck, making it hard for researchers with limited resources to train these models. To address this, they study techniques like LoRA (Low-Rank Adaptation) and propose an alternative approach called LISA (Layerwise Importance Sampled AdamW). By analyzing the layerwise properties of LoRA, they find that a surprisingly simple training strategy, freezing most layers and updating only a small sampled subset at a time, can outperform both LoRA and full-parameter training while requiring significantly less memory. Experimental results show that LISA surpasses LoRA on a variety of fine-tuning tasks across different domains. (A minimal code sketch of the layerwise sampling idea follows this table.) |
| Low | GrooveSquid.com (original content) | This paper is about making large language models work better without needing lots of computing power. Right now, these models use so much memory that people without powerful machines find it hard to train them. The authors study an existing idea called LoRA and propose a new one called LISA, which trains only a few layers of the model at a time. They find that this simple approach works better than the alternatives and uses less memory, so researchers can do their work with fewer resources. |
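The layerwise sampling idea behind LISA can be pictured as a short training loop: keep most transformer blocks frozen and, every few steps, re-sample a small subset of blocks to update with AdamW. The sketch below is a minimal illustration rather than the authors' implementation; it assumes a PyTorch model that exposes its transformer blocks as `model.layers` and returns a Hugging Face-style output with a `.loss`, and the names `n_active_layers` and `sample_period` are illustrative placeholders, not the paper's hyperparameters. The paper additionally keeps the embedding and output-head layers trainable throughout, which is omitted here for brevity.

```python
import random

from torch.optim import AdamW


def set_layer_trainable(layer, trainable: bool) -> None:
    """Freeze or unfreeze every parameter in one transformer block."""
    for p in layer.parameters():
        p.requires_grad = trainable


def lisa_style_finetune(model, dataloader, n_active_layers=2,
                        sample_period=20, lr=1e-5, device="cuda"):
    """Illustrative LISA-style loop: most blocks stay frozen, and every
    `sample_period` steps a new small subset is sampled and trained with AdamW.
    """
    model.to(device)
    layers = list(model.layers)  # assumed attribute; real models differ
    optimizer = None
    for step, batch in enumerate(dataloader):
        if step % sample_period == 0:
            # Re-sample which blocks are trainable for the next period.
            active = set(random.sample(range(len(layers)), k=n_active_layers))
            for i, layer in enumerate(layers):
                set_layer_trainable(layer, i in active)
            # Rebuild the optimizer over the currently trainable parameters,
            # so AdamW only keeps moment buffers for the sampled blocks.
            optimizer = AdamW(
                (p for p in model.parameters() if p.requires_grad), lr=lr
            )
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # assumes a HF-style causal-LM output
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Because gradients and AdamW moment buffers exist only for the handful of blocks that are currently trainable, the optimizer's memory footprint shrinks accordingly, which is where the memory saving described in the summaries comes from.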
Keywords
* Artificial intelligence
* Fine-tuning
* LoRA
* Low-rank adaptation