Summary of The Impact of Initialization on LoRA Finetuning Dynamics, by Soufiane Hayou et al.
The Impact of Initialization on LoRA Finetuning Dynamics
by Soufiane Hayou, Nikhil Ghosh, Bin Yu
First submitted to arXiv on: 12 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract (available on the arXiv page) |
Medium | GrooveSquid.com (original content) | This paper investigates the role of initialization in Low-Rank Adaptation (LoRA), the finetuning technique introduced in Hu et al. (2021). The authors compare two initialization schemes for LoRA: initializing B to zero and A to random, or the reverse. Although the two schemes look symmetric, they lead to different finetuning dynamics: the first scheme, with B set to zero and A randomized, outperforms the second on average. A theoretical analysis attributes this gap to the first scheme's ability to use larger learning rates without causing output instability, resulting in more efficient learning. Extensive experiments on large language models (LLMs) validate these findings. (A minimal code sketch of the two schemes follows the table.) |
Low | GrooveSquid.com (original content) | In this study, researchers looked at how the way a model's added components are initialized at the start of finetuning affects its performance. They tested two ways to start: setting certain parts to zero and others to random values, or the other way around. Surprisingly, one choice works better than the other on average: it lets the model safely take bigger learning steps, so it learns more efficiently and performs better. The researchers tested their ideas on big language models. |
Keywords
» Artificial intelligence » LoRA » Low-rank adaptation » Machine learning