
Summary of The Impact of Initialization on LoRA Finetuning Dynamics, by Soufiane Hayou et al.


The Impact of Initialization on LoRA Finetuning Dynamics

by Soufiane Hayou, Nikhil Ghosh, Bin Yu

First submitted to arXiv on: 12 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the role of initialization in Low-Rank Adaptation (LoRA), a finetuning technique introduced in Hu et al. (2021). The authors compare two initialization schemes for LoRA's low-rank factors: initializing B to zero and A randomly, or vice versa. Although the two schemes look symmetric, they yield different finetuning dynamics, and the first scheme, where B is initialized to zero and A randomly, outperforms the second on average. Theoretical analysis attributes this difference to the first scheme's ability to use larger learning rates without causing output instability, resulting in more efficient learning. Extensive experiments on large language models (LLMs) validate these findings. (A short code sketch of the two schemes appears after the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
In this study, researchers looked at how the way a model's extra trainable parts are initialized affects its performance. They tested two ways to start these parts: one where one part is set to zero and the other is random, and one where the roles are swapped. Surprisingly, one method works better than the other: it lets the model take bigger learning steps without becoming unstable, so it learns more efficiently. The researchers tested their ideas on big language models.
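To make the two initialization schemes concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. The class, argument names, and scaling convention are illustrative assumptions, not code from the paper or from any particular LoRA library.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, in_features, out_features, r=8, alpha=16, init_scheme="A"):
        super().__init__()
        # Pretrained weight, kept frozen during finetuning.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) / in_features ** 0.5,
            requires_grad=False,
        )
        # Low-rank factors: A has shape (r, in), B has shape (out, r).
        self.A = nn.Parameter(torch.zeros(r, in_features))
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r
        if init_scheme == "A":
            # Scheme 1: A random, B zero -- the one the paper finds better on average.
            nn.init.kaiming_uniform_(self.A, a=5 ** 0.5)
        elif init_scheme == "B":
            # Scheme 2: B random, A zero.
            nn.init.kaiming_uniform_(self.B, a=5 ** 0.5)
        else:
            raise ValueError("init_scheme must be 'A' or 'B'")
        # In both schemes B @ A == 0 at initialization, so the finetuned
        # model starts out computing exactly the pretrained function.

    def forward(self, x):
        base = x @ self.weight.T
        low_rank = (x @ self.A.T) @ self.B.T
        return base + self.scaling * low_rank
```

Note that under both schemes the product BA is zero at initialization, so finetuning starts from the pretrained model; the difference the paper highlights is that the random-A, zero-B scheme tolerates larger learning rates without destabilizing the output.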

Keywords

» Artificial intelligence  » LoRA  » Low-rank adaptation  » Machine learning