LoRA-GA: Low-Rank Adaptation with Gradient Approximation

by Shaowen Wang, Linxi Yu, Jian Li

First submitted to arXiv on: 6 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
Large-scale pretrained models are computationally and memory-intensive to fine-tune, making the process costly. LoRA is a popular parameter-efficient method that reduces these requirements by fine-tuning an auxiliary low-rank model. However, this approach converges more slowly than full fine-tuning, resulting in increased overall compute and potentially worse test performance. The paper investigates LoRA's initialization and introduces a novel method called LoRA-GA (Low-Rank Adaptation with Gradient Approximation) that aligns the gradients of the low-rank matrix product with those of full fine-tuning at the first step. The authors show that LoRA-GA achieves a convergence rate comparable to full fine-tuning while maintaining similar performance, outperforming vanilla LoRA and recent improvements on various datasets.
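
To make the gradient-alignment idea concrete, below is a minimal PyTorch sketch of one way such an initialization can work: take the SVD of the gradient from a single step of full fine-tuning and build the low-rank factors from its leading singular vectors, so that their product approximates that gradient. The function name `gradient_aligned_init`, the shapes, and the even split of singular values between the two factors are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def gradient_aligned_init(grad_W: torch.Tensor, rank: int):
    """Hypothetical sketch: build low-rank factors B (out x r) and
    A (r x in) from the top-r singular subspace of the full-parameter
    gradient grad_W, splitting singular values evenly between them."""
    U, S, Vh = torch.linalg.svd(grad_W, full_matrices=False)
    B = U[:, :rank] * S[:rank].sqrt()                 # (out_features, rank)
    A = S[:rank].sqrt().unsqueeze(1) * Vh[:rank, :]   # (rank, in_features)
    return B, A

# Toy usage: one weight matrix and the gradient of a single
# "first step" of full fine-tuning on a random regression batch.
torch.manual_seed(0)
W = torch.randn(64, 32, requires_grad=True)   # stand-in for a pretrained weight
x, target = torch.randn(8, 32), torch.randn(8, 64)
loss = ((x @ W.T - target) ** 2).mean()
loss.backward()                               # full fine-tuning gradient

B, A = gradient_aligned_init(W.grad, rank=4)
# B @ A is the best rank-4 approximation of the full gradient, so the
# first low-rank update points roughly where full fine-tuning would go.
rel_err = torch.norm(W.grad - B @ A) / torch.norm(W.grad)
print(f"relative approximation error: {rel_err:.3f}")
```

Note that unlike vanilla LoRA, which starts the low-rank product at zero, a nonzero initialization like this changes the model's output at step zero unless the frozen weight is offset to compensate; the sketch omits that bookkeeping.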

Low Difficulty Summary (original content by GrooveSquid.com)
Large-scale models are expensive to fine-tune! They need lots of computing power and memory. One way to make this cheaper is to train a small "auxiliary" model with far fewer parameters. That cuts the cost, but training then takes longer to finish. The authors looked into how to make this process faster and better. They found a new way to start training, called LoRA-GA (Low-Rank Adaptation with Gradient Approximation), which makes the early steps of training behave like full fine-tuning. This speeds things up and can even improve performance: on some tasks, the new method did 5.69% better than the original method.

Keywords

» Artificial intelligence  » Fine-tuning  » LoRA  » Low-rank adaptation  » Parameter-efficient