Gradient Weight-normalized Low-rank Projection for Efficient LLM Training

by Jia-Hong Huang, Yixian Shen, Hongyi Zhu, Stevan Rudinac, Evangelos Kanoulas

First submitted to arXiv on: 27 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach that improves the efficiency of both training and fine-tuning Large Language Models (LLMs). GradNormLoRP enhances parameter and memory efficiency while maintaining performance comparable to full fine-tuning, making it practical for large LLMs such as LLaMA 7B. The method normalizes the weight matrix to improve gradient conditioning and applies low-rank approximations to the weight and gradient matrices, reducing optimizer memory usage during training by up to 89.5%. The authors demonstrate the effectiveness of GradNormLoRP through extensive experiments on various tasks, including fine-tuning RoBERTa on the GLUE benchmark, where it achieves an average score of 80.65.
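
The medium summary combines two ingredients: normalizing the weight matrix, and projecting gradients into a low-rank subspace. The snippet below is a minimal, hypothetical sketch of how these two ideas can fit together in PyTorch; it is not the paper's implementation, and the rank, the choice of projection basis (top singular vectors of the gradient), and the plain SGD update are assumptions made for brevity.

```python
# Hypothetical sketch (not the paper's code): weight normalization plus a
# low-rank projection of the gradient before the parameter update.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_out, d_in, rank, lr = 64, 64, 8, 1e-2   # rank and lr are illustrative choices

# Weight-normalized parameterization: W = g * V / ||V|| (row-wise norm).
V = torch.randn(d_out, d_in, requires_grad=True)
g = torch.ones(d_out, 1, requires_grad=True)

x = torch.randn(32, d_in)                 # dummy input batch
target = torch.randn(32, d_out)           # dummy regression target

W = g * V / V.norm(dim=1, keepdim=True)   # normalized weight used in the forward pass
loss = F.mse_loss(x @ W.T, target)
loss.backward()

# Project the full gradient of V onto a rank-r subspace (here, its top-r left
# singular vectors), so only a much smaller matrix has to be tracked per step.
U, _, _ = torch.linalg.svd(V.grad, full_matrices=False)
P = U[:, :rank]                           # (d_out, rank) projection basis
projected_grad = P.T @ V.grad             # compact (rank, d_in) representation
update = P @ projected_grad               # map back to the full shape

with torch.no_grad():                     # plain SGD step, for illustration only
    V -= lr * update
    g -= lr * g.grad
```

In practice the compact projected gradient would feed a stateful optimizer such as Adam, and keeping the optimizer's statistics in the smaller projected space is presumably where the reported memory savings come from.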
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper makes it possible to train and fine-tune large language models like LLaMA 7B on consumer-level GPUs such as the NVIDIA RTX 4090, without extra costs. The new approach, Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), makes training more efficient by reducing how much memory and how many trainable parameters it needs.

Keywords

» Artificial intelligence  » Fine-tuning  » LLaMA