

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

by Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Hua Huang

First submitted to arXiv on: 11 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com, original content)
The proposed Random Subspace Zeroth-Order (SubZero) optimization method addresses the memory bottleneck created by the high dimensionality of large language models (LLMs), enabling effective fine-tuning of such models. By introducing a low-rank perturbation tailored to LLMs, SubZero reduces memory consumption while improving training performance. The resulting gradient estimate is shown to closely approximate the backpropagation gradient, to exhibit lower variance than traditional zeroth-order estimates, and to guarantee convergence when combined with stochastic gradient descent (SGD). Experimental results demonstrate that SubZero improves fine-tuning performance and converges faster than standard zeroth-order approaches such as MeZO across various language modeling tasks.
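To make the idea concrete, here is a minimal numpy sketch of zeroth-order optimization with a low-rank random perturbation, in the spirit of the summary above. This is not the authors' implementation: the toy quadratic loss, the matrix sizes, and all variable names are illustrative assumptions, and the paper's actual perturbation construction and update rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a single weight matrix; the loss is the squared distance
# to a fixed target matrix (a stand-in for a real fine-tuning loss).
m, n, rank = 20, 10, 3
W_target = rng.standard_normal((m, n))
W = np.zeros((m, n))

def loss(W):
    return np.sum((W - W_target) ** 2)

def subspace_zo_step(W, eps=1e-3, lr=0.1):
    # Sample a low-rank perturbation Z = U @ V.T rather than a full-rank
    # Gaussian one: the random search direction lives in a small subspace.
    U = np.linalg.qr(rng.standard_normal((m, rank)))[0]
    V = np.linalg.qr(rng.standard_normal((n, rank)))[0]
    Z = U @ V.T
    # Two forward passes give a finite-difference estimate of the
    # directional derivative along Z -- no backpropagation required.
    g = (loss(W + eps * Z) - loss(W - eps * Z)) / (2 * eps)
    # SGD-style update along the sampled direction.
    return W - lr * g * Z

loss_before = loss(W)
for _ in range(200):
    W = subspace_zo_step(W)
loss_after = loss(W)
```

Note the memory profile that motivates the method: each step stores only the small factors `U` and `V` (and two scalar loss values), never a full gradient tensor, which is what makes this style of fine-tuning attractive for very large models.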
Low Difficulty Summary (GrooveSquid.com, original content)
Large language models can be very useful for many things, but they need a lot of memory to train well. This paper proposes a new way to train these models, called SubZero, which uses a clever trick to use less memory while still getting good results. Instead of computing gradients directly, the method makes small random changes to the model and uses the resulting change in performance to estimate which direction to update. The authors tested this method on several language-related tasks and found that it worked better than other comparable methods.

Keywords

» Artificial intelligence  » Backpropagation  » Fine tuning  » Optimization  » Stochastic gradient descent