Summary of Zeroth-Order Fine-Tuning of LLMs in Random Subspaces, by Ziming Yu et al.
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
by Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Hua Huang
First submitted to arXiv on: 11 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv |
Medium | GrooveSquid.com (original content) | The proposed Random Subspace Zeroth-order (SubZero) optimization method addresses the memory cost that the high dimensionality of large language models imposes on fine-tuning. By introducing a low-rank perturbation tailored to LLMs, SubZero reduces memory consumption while improving training performance. The method is shown to closely approximate backpropagation gradients, to exhibit lower variance than traditional zeroth-order methods, and to converge when combined with stochastic gradient descent (SGD); a minimal sketch of the idea appears after this table. Experimental results demonstrate that SubZero improves fine-tuning performance and converges faster than standard zeroth-order approaches such as MeZO across a range of language modeling tasks. |
Low | GrooveSquid.com (original content) | Large language models can be very useful for many things, but they need a lot of memory to train well. This paper proposes a new way to fine-tune them, called SubZero, that uses a clever trick to save memory while still getting good results. The method makes small random changes to the model, measures how the loss responds, and uses that response to decide how to update the model, without backpropagation. The authors tested SubZero on several language tasks and found that it worked better than comparable methods. |
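
To make the idea above concrete, here is a minimal, illustrative Python sketch of random-subspace zeroth-order estimation. It is not the authors' SubZero implementation: the function names (`subspace_zeroth_order_grad`, `sgd_step`), the toy quadratic loss, and hyperparameters such as `rank`, `mu`, and `lr` are assumptions chosen for illustration. The sketch perturbs a weight matrix along a random low-rank direction, compares the loss at two probe points, and takes an SGD-style step, so no backpropagation is needed.

```python
import numpy as np


def subspace_zeroth_order_grad(loss_fn, W, rank=2, mu=1e-3, rng=None):
    """Zeroth-order gradient estimate for a weight matrix W, confined to a
    random low-rank subspace. Only two loss evaluations are needed, so no
    backpropagation graph is ever built."""
    rng = rng or np.random.default_rng()
    m, n = W.shape
    # Random low-rank probing direction P = U @ V.T.
    U = rng.standard_normal((m, rank)) / np.sqrt(rank)
    V = rng.standard_normal((n, rank)) / np.sqrt(rank)
    P = U @ V.T
    # Central finite difference along P: two forward passes, no gradients.
    d = (loss_fn(W + mu * P) - loss_fn(W - mu * P)) / (2.0 * mu)
    return d * P  # low-rank estimate of the descent direction


def sgd_step(loss_fn, W, lr=1e-2, **kwargs):
    """One SGD-style update driven by the zeroth-order estimate above."""
    return W - lr * subspace_zeroth_order_grad(loss_fn, W, **kwargs)


if __name__ == "__main__":
    # Toy quadratic objective standing in for a fine-tuning loss.
    target = np.ones((8, 8))
    loss = lambda W: float(np.sum((W - target) ** 2))

    W = np.zeros((8, 8))
    for _ in range(300):
        W = sgd_step(loss, W)
    print("final loss:", loss(W))  # typically well below the initial 64.0
```

In an actual LLM the same two-forward-pass trick would be applied to each weight matrix using the training loss; the point of the low-rank subspace is that the random probing direction has far fewer degrees of freedom than the full parameter matrix, which is what keeps both the variance of the estimate and the memory footprint low.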
Keywords
» Artificial intelligence » Backpropagation » Fine tuning » Optimization » Stochastic gradient descent