Summary of Thinking Forward: Memory-Efficient Federated Finetuning of Language Models, by Kunjal Panchal et al.
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
by Kunjal Panchal, Nisarg Parikh, Sunav Choudhary, Lijun Zhang, Yuriy Brun, Hui Guan
First submitted to arXiv on: 24 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract; read it on the arXiv page. |
Medium | GrooveSquid.com (original content) | The paper introduces Spry, a novel federated learning (FL) algorithm for finetuning large language models (LLMs). FL lets devices train LLMs on their private data without sharing the data itself, but traditional finetuning methods require too much memory to be practical on resource-constrained devices. Forward-mode auto-differentiation (AD) can reduce memory usage, yet on its own it converges slowly and reaches poor accuracy. Spry addresses this by splitting the trainable weights among clients, so that each client’s forward-mode AD gradients are closer estimates of the true gradients (see the sketch below this table). The result is a low memory footprint, high accuracy, and fast convergence. The authors formally prove the unbiasedness of Spry’s gradient estimation and derive its convergence rate. Experiments show that Spry reduces memory usage by 1.4-7.1x while reaching comparable accuracy across a range of language tasks, models, and FL settings. Compared to backpropagation, Spry also reduces convergence time by 1.2-20.3x and achieves 5.2-13.5% higher accuracy. For example, when finetuning Llama2-7B with LoRA, Spry’s peak memory consumption is only 6.2GB, versus 33.9GB for backpropagation. |
Low | GrooveSquid.com (original content) | This paper is about letting devices train language models on their own data without sharing that data. Traditional methods need too much memory to be practical on devices with limited resources. The authors introduce an algorithm called Spry that solves this by splitting the training work among the participating devices, so each device only needs a small amount of memory. The results show that Spry is faster than traditional methods while reaching similar accuracy. For example, when training a model called Llama2-7B, Spry uses much less memory (6.2GB) than the traditional method (33.9GB). This makes it practical to finetune language models on devices with limited resources, such as smartphones or smart home devices. |
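To make the medium summary’s mention of forward-mode AD concrete, here is a minimal sketch of the forward-gradient idea that Spry builds on, written with JAX. It is not the authors’ implementation: the loss function, the parameter names (`frozen`, `trainable`, `lora`), and the single-client view of the weight split are illustrative assumptions, and Spry additionally handles client sampling, aggregation, and the rest of the FL machinery described in the paper.

```python
# Minimal sketch (assumed names, not the authors' code): one client's
# forward-mode AD update on its assigned shard of trainable weights.
import jax
import jax.numpy as jnp


def loss_fn(trainable, frozen, batch):
    # Stand-in loss: a tiny linear model. In Spry, `trainable` would be the
    # client's assigned slice of an LLM's LoRA weights.
    preds = batch["x"] @ (frozen["w"] + trainable["lora"])
    return jnp.mean((preds - batch["y"]) ** 2)


def forward_gradient_step(trainable, frozen, batch, key, lr=1e-3):
    # Sample a random tangent direction v with the same shapes as the
    # client's trainable weights.
    v = jax.tree_util.tree_map(
        lambda p: jax.random.normal(key, p.shape, p.dtype), trainable)
    # A single forward pass with forward-mode AD gives the loss and the
    # Jacobian-vector product <grad, v>, without storing activations for
    # a backward pass (this is where the memory savings come from).
    loss, jvp_val = jax.jvp(
        lambda t: loss_fn(t, frozen, batch), (trainable,), (v,))
    # Forward-gradient estimate: <grad, v> * v is an unbiased estimate of
    # the true gradient when v is standard normal.
    grad_est = jax.tree_util.tree_map(lambda d: jvp_val * d, v)
    # Plain SGD step on this client's shard only.
    new_trainable = jax.tree_util.tree_map(
        lambda p, g: p - lr * g, trainable, grad_est)
    return loss, new_trainable


# Toy usage: one client, one step.
key = jax.random.PRNGKey(0)
frozen = {"w": jnp.ones((4, 2))}
trainable = {"lora": jnp.zeros((4, 2))}
batch = {"x": jnp.ones((8, 4)), "y": jnp.ones((8, 2))}
loss, trainable = forward_gradient_step(trainable, frozen, batch, key)
```

In Spry, each client applies this kind of update only to the subset of trainable weights it is assigned, which is what keeps the per-client forward-mode estimates close to the true gradients while preserving the low memory footprint.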
Keywords
» Artificial intelligence » Backpropagation » Federated learning » LoRA