Summary of From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers, by Bharat Runwal et al.
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers
by Bharat Runwal, Tejaswini Pedapati, Pin-Yu Chen
First submitted to arXiv on: 2 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to fine-tuning pre-trained language models (PLMs) for downstream tasks. The authors build on the observation that activations in transformer MLP blocks are often sparse, which enables efficient inference on sparsity-aware hardware. They introduce Density-Efficient Fine-Tuning (DEFT), which encourages higher activation sparsity in PLMs through a density loss added to the training objective (a hedged sketch of this idea appears after the table). DEFT is demonstrated to reduce activation density by up to 44.94% on RoBERTa_Large and 53.19% on Flan-T5_11B compared to standard PEFT methods. The authors also introduce an adaptive variant, ADA-DEFT, which achieves significant memory and runtime savings at inference time, and show that DEFT works complementarily with quantized and pruned models. Evaluations on the GLUE and QA (SQuAD) benchmarks show consistent results across downstream tasks, with QLoRA, LoRA, Adapter, and Prompt/Prefix Tuning serving as the mainstream PEFT baselines. The proposed method can facilitate efficient model adaptation while reducing computational resources. |
Low | GrooveSquid.com (original content) | This paper is about making language models more efficient without losing performance. Language models are large computer programs that help us understand and generate text. They are often too big to run quickly on everyday devices, so researchers have been looking for ways to make them smaller and faster. The authors came up with a new way to fine-tune these models so they use less memory and time without sacrificing their ability to perform tasks like answering questions or generating text, and they showed that it works well across several language models and tasks. The main idea is to make the model’s internal calculations sparser, meaning fewer calculations are needed to get the same results, so the model runs faster and uses less memory. The authors also came up with an adaptive version of their method that can adjust itself depending on the task at hand. |
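To make the sparsity mechanism concrete, below is a minimal PyTorch sketch of a density-style objective: an L1 penalty on post-activation MLP outputs added to the task loss, plus a simple density metric. This illustrates the general idea only; the hook targeting, the L1 form of the loss, and names such as `lambda_density` are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

def attach_activation_hooks(model: nn.Module, store: list):
    """Capture post-activation MLP outputs via forward hooks.

    Matching on GELU/ReLU modules is an illustrative assumption; a real
    PLM (e.g., RoBERTa) would need model-specific layer targeting.
    """
    handles = []
    for module in model.modules():
        if isinstance(module, (nn.GELU, nn.ReLU)):
            handles.append(module.register_forward_hook(
                lambda _mod, _inp, out: store.append(out)))
    return handles

def density_regularized_loss(logits, labels, activations, lambda_density=1e-4):
    """Task loss plus a simple L1 penalty pushing activations toward zero.

    The L1 term stands in for the paper's density loss, whose exact form
    is not reproduced here; `lambda_density` is a hypothetical knob.
    """
    task_loss = nn.functional.cross_entropy(logits, labels)
    # Mean absolute activation across all hooked MLP blocks; driving this
    # toward zero increases the fraction of (near-)zero entries.
    penalty = torch.stack([a.abs().mean() for a in activations]).mean()
    return task_loss + lambda_density * penalty

def activation_density(activations, eps=1e-6):
    """Fraction of activation entries above `eps` in magnitude, i.e. the
    quantity DEFT aims to reduce (thresholding here is an assumption)."""
    total = sum(a.numel() for a in activations)
    active = sum((a.abs() > eps).sum().item() for a in activations)
    return active / total

# Usage sketch inside a PEFT training step (names are illustrative):
#   store = []
#   handles = attach_activation_hooks(model, store)
#   logits = model(input_ids).logits
#   loss = density_regularized_loss(logits, labels, store)
#   loss.backward(); optimizer.step(); store.clear()
```

The penalty weight trades task accuracy against achieved sparsity; the paper's adaptive variant (ADA-DEFT) addresses this kind of trade-off, though its mechanism is not reproduced here.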
Keywords
* Artificial intelligence
* Fine-tuning
* Inference
* LoRA
* Loss function
* Prompt
* Transformer