Summary of Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs, by Ashwinee Panda et al.
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
by Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal
First submitted to arXiv on: 24 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Lottery Ticket Adaptation (LoTA) addresses the challenge of adapting large language models to multiple tasks without destructive interference. Existing adaptation methods modify all model weights, which causes interference between tasks and catastrophic forgetting of earlier ones. LoTA instead identifies a sparse subnetwork of the model and optimizes only those weights for each task, outperforming full fine-tuning and low-rank adaptation (LoRA). Because each task is captured by a sparse task vector (a "lottery ticket"), LoTA also enables model merging across highly dissimilar tasks. Evaluations on challenging tasks, including instruction following, reasoning, math, and summarization, demonstrate its effectiveness. (A rough code sketch of the sparse-mask idea follows this table.) |
Low | GrooveSquid.com (original content) | LoTA is a new way to teach language models different tasks without losing what they already know. Adapting these models is currently tricky because it changes all of the model's settings, which causes problems when learning new things. LoTA solves this by finding and updating only the parts of the model that matter for each task. This lets a single model learn many different skills without forgetting what it already knows. The results show that LoTA works better than other methods and helps merge information from very different tasks. |
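To make the sparse-subnetwork idea more concrete, below is a minimal, illustrative sketch in PyTorch. It is not the authors' implementation: the toy `nn.Linear` model, the `top_k_mask` and `masked_finetune` helpers, the 90% sparsity level, and the MSE objective are all assumptions chosen for brevity. The sketch builds a mask from the largest-magnitude entries of a task vector (fine-tuned weights minus base weights), rewinds to the base weights, and then fine-tunes only the masked parameters.

```python
# Minimal sketch of the sparse-mask idea described above; NOT the paper's code.
# Assumptions: a toy linear model stands in for an LLM, and the mask keeps the
# largest-magnitude entries of a task vector (fine-tuned minus base weights).
import torch
import torch.nn as nn

def top_k_mask(task_vector: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Keep the largest-magnitude fraction (1 - sparsity) of entries."""
    k = max(1, int(task_vector.numel() * (1.0 - sparsity)))
    threshold = torch.topk(task_vector.abs().flatten(), k).values.min()
    return (task_vector.abs() >= threshold).float()

def masked_finetune(model: nn.Module, masks: dict, data, lr: float = 1e-2, steps: int = 100):
    """Fine-tune only the parameters selected by the per-tensor masks."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.grad is not None:
                    p.grad *= masks[name]  # zero gradients outside the subnetwork
        opt.step()

# Toy usage: dense fine-tune once to get a task vector, extract a mask,
# rewind to the base weights, then optimize only the "lottery ticket".
base = nn.Linear(4, 1)
tuned = nn.Linear(4, 1)
tuned.load_state_dict(base.state_dict())
x, y = torch.randn(32, 4), torch.randn(32, 1)
dense_masks = {n: torch.ones_like(p) for n, p in tuned.named_parameters()}
masked_finetune(tuned, dense_masks, (x, y))          # dense first pass

base_sd, tuned_sd = base.state_dict(), tuned.state_dict()
masks = {n: top_k_mask(tuned_sd[n] - base_sd[n], sparsity=0.9) for n in base_sd}

sparse_model = nn.Linear(4, 1)
sparse_model.load_state_dict(base.state_dict())       # rewind to base weights
masked_finetune(sparse_model, masks, (x, y))          # sparse second pass
```

Masking gradients rather than weights keeps the unselected parameters exactly at their base values, so each task's update stays confined to a sparse task vector, which is the property the summary above credits for reduced interference when merging models.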
Keywords
» Artificial intelligence » Fine-tuning » LoRA » Low-rank adaptation » Summarization