Summary of Towards Modular LLMs by Building and Reusing a Library of LoRAs, by Oleksiy Ostapenko et al.
Towards Modular LLMs by Building and Reusing a Library of LoRAs
by Oleksiy Ostapenko, Zhan Su, Edoardo Maria Ponti, Laurent Charlin, Nicolas Le Roux, Matheus Pereira, Lucas Caccia, Alessandro Sordoni
First submitted to arXiv on: 18 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates whether pre-trained adapters for large language models (LLMs) can be reused to improve performance on new tasks without retraining. To achieve this, the authors build a library of adapters and devise techniques for zero-shot and supervised task generalization. They benchmark existing methods and introduce model-based clustering (MBC), which groups tasks by the similarity of their adapter parameters, optimizing transfer across multi-task data. The authors also present Arrow, a novel zero-shot routing mechanism that selects relevant adapters without retraining (rough sketches of both ideas follow the table). Experiments with LLMs such as Phi-2 and Mistral demonstrate superior generalization to new tasks using MBC-based adapters and Arrow routing. This work takes steps towards creating modular, adaptable LLMs that can rival or surpass traditional joint training. |
| Low | GrooveSquid.com (original content) | Imagine having a super smart AI model that can help with many different tasks without needing to be retrained each time. That’s the idea behind this paper! Researchers are trying to figure out how to reuse special “adapters” they’ve already trained for one task, so they can use them to improve performance on new tasks too. They developed a way to group similar tasks together and created a system that can pick the best adapter without retraining. This could make AI models more flexible and helpful in lots of situations. |
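The MBC step described in the medium summary lends itself to a quick illustration. Below is a minimal sketch, assuming each task's trained LoRA is flattened into a single parameter vector and tasks are grouped with plain k-means on those vectors; the names (`adapters`, `cluster_tasks`, `n_clusters`) are hypothetical and not taken from the paper's released code.

```python
# Minimal sketch of model-based clustering (MBC): group tasks by the
# similarity of their trained LoRA adapter parameters. All names here
# (adapters, cluster_tasks, n_clusters) are illustrative assumptions,
# not the paper's code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cluster_tasks(adapters: dict[str, np.ndarray], n_clusters: int) -> dict[int, list[str]]:
    """adapters maps each task name to its flattened LoRA parameter vector."""
    names = list(adapters)
    # L2-normalize so clustering reflects parameter direction (cosine
    # similarity), not adapter magnitude.
    X = normalize(np.stack([adapters[n] for n in names]))
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    clusters: dict[int, list[str]] = {}
    for name, label in zip(names, labels):
        clusters.setdefault(int(label), []).append(name)
    return clusters
```

One adapter per cluster would then be trained on the pooled data of that cluster's tasks, giving the library that the routing step draws from.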
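Arrow routing can be sketched in the same spirit. The snippet below is a loose illustration rather than the paper's implementation: it assumes each library LoRA, with weight update ΔW = B·A, is summarized by the top right-singular vector of that update, and an incoming representation is routed to the adapters whose prototypes it aligns with most; `arrow_prototype` and `route` are made-up names.

```python
# Loose sketch of Arrow-style zero-shot routing: each LoRA in the library
# gets a "prototype" direction (here, the top right-singular vector of its
# weight update B @ A), and an input is routed to the adapters whose
# prototypes it aligns with most. Illustrative only, not the paper's code.
import numpy as np

def arrow_prototype(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Top right-singular vector of the LoRA update B @ A (length d_in)."""
    _, _, Vt = np.linalg.svd(B @ A, full_matrices=False)
    return Vt[0]

def route(x: np.ndarray, prototypes: np.ndarray, top_k: int = 2):
    """Return indices and softmax weights of the top_k adapters for input x."""
    # The sign of a singular vector is arbitrary, so score by |x . v|.
    scores = np.abs(prototypes @ x)
    idx = np.argsort(scores)[::-1][:top_k]
    w = np.exp(scores[idx] - scores[idx].max())
    return idx, w / w.sum()
```

The selected adapters' outputs would then be mixed using these weights; the key property is that routing requires no additional training.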
Keywords
» Artificial intelligence » Clustering » Generalization » Multi-task » Supervised » Zero-shot