Summary of SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models, by Anke Tang et al.
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
by Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao
First submitted to arXiv on: 19 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | A novel approach to deep model fusion, called zero-shot Sparse MIxture of Low-rank Experts (SMILE), is proposed to tackle the challenges of parameter interference and interpretability. Starting from a subspace analysis of how fine-tuning changes linear layers, SMILE upscales source models into a mixture-of-experts (MoE) model without requiring any additional data or training. The method builds on the observation that fine-tuning largely preserves important pre-training knowledge while adapting to new tasks, so parameter interference can be managed by expanding the dimensions of the merged model. The approach is evaluated across diverse scenarios, including image classification and text generation tasks, under both full fine-tuning and LoRA fine-tuning, and its adaptability and scalability are demonstrated with CLIP, Flan-T5, and Mistral-7B models (an illustrative sketch of the construction follows the table below). This work has implications for accelerating the development of new models while improving their performance. |
Low | GrooveSquid.com (original content) | Deep model fusion helps combine knowledge from pre-trained models to improve performance. However, there’s a problem: some parts of the old models can interfere with each other. Researchers tried to solve this by looking at how big or important certain parts were. They also tried to remove unimportant parts. In this study, scientists took a different approach. They used something called subspace analysis to understand what happens when they fine-tune (or adjust) linear layers in the models. Then, they developed a new way to combine old models, called SMILE, which doesn’t need extra data or training. This helps manage interference and makes the process more understandable. The scientists tested their approach on various tasks, like image recognition and text generation, using different types of models. |
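The summaries above describe the construction only at a high level. As a rough illustration of the core idea (low-rank experts extracted from the fine-tuning updates of linear layers, combined with sparse routing), here is a minimal sketch. The function names, rank choice, and subspace-based routing rule are assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal, hypothetical sketch of the SMILE idea as summarized above:
# build low-rank "experts" from fine-tuned linear layers via SVD of the
# weight updates, and route each input to the experts whose input subspace
# it aligns with. Ranks and routing details are assumptions for illustration.
import numpy as np

def build_low_rank_experts(w_pretrained, finetuned_weights, rank=4):
    """For each fine-tuned model, keep a rank-`rank` SVD of its weight update."""
    experts = []
    for w_ft in finetuned_weights:
        delta = w_ft - w_pretrained                      # task-specific update
        u, s, vt = np.linalg.svd(delta, full_matrices=False)
        experts.append((u[:, :rank] * s[:rank], vt[:rank, :]))  # (U*S, V^T)
    return experts

def smile_forward(x, w_pretrained, experts, top_k=1):
    """Shared pre-trained layer plus a sparse sum of low-rank expert updates."""
    y = x @ w_pretrained.T                               # shared pre-trained knowledge
    # Route by how strongly x projects onto each expert's input subspace.
    scores = np.array([np.linalg.norm(vt @ x) for _, vt in experts])
    for i in np.argsort(scores)[-top_k:]:                # keep only the top-k experts
        us, vt = experts[i]
        y += us @ (vt @ x)                               # low-rank correction
    return y

# Toy usage: one pre-trained layer and two "fine-tuned" variants of it.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(8, 16))
fts = [w0 + 0.1 * rng.normal(size=w0.shape) for _ in range(2)]
experts = build_low_rank_experts(w0, fts, rank=2)
print(smile_forward(rng.normal(size=16), w0, experts, top_k=1).shape)  # (8,)
```

Because the experts are low-rank and only the top-scoring ones are applied per input, the upscaled model keeps the shared pre-trained weights intact while isolating task-specific updates, which is the sense in which expanding dimensions makes parameter interference manageable.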
Keywords
» Artificial intelligence » Fine tuning » Image classification » Lora » Mixture of experts » T5 » Text generation » Zero shot