Revisiting Weight Averaging for Model Merging
by Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong
First submitted to arXiv on: 11 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract; read it on arXiv. |
Medium | GrooveSquid.com (original content) | This paper revisits model merging, which combines the parameters of separately fine-tuned models without additional training. The most straightforward approach is to average the model parameters across tasks, but this often yields suboptimal performance because parameters from different tasks interfere with one another. The authors show that weight averaging implicitly induces task vectors centered around the weight average, and that applying a low-rank approximation to these centered task vectors markedly improves merging performance. Their analysis shows that centering separates core task-specific knowledge from nuisance noise in the fine-tuned parameters, concentrating the former in the top singular vectors and the latter in the lower ones, so the low-rank approximation reduces inter-task interference. Evaluated on eight image classification tasks, the method significantly outperforms prior merging methods, narrowing the gap with traditional multi-task learning to 1-3%. A sketch of the procedure appears below the table. |
Low | GrooveSquid.com (original content) | This paper is about a new way to combine models that were trained separately. Usually, you would just average all the model parameters, but this often doesn't work well because the different tasks can interfere with each other. The authors found a clever solution: center each task vector around the average and keep only its most important directions (a low-rank approximation). This separates the useful parts of each task from the noise, which makes the merged model work much better. They tested it on eight image classification tasks and showed that it beats previous methods. |
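To make the procedure concrete, here is a minimal NumPy sketch of the centered low-rank merging described in the summaries. It operates on a single 2-D weight matrix per task; the function name, the choice of `rank`, and the scaling coefficient `alpha` are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def merge_centered_low_rank(task_weights, rank, alpha=1.0):
    """Sketch of merging fine-tuned weight matrices via centered,
    low-rank task vectors (an assumed reading of the paper's method):
      1. average the per-task weights,
      2. center each task's weights around that average (a "centered task vector"),
      3. keep only the top-`rank` singular directions of each centered vector,
      4. add the scaled, truncated vectors back onto the average.
    `alpha` is a hypothetical scaling coefficient, not taken from the paper.
    """
    avg = np.mean(task_weights, axis=0)   # plain weight averaging
    merged = avg.copy()
    for w in task_weights:
        tau = w - avg                     # task vector centered on the weight average
        u, s, vt = np.linalg.svd(tau, full_matrices=False)
        # Top singular vectors: core task-specific knowledge (kept).
        # Lower singular vectors: nuisance noise that causes interference (dropped).
        tau_low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        merged += alpha * tau_low_rank
    return merged

# Toy usage on random stand-ins for three tasks' fine-tuned weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 64)) for _ in range(3)]
print(merge_centered_low_rank(weights, rank=8).shape)  # (64, 64)
```

In a real merge this would be applied layer by layer across the fine-tuned networks' weight matrices, with `rank` and `alpha` tuned on held-out data.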
Keywords
» Artificial intelligence » Image classification » Multi-task learning