ATM: Improving Model Merging by Alternating Tuning and Merging
by Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà
First submitted to arXiv on: 5 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes a novel approach to model merging, a cost-efficient paradigm for multi-task learning. It links task arithmetic to multi-task gradient descent, showing that task vectors are mathematically equivalent to gradients of the multi-task loss after a single fine-tuning epoch and approximate those gradients in subsequent epochs. The proposed method, Alternating Tuning and Merging (ATM), repeatedly alternates short fine-tuning phases with merging phases and achieves state-of-the-art results under the same data and compute budget across diverse computer vision and NLP tasks, outperforming baselines by up to 20%. The paper supports its effectiveness both empirically and theoretically, demonstrating increased orthogonality between task vectors and proving that ATM minimizes an upper bound on the loss obtained by jointly fine-tuning on all tasks. |
Low | GrooveSquid.com (original content) | The paper is about a new way to combine different models into one. This helps with doing multiple tasks at once, like recognizing images or understanding language. The method builds on something called “task arithmetic,” which shows how task vectors are connected to gradients from multi-task learning. The paper proposes a new approach called ATM that alternates between tuning and merging, getting better results than before. It is tested on different computer vision and NLP tasks and achieves the best results. |
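To make the summaries above more concrete, here is a hedged toy sketch of the alternating tune/merge loop the paper's name describes: fine-tune a copy of the shared model on each task, form task vectors (fine-tuned weights minus shared weights), merge them back by averaging, and repeat. The function names (`finetune`, `atm`), the toy quadratic tasks, and all hyperparameters are illustrative assumptions, not the authors' actual implementation or experimental setup.

```python
import numpy as np

def finetune(theta, task_grad_fn, lr=0.1, steps=1):
    """A short fine-tuning phase: plain gradient descent on one task.
    Stand-in for the per-task tuning step; not the paper's training code."""
    for _ in range(steps):
        theta = theta - lr * task_grad_fn(theta)
    return theta

def atm(theta, task_grad_fns, rounds=5, alpha=1.0):
    """Alternate tuning and merging (illustrative sketch of the ATM idea)."""
    for _ in range(rounds):
        # Tune: fine-tune a copy of the shared model on each task,
        # then subtract the shared weights to get task vectors.
        task_vectors = [finetune(theta.copy(), g) - theta
                        for g in task_grad_fns]
        # Merge: apply the averaged task vector (task arithmetic).
        theta = theta + alpha * np.mean(task_vectors, axis=0)
    return theta

# Two toy quadratic "tasks" with different optima; merging moves the
# shared weights toward a trade-off point between them.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
grads = [lambda th, t=t: 2.0 * (th - t) for t in targets]
theta = atm(np.zeros(2), grads)
```

With these toy gradients each round contracts the shared weights toward the average of the task optima, which mirrors the summary's point that short tuning phases followed by merging approximate multi-task gradient steps.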
Keywords
» Artificial intelligence » Fine-tuning » Multi-task » NLP