ATM: Improving Model Merging by Alternating Tuning and Merging
by Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà
First submitted to arXiv on: 5 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes a novel approach to model merging, a cost-efficient paradigm for multi-task learning. It links task arithmetic to multi-task gradient descent, showing that task vectors are mathematically equivalent to gradients of the multi-task loss after a single fine-tuning epoch and approximate those gradients in subsequent epochs. The proposed method, Alternating Tuning and Merging (ATM), repeatedly alternates short fine-tuning phases with merging phases and achieves state-of-the-art results under the same data and compute budget across diverse computer vision and NLP tasks, outperforming baselines by up to 20%. The paper supports its effectiveness both empirically and theoretically, demonstrating increased orthogonality between task vectors and proving that ATM minimizes an upper bound on the loss obtained by jointly fine-tuning on all tasks. |
Low | GrooveSquid.com (original content) | The paper is about a new way to combine different models into one. This helps with doing multiple tasks at once, like recognizing images or understanding language. The method builds on something called “task arithmetic,” which shows how task vectors are connected to gradients from multi-task learning. The paper proposes a new approach called ATM that alternates between tuning and merging, getting better results than before. It is tested on different computer vision and NLP tasks and achieves the best results. |
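To make the summaries above more concrete, here is a hedged toy sketch of the alternating tune/merge loop the paper's name describes: fine-tune a copy of the shared model on each task, form task vectors (fine-tuned weights minus shared weights), merge them back by averaging, and repeat. The function names (`finetune`, `atm`), the toy quadratic tasks, and all hyperparameters are illustrative assumptions, not the authors' actual implementation or experimental setup.

```python
import numpy as np

def finetune(theta, task_grad_fn, lr=0.1, steps=1):
    """A short fine-tuning phase: plain gradient descent on one task.
    Stand-in for the per-task tuning step; not the paper's training code."""
    for _ in range(steps):
        theta = theta - lr * task_grad_fn(theta)
    return theta

def atm(theta, task_grad_fns, rounds=5, alpha=1.0):
    """Alternate tuning and merging (illustrative sketch of the ATM idea)."""
    for _ in range(rounds):
        # Tune: fine-tune a copy of the shared model on each task,
        # then subtract the shared weights to get task vectors.
        task_vectors = [finetune(theta.copy(), g) - theta
                        for g in task_grad_fns]
        # Merge: apply the averaged task vector (task arithmetic).
        theta = theta + alpha * np.mean(task_vectors, axis=0)
    return theta

# Two toy quadratic "tasks" with different optima; merging moves the
# shared weights toward a trade-off point between them.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
grads = [lambda th, t=t: 2.0 * (th - t) for t in targets]
theta = atm(np.zeros(2), grads)
```

With these toy gradients each round contracts the shared weights toward the average of the task optima, which mirrors the summary's point that short tuning phases followed by merging approximate multi-task gradient steps.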
Keywords
» Artificial intelligence » Fine-tuning » Multi-task » NLP