Summary of SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery, by Enneng Yang et al.
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery
by Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao
First submitted to arXiv on: 18 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Model merging-based multi-task learning (MTL) combines multiple expert models into a single model without requiring access to their raw training data, making it a promising way to perform MTL. However, the merged model suffers from “representation bias”: a significant gap between its representations and those of the individual expert models, which leads to suboptimal performance. To address this, the authors introduce Surgery, a lightweight, task-specific module that aligns the merged model’s final-layer representations with those of the expert models. Although Surgery reduces the bias, a performance gap to traditional MTL methods remains, and further analysis shows that representation bias exists at every layer. The authors therefore propose deep representation surgery (SurgeryV2), which mitigates the bias across all layers, and design an unsupervised optimization objective to train both the Surgery and SurgeryV2 modules. Experimental results show that plugging these modules into state-of-the-art model merging schemes yields significant performance gains (a minimal code sketch of this idea follows the table). |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Model merging-based multi-task learning is a promising way to perform MTL without access to raw training data. However, the merged model suffers from “representation bias”: a gap between its representations and those of the expert models that hurts performance. The authors propose lightweight modules, Surgery and deep representation surgery (SurgeryV2), that align the merged model’s representations with the experts’ to reduce this bias. The modules are trained with an unsupervised objective, and experiments show significant performance gains. |
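
To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea summarized above. All names (`SurgeryAdapter`, `surgery_loss`, `surgeryv2_loss`, `merged_encoder`, `expert_encoder`) are hypothetical illustrations rather than the authors’ released code, and the L1 alignment loss is one reasonable choice of distance, not necessarily the paper’s exact objective.

```python
# Sketch only: hypothetical names, not the authors' implementation.
import torch
import torch.nn as nn


class SurgeryAdapter(nn.Module):
    """Lightweight task-specific module that nudges the merged model's
    representation toward the corresponding expert's representation."""
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        # Low-rank residual correction keeps the module lightweight.
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping

    def forward(self, z_merged: torch.Tensor) -> torch.Tensor:
        return z_merged + self.up(self.down(z_merged))


def surgery_loss(adapter, merged_encoder, expert_encoder, x):
    """Surgery (final layer only): align the adapted representation of the
    merged model with the frozen expert's representation on unlabeled inputs x."""
    with torch.no_grad():
        z_expert = expert_encoder(x)   # target representation, no labels needed
        z_merged = merged_encoder(x)   # biased representation from the merged model
    return nn.functional.l1_loss(adapter(z_merged), z_expert)


def surgeryv2_loss(adapters, merged_layers, expert_layers, x):
    """SurgeryV2 (sketch): attach one adapter per layer, feed the corrected
    feature forward, and sum the per-layer alignment losses."""
    loss, z_m, z_e = 0.0, x, x
    for adapter, m_layer, e_layer in zip(adapters, merged_layers, expert_layers):
        with torch.no_grad():
            z_e = e_layer(z_e)         # expert's layer-wise target
        z_m = adapter(m_layer(z_m))    # corrected feature is passed to the next layer
        loss = loss + nn.functional.l1_loss(z_m, z_e)
    return loss
```

In this sketch the merged and expert backbones stay frozen; only the small adapters are optimized, using unlabeled inputs, which matches the unsupervised objective the summaries describe.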
Keywords
» Artificial intelligence » Optimization » Unsupervised