


Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic

by Ruochen Jin, Bojian Hou, Jiancong Xiao, Weijie Su, Li Shen

First submitted to arXiv on: 9 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper explores task arithmetic, which combines models fine-tuned on different tasks into a single unified model. The approach is efficient and cost-effective because it does not require joint training on large datasets the way traditional multi-task methods do. However, combining models this way can introduce interference from unrelated tasks, a problem known as a lack of weight disentanglement. One remedy, Neural Tangent Kernel (NTK) linearization, facilitates weight disentanglement and mitigates these adverse effects, but it also brings drawbacks such as doubled training costs and reduced performance. The authors instead propose a simple yet effective alternative: fine-tune only the attention modules of Transformer models, which improves weight disentanglement. They also conduct a comprehensive study of task arithmetic that differentiates the roles of the representation module and the task-specific modules, finding that the representation module plays an important role in improving weight disentanglement, while task-specific modules can degrade performance.
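
To make the merging step concrete, here is a minimal, illustrative sketch of task arithmetic using PyTorch state dicts. It is not code from the paper; the checkpoint names and the scaling coefficient alpha are placeholder assumptions. Each task vector is the difference between a fine-tuned checkpoint and the shared pre-trained checkpoint, and the merged model adds a scaled sum of task vectors back onto the pre-trained weights.

```python
# Illustrative sketch only (not from the paper): task arithmetic over
# PyTorch state dicts. Checkpoint paths and the alpha value are placeholders.
import torch


def task_vector(pretrained_state, finetuned_state):
    """Task vector = fine-tuned weights minus the shared pre-trained weights."""
    return {name: finetuned_state[name] - pretrained_state[name]
            for name in pretrained_state}


def merge_with_task_arithmetic(pretrained_state, task_vectors, alpha=0.3):
    """Add a scaled sum of task vectors back onto the pre-trained weights."""
    merged = {name: param.clone() for name, param in pretrained_state.items()}
    for tv in task_vectors:
        for name, delta in tv.items():
            merged[name] += alpha * delta
    return merged


# Usage sketch (file names are hypothetical):
# pretrained = torch.load("vit_pretrained.pt")
# tvs = [task_vector(pretrained, torch.load(p)) for p in ("cars.pt", "mnist.pt")]
# model.load_state_dict(merge_with_task_arithmetic(pretrained, tvs, alpha=0.3))
```

The single coefficient alpha controls how strongly each task's update is applied; it is typically tuned on a small validation set.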

Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper is about finding a way to combine many AI models into one efficient model. Right now, we have to train a separate model for each task on big datasets, which takes a lot of time and effort. The approach studied here, called task arithmetic, lets us combine already trained models without retraining them. It's like taking many small puzzles and combining them into one big puzzle that still fits together. The challenge is that when we combine these models, they can interfere with each other, which makes it hard to get accurate results. One fix for this problem is a special technique called Neural Tangent Kernel linearization, which helps the combined model work better and reduces interference between tasks. However, it has some drawbacks, like taking about twice as long to train and performing slightly worse on individual tasks. The authors came up with a new solution that only fine-tunes the attention parts of the model, which makes it more efficient and effective.
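
As a rough illustration of what "only fine-tunes the attention parts" can look like (again a sketch under assumptions, not the paper's code), one can freeze every parameter of a PyTorch Transformer except those inside its attention sub-modules before training. The `self_attn` name below is specific to torch.nn.TransformerEncoderLayer; other architectures label their attention blocks differently.

```python
# Minimal sketch (assumes a torch.nn.TransformerEncoder model): keep only
# attention parameters trainable and freeze everything else before fine-tuning.
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)

for name, param in encoder.named_parameters():
    # Attention parameters stay trainable; MLP and norm layers are frozen.
    param.requires_grad = "self_attn" in name

trainable = [n for n, p in encoder.named_parameters() if p.requires_grad]
print(f"{len(trainable)} attention parameter tensors remain trainable")
```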

Keywords

  • Artificial intelligence
  • Attention
  • Transformer