
Summary of Vanishing Feature: Diagnosing Model Merging and Beyond, by Xingyu Qu et al.


Vanishing Feature: Diagnosing Model Merging and Beyond

by Xingyu Qu, Samuel Horvath

First submitted to arXiv on: 5 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper addresses the inconsistent performance that can result from combining pre-trained neural networks through model merging. The authors identify a phenomenon called “vanishing features,” in which input-induced features diminish as they propagate through the merged model, degrading performance. They analyze the phenomenon theoretically and empirically, showing that it underlies known challenges such as variance collapse and helps explain why techniques such as permutation-based merging and post-merging normalization work. Building on these insights, they propose the “Preserve-First Merging” (PFM) strategy, which focuses on preserving early-layer features so that merged models can outperform the original models in advanced settings. The authors also show that the vanishing feature phenomenon extends to model pruning, where post-pruning normalization significantly improves one-shot pruning performance at high sparsity.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps us understand why combining pre-trained neural networks can sometimes fail. The researchers found a problem called “vanishing features” that makes the combined model less effective. They studied this problem and came up with new ideas to fix it. One of these ideas, called “Preserve-First Merging,” helps the combined model work better than the original models in some cases. This matters because it could be useful in many areas where different AI models need to be combined.
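
To make the idea of features diminishing inside a merged model more concrete, here is a minimal toy sketch in plain Python. It is not the paper's experiment or code. It assumes two untrained, bias-free ReLU networks with independent random weights, merges them by simple weight averaging, and compares the size of the activations layer by layer. All names (`random_mlp`, `average_weights`, `layer_norms`) and the width, depth, and seed are hypothetical choices made only for illustration.

```python
import math
import random


def random_layer(rng, n_out, n_in):
    # He-style initialization (an assumption): weights ~ N(0, 2 / n_in), no biases.
    std = math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def random_mlp(rng, width, depth):
    # A stack of square weight matrices; a ReLU follows each layer.
    return [random_layer(rng, width, width) for _ in range(depth)]


def average_weights(net_a, net_b):
    # Naive merging: element-wise mean of the two models' weights.
    return [
        [[0.5 * (a + b) for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(layer_a, layer_b)]
        for layer_a, layer_b in zip(net_a, net_b)
    ]


def layer_norms(net, x):
    # L2 norm of the post-ReLU activations after each layer.
    norms = []
    h = x
    for layer in net:
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in layer]
        norms.append(math.sqrt(sum(v * v for v in h)))
    return norms


rng = random.Random(0)  # fixed seed so the run is repeatable
width, depth = 64, 8
net_a = random_mlp(rng, width, depth)
net_b = random_mlp(rng, width, depth)
merged = average_weights(net_a, net_b)
x = [rng.gauss(0.0, 1.0) for _ in range(width)]  # one random input

norms_a = layer_norms(net_a, x)
norms_b = layer_norms(net_b, x)
norms_m = layer_norms(merged, x)

# Ratio of the merged model's activation norm to the mean endpoint norm.
ratios = [m / (0.5 * (a + b)) for m, a, b in zip(norms_m, norms_a, norms_b)]
for i, r in enumerate(ratios, 1):
    print(f"layer {i}: merged / endpoint activation norm = {r:.3f}")
```

In this toy setting, averaging two independent zero-mean weight matrices halves the weight variance, so the activation scale is expected to drop by roughly a factor of 1/√2 at every layer, and the ratio keeps shrinking with depth. Trained networks are not independent random matrices, so this should be read only as an illustration of why the signal in a merged model can fade, not as a reproduction of the paper's analysis.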

Keywords

  • Artificial intelligence
  • One shot
  • Pruning