


Towards Compatible Fine-tuning for Vision-Language Model Updates

by Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

First submitted to arXiv on: 30 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
The high difficulty version is the paper’s original abstract. Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com, original content)
This paper analyzes various fine-tuning methods for CLIP in terms of their compatibility with model updates, revealing that many high-performing methods fail to adapt when the underlying embedding space changes. To address this, the authors propose Class-conditioned Context Optimization (ContCoOp), which integrates learnable prompts with class embeddings through an attention layer, allowing the prompts to dynamically adapt to shifts in the embedding space. This keeps plug-and-play modules effective on downstream tasks even after the foundation model is updated. Experiments over 15 datasets show that ContCoOp achieves the highest compatibility and robust out-of-distribution generalization compared to baseline methods.

Low Difficulty Summary (GrooveSquid.com, original content)
This paper studies how to make fine-tuning modules work well even when the foundation model is updated. Right now, many popular ways of fine-tuning models overlook this important issue. The researchers found that most high-performing methods don’t adapt well to changes in the model’s “language” (its embedding space). To fix this, they propose a new approach called ContCoOp, which uses attention to make prompts adjust to the changing language. This helps plug-and-play modules stay effective even when the foundation model is updated. They tested it on 15 datasets and showed that ContCoOp does much better than other methods at adapting to changes in the model.
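The core idea described above, learnable prompts that attend to class embeddings, can be sketched as follows. This is a minimal illustrative toy, not the authors' implementation: the module name, dimensions, and the residual combination are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class ClassConditionedPrompts(nn.Module):
    """Toy sketch of class-conditioned prompt learning (hypothetical,
    not the paper's actual ContCoOp code)."""

    def __init__(self, n_ctx: int = 4, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # Learnable context (prompt) vectors shared across classes
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Attention lets the prompts query each class embedding, so they
        # can follow the embedding space if it shifts after a model update
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, class_embeddings: torch.Tensor) -> torch.Tensor:
        # class_embeddings: (n_classes, dim), e.g. text-encoder class features
        n_classes = class_embeddings.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)  # (C, n_ctx, d)
        cls = class_embeddings.unsqueeze(1)                    # (C, 1, d)
        # Prompts attend to the class embedding and absorb class information
        adapted, _ = self.attn(query=ctx, key=cls, value=cls)
        # Residual combination keeps the shared context as a starting point
        return ctx + adapted                                   # (C, n_ctx, d)

prompts = ClassConditionedPrompts()
out = prompts(torch.randn(10, 512))  # 10 classes, 512-dim embeddings
print(out.shape)  # torch.Size([10, 4, 512])
```

Because the class embeddings are recomputed from whatever text encoder is current, the attention step gives the prompts a path to track an updated embedding space, which is the compatibility property the paper targets.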

Keywords

» Artificial intelligence  » Attention  » Embedding space  » Fine tuning  » Generalization  » Optimization