Summary of Towards Compatible Fine-tuning for Vision-Language Model Updates, by Zhengbo Wang et al.
Towards Compatible Fine-tuning for Vision-Language Model Updates
by Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan
First submitted to arXiv on: 30 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper analyzes various fine-tuning methods for CLIP in terms of their compatibility with model updates and finds that many high-performing methods fail to adapt to changes in the embedding space. To address this, the authors propose Class-conditioned Context Optimization (ContCoOp), which integrates learnable prompts with class embeddings through an attention layer, allowing the prompts to adapt dynamically to the changing embedding space so that plug-and-play modules remain effective on downstream tasks even after the underlying foundation model is updated (see the sketch after this table). Experiments over 15 datasets show that ContCoOp achieves the highest compatibility with model updates and robust out-of-distribution generalization compared to baseline methods. |
| Low | GrooveSquid.com (original content) | This paper studies how to make fine-tuning modules keep working when the foundation model is updated. Many popular fine-tuning methods overlook this issue: the researchers found that most high-performing methods do not adapt well to changes in the model's "language" (its embedding space). To fix this, they propose a new approach called ContCoOp, which uses attention to let the prompts adjust to the changing embedding space, so plug-and-play modules stay effective even when the foundation model is updated. Tests on 15 datasets show that ContCoOp adapts to model changes much better than other methods. |
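To make the medium summary's description more concrete, below is a minimal, hypothetical PyTorch sketch of a class-conditioned prompt module in the spirit of ContCoOp: shared learnable context vectors attend to class-name embeddings produced by the current text encoder, so the prompts can track the embedding space of an updated backbone. The module name, dimensions, and the residual combination are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of class-conditioned prompts:
# learnable context vectors are combined with class embeddings via attention.
import torch
import torch.nn as nn


class ClassConditionedPrompts(nn.Module):
    def __init__(self, n_ctx: int = 16, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # Learnable context (prompt) vectors, shared across classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Attention layer letting each prompt token attend to class embeddings,
        # so the prompts follow the embedding space of the current backbone.
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, class_embeddings: torch.Tensor) -> torch.Tensor:
        """class_embeddings: (n_classes, dim) class-name embeddings from the
        current (possibly updated) CLIP text encoder.
        Returns class-conditioned prompts of shape (n_classes, n_ctx, dim)."""
        n_classes = class_embeddings.shape[0]
        # One copy of the shared context per class: (n_classes, n_ctx, dim).
        queries = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # Class embeddings act as keys/values: (n_classes, 1, dim).
        kv = class_embeddings.unsqueeze(1)
        attended, _ = self.attn(queries, kv, kv)
        # Residual combination keeps the learned context while injecting
        # class information tied to the current embedding space.
        return queries + attended


if __name__ == "__main__":
    prompts = ClassConditionedPrompts(n_ctx=4, dim=512)
    class_emb = torch.randn(10, 512)  # stand-in for CLIP class-name embeddings
    print(prompts(class_emb).shape)   # torch.Size([10, 4, 512])
```

In a full pipeline, these class-conditioned prompts would be fed to the CLIP text encoder to produce classifier weights; because the prompts are conditioned on embeddings from whichever encoder is currently deployed, the same learned module can plausibly transfer when the foundation model is updated.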
Keywords
» Artificial intelligence » Attention » Embedding space » Fine tuning » Generalization » Optimization