Summary of Towards Compatible Fine-tuning for Vision-Language Model Updates, by Zhengbo Wang et al.
Towards Compatible Fine-tuning for Vision-Language Model Updates
by Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan
First submitted to arXiv on: 30 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper analyzes various fine-tuning methods for CLIP in terms of their compatibility with model updates and finds that many high-performing methods fail to adapt to changes in the embedding space. To address this, the authors propose Class-conditioned Context Optimization (ContCoOp), which integrates learnable prompts with class embeddings through an attention layer, allowing the prompts to adapt dynamically to the changing embedding space so that plug-and-play modules remain effective on downstream tasks even after the underlying foundation model is updated (see the sketch after this table). Experiments over 15 datasets show that ContCoOp achieves the highest compatibility with model updates and robust out-of-distribution generalization compared to baseline methods. |
| Low | GrooveSquid.com (original content) | This paper studies how to make fine-tuning modules keep working when the foundation model is updated. Many popular fine-tuning methods overlook this issue: the researchers found that most high-performing methods do not adapt well to changes in the model's "language" (its embedding space). To fix this, they propose a new approach called ContCoOp, which uses attention to let the prompts adjust to the changing embedding space, so plug-and-play modules stay effective even when the foundation model is updated. Tests on 15 datasets show that ContCoOp adapts to model changes much better than other methods. |
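To make the medium summary's description more concrete, below is a minimal, hypothetical PyTorch sketch of a class-conditioned prompt module in the spirit of ContCoOp: shared learnable context vectors attend to class-name embeddings produced by the current text encoder, so the prompts can track the embedding space of an updated backbone. The module name, dimensions, and the residual combination are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of class-conditioned prompts:
# learnable context vectors are combined with class embeddings via attention.
import torch
import torch.nn as nn


class ClassConditionedPrompts(nn.Module):
    def __init__(self, n_ctx: int = 16, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # Learnable context (prompt) vectors, shared across classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Attention layer letting each prompt token attend to class embeddings,
        # so the prompts follow the embedding space of the current backbone.
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, class_embeddings: torch.Tensor) -> torch.Tensor:
        """class_embeddings: (n_classes, dim) class-name embeddings from the
        current (possibly updated) CLIP text encoder.
        Returns class-conditioned prompts of shape (n_classes, n_ctx, dim)."""
        n_classes = class_embeddings.shape[0]
        # One copy of the shared context per class: (n_classes, n_ctx, dim).
        queries = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # Class embeddings act as keys/values: (n_classes, 1, dim).
        kv = class_embeddings.unsqueeze(1)
        attended, _ = self.attn(queries, kv, kv)
        # Residual combination keeps the learned context while injecting
        # class information tied to the current embedding space.
        return queries + attended


if __name__ == "__main__":
    prompts = ClassConditionedPrompts(n_ctx=4, dim=512)
    class_emb = torch.randn(10, 512)  # stand-in for CLIP class-name embeddings
    print(prompts(class_emb).shape)   # torch.Size([10, 4, 512])
```

In a full pipeline, these class-conditioned prompts would be fed to the CLIP text encoder to produce classifier weights; because the prompts are conditioned on embeddings from whichever encoder is currently deployed, the same learned module can plausibly transfer when the foundation model is updated.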
Keywords
» Artificial intelligence » Attention » Embedding space » Fine tuning » Generalization » Optimization