
Summary of Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification, by Yunyi Xuan et al.


Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification

by Yunyi Xuan, Weijie Chen, Shicai Yang, Di Xie, Luojun Lin, Yueting Zhuang

First submitted to arXiv on: 21 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Multimedia (cs.MM)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content written by GrooveSquid.com)
Data-Free Knowledge Distillation (DFKD) has shown promise in creating a compact student model while reducing dependency on real training data by generating surrogate data. However, prior work has rarely been examined under distribution shifts, leaving it potentially vulnerable in real-world applications. Recent Vision-Language Foundation Models, such as CLIP, have demonstrated remarkable zero-shot out-of-distribution generalization but consume substantial computational resources. This paper extends DFKD to Vision-Language Foundation Models without access to large-scale image-text datasets. The goal is to customize a student model for distribution-agnostic downstream tasks from given category concepts, inheriting the out-of-distribution generalization capability of pre-trained foundation models. The primary challenge in avoiding degraded generalization is synthesizing diverse surrogate images driven by text prompts. The authors propose three novel Prompt Diversification methods: Mix-Prompt, Random-Prompt, and Contrastive-Prompt. Experiments on out-of-distribution generalization datasets demonstrate the effectiveness of these methods, with Contrastive-Prompt performing best.
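
To make the prompt-driven synthesis step concrete, here is a minimal sketch of how diversified text prompts could steer data-free image synthesis against a CLIP teacher. The Mix-Prompt-style interpolation of category prompt embeddings, the "a photo of a ..." prompt template, and the per-pixel optimization loop are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch (assumption, not the paper's implementation): synthesize a
# surrogate image by optimizing pixels against a CLIP teacher, where the text
# target is a Mix-Prompt-style interpolation of two category prompt embeddings.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
teacher, _ = clip.load("ViT-B/32", device=device)
teacher.eval()

categories = ["dog", "car"]  # example category concepts for the downstream task
prompts = [f"a photo of a {c}" for c in categories]
with torch.no_grad():
    text_feat = teacher.encode_text(clip.tokenize(prompts).to(device)).float()
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Mix-Prompt-style target: interpolate two prompt embeddings (assumed behavior).
lam = torch.distributions.Beta(1.0, 1.0).sample().to(device)
target = lam * text_feat[0] + (1 - lam) * text_feat[1]
target = target / target.norm()

# Surrogate image: a learnable pixel tensor at CLIP's input resolution.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(200):
    optimizer.zero_grad()
    x = (image.clamp(0, 1) - mean) / std
    img_feat = teacher.encode_image(x).float()
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    # Cosine distance between the image feature and the mixed prompt embedding.
    loss = 1 - (img_feat @ target.unsqueeze(-1)).mean()
    loss.backward()
    optimizer.step()
```

In a full DFKD pipeline, a generator network would typically replace the per-pixel optimization, and the synthesized surrogate images would be used to distill the teacher's predictions into the compact student; Random-Prompt and Contrastive-Prompt would swap in different strategies for constructing the text targets.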

Low Difficulty Summary (original content written by GrooveSquid.com)
This paper describes a way to teach artificial intelligence models new things without needing a lot of real data. The researchers want to create a small model that works well even on things it hasn't seen before. Some existing AI models are already very good at this, but they use a lot of computing power. The researchers instead use text prompts to generate new training images. They propose three different ways to do this and test them on challenging real-world datasets. The results show that one method, Contrastive-Prompt, works best.

Keywords

  • Artificial intelligence
  • Generalization
  • Knowledge distillation
  • Prompt
  • Student model
  • Zero shot