Summary of Robust Adaptation of Foundation Models with Black-Box Visual Prompting, by Changdae Oh et al.
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
by Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song
First submitted to arXiv on: 4 Jul 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary The high difficulty version is the paper’s original abstract, available on the paper’s arXiv page. |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The paper proposes black-box visual prompting (BlackVIP), an approach that enables efficient transfer learning of large-scale pre-trained models (PTMs) without access to their parameters. BlackVIP consists of two components: the Coordinator, which designs input-dependent visual prompts for PTM adaptation, and simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC), which estimates the gradient used to update the Coordinator from queries to the PTM alone. The authors also introduce a variant, BlackVIP-SE, which reduces runtime and computational cost. Extensive experiments on 19 datasets demonstrate that BlackVIP and BlackVIP-SE enable robust adaptation to diverse domains and tasks with minimal memory requirements. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary The paper helps us adapt big models to many different tasks without needing all of the model’s internal details. Often we can’t use these details, either because they’re kept secret or because working with them takes more computer memory than we have. The solution is called black-box visual prompting (BlackVIP). BlackVIP has two parts: one that makes special pictures to help the model handle a new task, and another that figures out how to make those pictures better. This helps the model adapt quickly without using too much computer power. The authors tested this on 19 different datasets and showed it works well. |
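
The gradient-free update mentioned in the medium difficulty summary can be illustrated with a small sketch of simultaneous perturbation stochastic approximation (SPSA). This is not the authors' implementation. The function names (`spsa_gradient`, `spsa_minimize`, `toy_loss`) are hypothetical, a toy quadratic stands in for the loss that would be obtained by querying a PTM, and the momentum look-ahead used here is only an assumed form of "gradient correction", since the summary does not spell out how the correction works.

```python
import numpy as np


def spsa_gradient(loss_fn, params, c, rng):
    """Estimate the gradient of loss_fn at params from two loss queries."""
    # Perturb every coordinate at once with random +1/-1 signs.
    delta = rng.choice([-1.0, 1.0], size=params.shape)
    loss_plus = loss_fn(params + c * delta)
    loss_minus = loss_fn(params - c * delta)
    # For +1/-1 entries, dividing by delta equals multiplying by delta.
    return (loss_plus - loss_minus) / (2.0 * c) * delta


def spsa_minimize(loss_fn, params, steps=2000, lr=0.02, c=0.01, beta=0.5, seed=0):
    """Minimize loss_fn using only loss evaluations (no backpropagation)."""
    rng = np.random.default_rng(seed)
    params = np.asarray(params, dtype=float).copy()
    momentum = np.zeros_like(params)
    for _ in range(steps):
        # ASSUMPTION: the gradient is estimated at a momentum look-ahead
        # point; this is an illustrative stand-in for "gradient correction".
        grad = spsa_gradient(loss_fn, params + beta * momentum, c, rng)
        momentum = beta * momentum - lr * grad
        params = params + momentum
    return params


def toy_loss(p):
    # Hypothetical black-box objective with its minimum at p = 3.
    return float(np.sum((p - 3.0) ** 2))


result = spsa_minimize(toy_loss, np.zeros(4))
```

Each step costs two loss evaluations regardless of how many parameters are being tuned, which is what makes this family of estimators usable when the model can only be queried.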
Keywords
» Artificial intelligence » Prompting » Transfer learning