Summary of Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations, by Samyak Rawlekar et al.
Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations
by Samyak Rawlekar, Shubhang Bhatnagar, Narendra Ahuja
First submitted to arXiv on: 12 Sep 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv |
Medium | GrooveSquid.com (original content) | This research adapts vision-language models (VLMs) to Multi-Label Recognition (MLR) with partial annotations. Building on previous work that leverages prompt learning, the study hypothesizes that learning negative prompts may be suboptimal, because VLM training datasets contain few image-caption pairs describing the absence of a class. To test this hypothesis, the authors introduce two approaches, PositiveCoOp and NegativeCoOp, in which only one prompt (positive or negative, respectively) is learned with VLM guidance while the other is replaced by an embedding vector learned directly in the shared feature space, without the text encoder. Empirical analysis reveals that negative prompts degrade MLR performance, and that learning only positive prompts (PositiveCoOp) outperforms dual prompt learning approaches. The study also quantifies the benefit of prompt learning over a simple vision-features-only baseline, finding that the baseline performs comparably to dual prompt learning when the proportion of missing labels is low. (A minimal code sketch of the PositiveCoOp setup appears after the table.) |
Low | GrooveSquid.com (original content) | For Multi-Label Recognition tasks with partial annotations, researchers are exploring new ways to adapt vision-language models (VLMs) to improve accuracy. One idea is to learn special “prompts” that help the model understand what is in an image. This study compares two kinds: positive prompts, which are like clues that help the model recognize things, and negative prompts, which are like warnings that tell the model what not to look for. It turns out that learning only positive prompts works better than trying to learn both kinds. The study also compared its approach to a simpler baseline and found that the baseline was about as good when only a few labels were missing. |
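For readers who want to see the mechanics, here is a minimal, self-contained sketch of the PositiveCoOp idea described in the medium summary: a positive prompt is learned per class and passed through a frozen VLM text encoder, while the negative prompt is replaced by an embedding vector learned directly in the feature space. This is an illustration under assumed shapes and a stand-in encoder, not the authors' released code; all names (`PositiveCoOpSketch`, `ToyTextEncoder`, the dimensions) are hypothetical.

```python
# Illustrative sketch only: module names, dimensions, and the toy encoder
# are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositiveCoOpSketch(nn.Module):
    """Learn positive prompts through a frozen VLM text encoder; replace the
    negative prompt with a per-class embedding learned directly in feature space."""

    def __init__(self, text_encoder, num_classes, ctx_len=16, ctx_dim=512, embed_dim=512):
        super().__init__()
        self.text_encoder = text_encoder.eval()  # frozen VLM text encoder
        for p in self.text_encoder.parameters():
            p.requires_grad_(False)
        # Learnable context tokens: the *positive* prompt for each class.
        self.pos_ctx = nn.Parameter(0.02 * torch.randn(num_classes, ctx_len, ctx_dim))
        # The *negative* prompt is replaced by a directly learned embedding
        # vector per class; it never passes through the text encoder.
        self.neg_embed = nn.Parameter(0.02 * torch.randn(num_classes, embed_dim))

    def forward(self, image_features):
        # image_features: (batch, embed_dim) from the frozen VLM image encoder.
        pos = F.normalize(self.text_encoder(self.pos_ctx), dim=-1)  # (C, E)
        neg = F.normalize(self.neg_embed, dim=-1)                   # (C, E)
        img = F.normalize(image_features, dim=-1)                   # (B, E)
        pos_logits = img @ pos.t()  # similarity to "class present"
        neg_logits = img @ neg.t()  # similarity to "class absent"
        # Softmax over the (present, absent) pair gives per-class probabilities.
        return torch.softmax(torch.stack([pos_logits, neg_logits], dim=-1), dim=-1)[..., 0]

class ToyTextEncoder(nn.Module):
    """Stand-in for a real VLM text encoder: mean-pool context tokens, then project."""
    def __init__(self, ctx_dim=512, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(ctx_dim, embed_dim)

    def forward(self, ctx):  # (C, L, D) -> (C, E)
        return self.proj(ctx.mean(dim=1))

# Toy usage: 80 classes (e.g. COCO), a batch of 4 pre-extracted image features.
model = PositiveCoOpSketch(ToyTextEncoder(), num_classes=80)
probs = model(torch.randn(4, 512))  # (4, 80) per-class presence probabilities
```

In a real setup, the stand-in encoder would be a pretrained CLIP text encoder and the image features would come from its paired image encoder; only `pos_ctx` and `neg_embed` would be optimized against the partially annotated labels.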
Keywords
» Artificial intelligence » Embedding » Prompt