Summary of Pre-Trained Vision-Language Models as Partial Annotators, by Qian-Wei Wang et al.
Pre-Trained Vision-Language Models as Partial Annotators
by Qian-Wei Wang, Yuqiu Xie, Letian Zhang, Zimo Liu, Shu-Tao Xia
First submitted to arXiv on: 23 May 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | In a novel approach to applying pre-trained vision-language models, researchers have developed a “pre-trained annotating – weakly-supervised learning” paradigm that leverages large amounts of unlabeled data. This method annotates image samples with multiple prompt templates, generating noisy partial-label datasets. A collaborative consistency regularization algorithm then purifies the training labels and obtains pseudo-labels for self-training. The approach simultaneously trains two neural networks that collaborate to optimize the model representation, achieving performance far beyond zero-shot inference without introducing additional label information. In experiments, the method outperforms other weakly supervised learning and few-shot fine-tuning methods. |
| Low | GrooveSquid.com (original content) | Researchers are exploring a new way to use pre-trained models for different tasks. Instead of relying on lots of labeled data, they are looking at ways to use large amounts of unlabeled data. They do this by giving images multiple candidate labels based on what they look like, and then using algorithms to clean up those labels. This helps the model learn more about what it is seeing. The approach works well without needing much labeled data, which makes it useful for tasks where labeling data is time-consuming or expensive. |
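The annotation-and-purification pipeline described in the medium summary can be sketched in a few lines. This is a minimal illustration with simulated embeddings, not the paper's actual algorithm: the real method queries a pre-trained vision-language model (e.g., CLIP-style image and text encoders) to score classes under several prompt templates, and its collaborative consistency regularization is far richer than the simple averaged-confidence stand-in below. All function names, shapes, and the toy probabilities are assumptions for illustration only.

```python
import numpy as np

def annotate(image_emb, template_text_embs):
    """Partial (candidate) label set: union of top-1 matches across prompt templates.
    Each entry of template_text_embs is a (num_classes, d) matrix of class-text
    embeddings produced under one prompt template."""
    candidates = set()
    for text_embs in template_text_embs:
        # dot product equals cosine similarity when embeddings are L2-normalized
        candidates.add(int(np.argmax(text_embs @ image_emb)))
    return candidates

def purify(candidates, probs_a, probs_b):
    """Pseudo-label: the candidate class with the highest averaged confidence from
    two collaborating networks (a drastic simplification of the paper's
    collaborative consistency regularization)."""
    avg = (probs_a + probs_b) / 2.0
    return max(candidates, key=lambda c: avg[c])

# toy run: 3 classes, 2 prompt templates, 4-dim L2-normalized embeddings
rng = np.random.default_rng(0)
l2 = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
img = l2(rng.normal(size=4))
templates = [l2(rng.normal(size=(3, 4))) for _ in range(2)]
cands = annotate(img, templates)                        # noisy partial label set
label = purify(cands,                                    # hypothetical softmax outputs
               np.array([0.2, 0.5, 0.3]),
               np.array([0.1, 0.6, 0.3]))
```

Because each template can vote for a different class, `cands` may contain more than one label, which is exactly what makes the annotation "partial"; the purification step then commits to a single pseudo-label for self-training.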
Keywords
» Artificial intelligence » Few shot » Fine tuning » Inference » Prompt » Regularization » Self training » Supervised » Zero shot