
Summary of Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning, by Tian Liu et al.


Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning

by Tian Liu, Huixin Zhang, Shubham Parashar, Shu Kong

First submitted to arXiv on: 17 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (original content by GrooveSquid.com)
Few-shot recognition (FSR) aims to train classification models from only a few labeled examples per class, addressing the high cost of annotation. We propose methods that leverage a pre-trained Vision-Language Model (VLM) to solve FSR. Our primary focus is retrieval-augmented learning (RAL), which retrieves data relevant to the downstream task from the VLM’s pretraining set. Although RAL has been extensively studied for zero-shot recognition, applying it to FSR raises novel challenges and opportunities. Interestingly, we find that finetuning a VLM on the retrieved data underperforms state-of-the-art zero-shot methods, because the retrieved data is imbalanced and exhibits domain gaps with the few-shot examples. In contrast, simply finetuning the VLM on the few-shot data alone already outperforms previous FSR methods. Combining both sources, we propose Stage-Wise retrieval-Augmented fineTuning (SWAT), which first finetunes the model end to end on the mixed data and then retrains the classifier on the few-shot data (a minimal code sketch of this two-stage recipe follows the summaries below). Extensive experiments demonstrate that SWAT significantly outperforms previous methods, by more than 6% in accuracy.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about a new way to train computers to recognize things when we only have a few labeled examples of each thing. The method uses a special kind of computer model called a Vision-Language Model (VLM). The authors tested different ways of using this model and found that some worked better than others. One surprising result was that training the VLM on a large amount of retrieved data didn’t work as well as simply training it on the few examples, because the retrieved data wasn’t very similar to the things the computer needed to recognize. By combining both approaches, they developed a new method called Stage-Wise retrieval-Augmented fineTuning (SWAT) that works much better than previous methods.
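The summaries above describe SWAT only at a high level. For readers who want a concrete picture, here is a minimal, illustrative sketch of the stage-wise idea in PyTorch: finetune end to end on the mix of retrieved and few-shot data, then freeze the encoder and retrain only the classifier on the few-shot data. This is not the authors’ code; the encoder, the dummy datasets, and the helper names (make_dummy_split, train) are placeholders invented for illustration, whereas the real method finetunes a pre-trained VLM on actual retrieved images.

```python
# Illustrative sketch of a two-stage (SWAT-style) finetuning recipe.
# Not the paper's implementation: the encoder and data below are dummies.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

def make_dummy_split(n, dim=512, num_classes=10):
    # Stand-in for real image features and labels (few-shot or retrieved data).
    return TensorDataset(torch.randn(n, dim), torch.randint(0, num_classes, (n,)))

few_shot = make_dummy_split(40)     # the limited labeled examples
retrieved = make_dummy_split(400)   # data retrieved from the VLM's pretraining set

encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # placeholder for the VLM's visual encoder
classifier = nn.Linear(256, 10)
model = nn.Sequential(encoder, classifier)

def train(model, dataset, params, epochs=5, lr=1e-3):
    opt = torch.optim.AdamW(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Stage 1: end-to-end finetuning on the mix of retrieved and few-shot data.
train(model, ConcatDataset([retrieved, few_shot]), model.parameters())

# Stage 2: freeze the encoder and retrain only the classifier on the few-shot data.
for p in encoder.parameters():
    p.requires_grad = False
classifier.reset_parameters()
train(model, few_shot, classifier.parameters())
```

The second stage exists because, per the abstract, the retrieved data is imbalanced and domain-shifted relative to the few-shot examples; retraining the classifier head on the few-shot data alone corrects for that while keeping the representation learned from the larger mixed set.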

Keywords

» Artificial intelligence  » Classification  » Few shot  » Fine tuning  » Language model  » Pretraining  » Zero shot