Summary of Dual Memory Networks: a Versatile Adaptation Approach For Vision-language Models, by Yabin Zhang et al.
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
by Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang
First submitted to arXiv on: 26 Mar 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes dual memory networks (DMN), a versatile approach for adapting pre-trained vision-language models such as CLIP to various downstream classification tasks. DMN works effectively under three paradigms: zero-shot adaptation, few-shot adaptation, and training-free few-shot adaptation. It pairs a dynamic memory, which accumulates features of historical test samples, with a static memory that caches knowledge from the training data. This design improves performance in the few-shot setting and keeps the model usable when no training data is available. Evaluated across 11 datasets under the three task settings, DMN outperforms existing methods by over 3% in the zero-shot scenario and remains robust under natural distribution shifts. |
| Low | GrooveSquid.com (original content) | The paper presents a way to adapt pre-trained vision-language models to different tasks without needing extra training data. It introduces a new approach called dual memory networks (DMN) that works well in three situations: zero-shot, few-shot, and no-training-data scenarios. DMN has two parts: one that remembers historical test features and another that keeps track of training data knowledge. This helps the model perform better when there is little or no extra data available. The approach is tested on 11 datasets and does well in all three situations. |
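The dual-memory idea described in the summaries above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes simple cosine-similarity scoring against cached feature vectors, whereas the actual DMN uses attention-based memory readout and CLIP text classifiers (omitted here). The class name and update rule are illustrative assumptions only.

```python
import numpy as np

def l2_normalize(x):
    """Unit-normalize a feature vector so dot products equal cosine similarity."""
    return x / np.linalg.norm(x)

class DualMemoryClassifier:
    """Toy sketch of a dual-memory classifier (illustrative, not the paper's DMN).

    - static memory: labeled few-shot training features, fixed after setup
    - dynamic memory: features of historical test samples, grown at test time
    """
    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.static_mem = [[] for _ in range(num_classes)]
        self.dynamic_mem = [[] for _ in range(num_classes)]

    def add_training_example(self, feature, label):
        # Populate the static memory with few-shot training features.
        self.static_mem[label].append(l2_normalize(feature))

    def _class_scores(self, feature, memory):
        # Score each class by its best cosine similarity; 0 if memory is empty.
        scores = np.zeros(self.num_classes)
        for c, feats in enumerate(memory):
            if feats:
                scores[c] = max(float(feature @ f) for f in feats)
        return scores

    def predict(self, feature):
        f = l2_normalize(feature)
        # Combine evidence from both memories (equal weighting, an assumption).
        combined = (self._class_scores(f, self.static_mem)
                    + self._class_scores(f, self.dynamic_mem))
        pred = int(np.argmax(combined))
        # Update the dynamic memory with the pseudo-labeled test feature,
        # so later test samples benefit from historical ones.
        self.dynamic_mem[pred].append(f)
        return pred
```

In the training-free few-shot setting sketched here, only the static memory is filled from labeled examples and no gradient training occurs; the dynamic memory then lets each prediction draw on previously seen test samples.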
Keywords
- Artificial intelligence
- Classification
- Few-shot
- Zero-shot