Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models

by Elaine Sui, Xiaohan Wang, Serena Yeung-Levy

First submitted to arXiv on: 19 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed Test-Time Prototype Shifting (TPS) framework adapts vision-language models (VLMs) to test datasets using only unlabeled test inputs. TPS modulates the per-class prototypes in the shared embedding space, enabling optimization-free prototype reuse and seamless integration with advances in prompt engineering. For each test sample, the framework dynamically learns a shift vector per prototype, effectively bridging domain gaps and enhancing classification accuracy, while requiring significantly less memory and computation than conventional text-prompt tuning methods. TPS demonstrates superior performance across 15 image classification datasets involving natural distribution shifts and cross-dataset generalization, as well as in context-dependent visual reasoning.
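To make the mechanics concrete, here is a minimal PyTorch sketch of the prototype-shifting idea described above. It assumes a CLIP-style model exposing an encode_image method, precomputed text prototypes (one per class), an augment function producing views of the test image, and entropy minimization over those views as the test-time objective; these names, the logit scale, and the exact objective are illustrative assumptions, not the paper's verbatim implementation.

import torch
import torch.nn.functional as F

def shift_prototypes(model, prototypes, test_image, augment,
                     n_views=32, steps=1, lr=5e-3):
    # prototypes: (num_classes, dim) frozen text embeddings built from
    # class prompts; only the per-class shift vectors are learned.
    shifts = torch.zeros_like(prototypes, requires_grad=True)
    optimizer = torch.optim.AdamW([shifts], lr=lr)

    # Encode augmented views of the single unlabeled test sample once;
    # the image encoder stays frozen, so no gradients are needed here.
    views = torch.stack([augment(test_image) for _ in range(n_views)])
    with torch.no_grad():
        feats = F.normalize(model.encode_image(views), dim=-1)

    for _ in range(steps):
        protos = F.normalize(prototypes + shifts, dim=-1)    # shifted prototypes
        logits = 100.0 * feats @ protos.t()                  # scaled cosine similarity
        probs = logits.softmax(dim=-1).mean(dim=0)           # marginal prediction over views
        loss = -(probs * probs.clamp_min(1e-8).log()).sum()  # entropy of the marginal
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return F.normalize(prototypes + shifts.detach(), dim=-1)

After shifting, classification is standard zero-shot inference: encode the test image and pick the nearest shifted prototype in the shared embedding space. Because only the small shift vectors are optimized, against features that are encoded once, this is cheaper in memory and compute than backpropagating through the text encoder as prompt-tuning methods do.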
Low Difficulty Summary (written by GrooveSquid.com, original content)
The TPS framework is a new way for computers to understand what they’re looking at. It helps vision-language models (VLMs) work better when they’re shown something new that isn’t exactly like anything they’ve seen before. It does this by adjusting how the model thinks about different categories, a bit like how humans adjust when we learn from experience. The best part is that it needs no labels or extra training data, which makes it really useful for practical applications.

Keywords

  • Artificial intelligence
  • Classification
  • Embedding space
  • Generalization
  • Image classification
  • Optimization
  • Prompt