Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation

by Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

First submitted to arXiv on: 16 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, which can be read on the paper's arXiv page.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Contrastive Language-Image Pretraining (CLIP) excels at learning generalizable image representations but often struggles with zero-shot inference on specific downstream datasets. To address this, the researchers investigate token condensation (TC) techniques that refine token usage during inference and improve visual-text alignment in vision-language models (VLMs) such as CLIP on unseen datasets. Existing TC methods, however, often fail to maintain in-distribution performance when reducing tokens, which motivates a new training-free adaptation method called Token Condensation as Adaptation (TCA). TCA condenses token representations by introducing reservoir-based domain anchor tokens for information-preserving token reduction, combined with a logits correction step. The proposed method achieves up to a 21.4% improvement over the strongest baseline on a cross-dataset benchmark and the CIFAR-100-Corrupted dataset while reducing GFLOPs by 12.2% to 48.9%, with minimal hyperparameter dependency, on both the CLIP and SigLIP model series.
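The summary above describes token condensation only at a high level. As a rough illustration of the general idea (pruning less informative tokens at inference time based on their similarity to a small set of anchor tokens, and merging the pruned tokens so their information is not lost outright), here is a minimal PyTorch sketch. This is not the authors' TCA implementation: the function `condense_tokens`, the `anchors` reservoir, the `keep_ratio` parameter, and the mean-merge of dropped tokens are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def condense_tokens(tokens: torch.Tensor, anchors: torch.Tensor,
                    keep_ratio: float = 0.7) -> torch.Tensor:
    """Illustrative sketch: keep the patch tokens most similar to a set of
    anchor tokens and merge the rest into one summary token.

    tokens:  (B, N, D) ViT tokens; tokens[:, 0] is assumed to be the CLS token.
    anchors: (M, D) stand-in for a reservoir of domain anchor tokens.
    """
    cls_tok, patches = tokens[:, :1], tokens[:, 1:]               # (B, 1, D), (B, P, D)
    B, P, D = patches.shape

    # Cosine similarity of each patch token to its best-matching anchor.
    sim = F.normalize(patches, dim=-1) @ F.normalize(anchors, dim=-1).T   # (B, P, M)
    score = sim.max(dim=-1).values                                # (B, P)

    # Keep the top-k scoring patch tokens per image.
    k = max(1, int(keep_ratio * P))
    keep_idx = score.topk(k, dim=1).indices                       # (B, k)
    keep_mask = torch.zeros(B, P, dtype=torch.bool, device=patches.device)
    keep_mask.scatter_(1, keep_idx, True)

    kept = patches[keep_mask].view(B, k, D)                       # each row keeps exactly k tokens
    if k == P:                                                    # nothing to merge
        return torch.cat([cls_tok, kept], dim=1)

    # Average the pruned tokens into a single summary token.
    merged = patches[~keep_mask].view(B, P - k, D).mean(dim=1, keepdim=True)
    return torch.cat([cls_tok, kept, merged], dim=1)


# Toy usage: 2 images, 197 ViT-B/16 tokens (1 CLS + 196 patches), width 512.
tokens  = torch.randn(2, 197, 512)
anchors = torch.randn(8, 512)       # hypothetical anchor reservoir
out = condense_tokens(tokens, anchors, keep_ratio=0.7)
print(out.shape)                    # torch.Size([2, 139, 512])
```

Fewer tokens per forward pass is what drives the reported GFLOPs savings; the anchor-guided selection and merge step is the sketch's stand-in for the paper's information-preserving reduction.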

Low Difficulty Summary (written by GrooveSquid.com, original content)
Researchers are trying to make a powerful AI model called Contrastive Language-Image Pretraining (CLIP) work better on new tasks without any extra training. One way they do this is by changing how the model uses small pieces of information, called "tokens", during testing. This helps the model understand images and text better on new datasets. The team came up with a new method called Token Condensation as Adaptation (TCA) that improves the model without additional training. TCA works by adjusting which tokens the model keeps and by correcting its predictions. With this approach, the model can perform tasks up to 21.4% better than the best previous method while using less computing power.

Keywords

» Artificial intelligence  » Alignment  » Hyperparameter  » Inference  » Logits  » Pretraining  » Prompting  » Token  » Zero shot