Summary of TPC-ViT: Token Propagation Controller for Efficient Vision Transformer, by Wentao Zhu


TPC-ViT: Token Propagation Controller for Efficient Vision Transformer

by Wentao Zhu

First submitted to arXiv on: 3 Jan 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Read whichever version suits you best.

High Difficulty Summary (written by the paper authors)
The high-difficulty version is the paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a token propagation controller (TPC) to improve the efficiency and effectiveness of vision transformers (ViTs) in computer vision tasks. The authors show that a common assumption behind prior token-reduction methods, namely that a token redundant in one layer stays redundant, is often incorrect: a token can be redundant in one layer yet useful in later layers. To handle this, the TPC combines two per-token estimates, a pause probability and a restart probability, to control both token reduction and token reuse across layers. The method also adds a smoothing mechanism that improves estimates of the token distributions and a model stabilizer that improves training stability. Evaluated on the ImageNet-1K dataset with DeiT, LV-ViT, and Swin backbones, the approach improves inference speed by up to 250% while keeping the drop in classification accuracy to about 1.0%.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making computer vision models more efficient. These models, called vision transformers (ViTs), are really good at recognizing pictures, but they use a lot of computing power, which becomes a bottleneck when we need to analyze many images. The authors came up with a new way to control which parts of an image the model keeps working on, so it wastes less effort. They tested the idea on a big dataset and found it worked well: the model ran about 250% faster while staying nearly as accurate.
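The pause/restart idea from the summaries above can be illustrated with a small sketch. This is a hypothetical, simplified version for intuition only: the function names, the boolean-mask representation, and the fixed threshold are my assumptions, not the paper's actual formulation (which learns these probabilities and integrates them into the transformer layers).

```python
import numpy as np

def tpc_step(active, paused, pause_prob, restart_prob, threshold=0.5):
    """One layer's token routing under a hypothetical pause/restart controller.

    active, paused   : mutually exclusive boolean masks over the tokens
    pause_prob[i]    : estimate that active token i is redundant in this
                       layer and should be paused (skipped)
    restart_prob[i]  : estimate that paused token i has become useful
                       again and should be reinstated
    """
    to_pause = active & (pause_prob > threshold)
    to_restart = paused & (restart_prob > threshold)
    # Tokens paused here are not discarded for good; they can be
    # restarted in a later layer, unlike one-shot token pruning.
    new_active = (active & ~to_pause) | to_restart
    new_paused = (paused & ~to_restart) | to_pause
    return new_active, new_paused

# Toy example: tokens 0-1 are active, tokens 2-3 were paused earlier.
active = np.array([True, True, False, False])
paused = ~active
pause_prob = np.array([0.9, 0.1, 0.0, 0.0])
restart_prob = np.array([0.0, 0.0, 0.8, 0.2])

new_active, new_paused = tpc_step(active, paused, pause_prob, restart_prob)
print(new_active.tolist())  # token 0 pauses, token 2 restarts
```

In this toy run, token 0 is paused (high pause probability) while previously paused token 2 is reinstated, so the set of tokens each layer processes can shrink and grow, which is the behavior the paper's controller is designed to enable.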

Keywords

* Artificial intelligence  * Classification  * Inference  * Probability  * Token  * ViT