
Summary of Accelerating Transformers with Spectrum-Preserving Token Merging, by Hoai-Chau Tran et al.


Accelerating Transformers with Spectrum-Preserving Token Merging

by Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

First submitted to arXiv on: 25 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The Transformer architecture is a crucial component of state-of-the-art models for vision and language tasks, such as GPT and LLaVa, and increasing its throughput is an important goal. To achieve this, recent strategies merge token representations inside the Transformer to reduce computational and memory requirements while maintaining accuracy. However, existing methods such as Bipartite Soft Matching (BSM) suffer from drawbacks like sensitivity to the token-splitting strategy and damage to informative tokens in later layers. This paper presents PiToMe, a novel paradigm that prioritizes the preservation of informative tokens using an energy score metric: large clusters of similar tokens are identified as high-energy candidates for merging, while smaller, more unique clusters are considered low-energy and preserved (a toy sketch of this idea follows the summaries below). Experimental results show that PiToMe saves 40-60% of the FLOPs of the base models while offering superior off-the-shelf performance on image classification (ViT-MAE-H), image-text retrieval (CLIP on Flickr30k), and visual question answering (LLaVa-7B). Furthermore, PiToMe is theoretically shown to preserve the intrinsic spectral properties of the original token space under mild conditions.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making the Transformer, a model architecture used in many state-of-the-art systems for tasks like image recognition and language processing, run more efficiently. To speed it up, researchers have tried merging similar tokens (the small pieces of text or images the model works on) inside the model. However, this approach has some problems, such as damaging important information or being sensitive to how the tokens are split up. This paper presents a new method called PiToMe that prioritizes preserving important information while reducing computational requirements. The results show that PiToMe makes the model run faster while remaining accurate on tasks like image classification, image-text retrieval, and visual question answering.
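
The energy-score idea from the summaries above can be illustrated with a small, self-contained sketch. This is not the authors' implementation: here a token's "energy" is approximated as its mean cosine similarity to the other tokens, and merging simply averages each high-energy token into its most similar kept neighbour. The function name energy_based_merge and the parameter num_merge are hypothetical, chosen only for this illustration.

```python
import numpy as np

def energy_based_merge(tokens: np.ndarray, num_merge: int) -> np.ndarray:
    """Reduce an (N, D) matrix of token embeddings to (N - num_merge, D).

    Toy proxy for energy-based merging: a token's "energy" is its mean
    cosine similarity to all other tokens, so tokens sitting in large
    clusters of near-duplicates score high and are merged first, while
    isolated (more informative) tokens score low and are kept.
    """
    n = tokens.shape[0]
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T            # pairwise cosine similarity, shape (N, N)
    np.fill_diagonal(sim, 0.0)         # ignore self-similarity
    energy = sim.sum(axis=1) / (n - 1)

    merge_idx = np.argsort(energy)[-num_merge:]   # highest energy = most redundant
    keep = np.ones(n, dtype=bool)
    keep[merge_idx] = False
    keep_idx = np.flatnonzero(keep)

    merged = tokens[keep_idx].astype(np.float64)
    counts = np.ones(len(keep_idx))
    for i in merge_idx:
        j = int(np.argmax(sim[i, keep_idx]))      # most similar kept token
        merged[j] = (merged[j] * counts[j] + tokens[i]) / (counts[j] + 1)
        counts[j] += 1
    return merged

# Example: shrink a ViT-sized token sequence by roughly 40%.
x = np.random.randn(197, 64)
y = energy_based_merge(x, num_merge=78)
print(x.shape, "->", y.shape)   # (197, 64) -> (119, 64)
```

The paper defines the energy score and merging rule more carefully (and analyzes their spectrum-preserving properties); the sketch only mirrors the high-level recipe of scoring tokens, merging the redundant ones, and keeping the informative ones.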

Keywords

» Artificial intelligence  » GPT  » Image classification  » MAE  » Token  » Transformer  » ViT