Summary of AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network, by Donghwa Kang et al.
AT-SNN: Adaptive Tokens for Vision Transformer on Spiking Neural Network
by Donghwa Kang, Youngmoon Lee, Eun-Kyu Lee, Brent Kang, Jinkyu Lee, Hyeongboo Baek
First submitted to arXiv on: 22 Aug 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper proposes AT-SNN, a novel approach that combines direct training and adaptive computation time (ACT) for spiking neural network (SNN)-based vision transformers (ViTs). The goal is to reduce power consumption while maintaining high accuracy. Building on existing methods, the authors adapt ACT, previously applied to recurrent neural networks (RNNs) and ViTs, to SNN-based ViTs, enabling selective discarding of less informative spatial tokens. In addition, a token-merge mechanism based on token similarity further reduces the number of tokens while enhancing accuracy. AT-SNN is implemented on Spikformer and evaluated on image classification tasks, demonstrating better energy efficiency and accuracy than state-of-the-art approaches: on CIFAR-100, it uses up to 42.4% fewer tokens than the best existing method while achieving higher accuracy.
Low | GrooveSquid.com (original content) | This paper is about making computer vision models more efficient and accurate. It proposes a new way of processing information that combines two ideas: training the model directly and adjusting how much computation it performs during use. The approach, called AT-SNN, is designed for special types of neural networks called spiking neural networks (SNNs) that are used for vision tasks like image classification. The authors test their approach on several benchmark datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, and show that it uses less energy while still achieving high accuracy.
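The two token-reduction ideas described above (discarding less informative tokens, then merging similar ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the norm-based informativeness score, the greedy pairwise merging, the averaging of merged tokens, and all thresholds here are illustrative assumptions.

```python
import numpy as np

def adaptive_token_step(tokens, keep_ratio=0.75, merge_threshold=0.9):
    """One illustrative layer step: drop low-scoring tokens, then merge similar ones.

    tokens: (N, D) array of token embeddings.
    Returns a reduced (M, D) array, M <= N.
    """
    # 1) ACT-style discarding (sketch): score each token's "informativeness"
    #    (here simply by L2 norm, an assumption) and keep the top fraction.
    scores = np.linalg.norm(tokens, axis=1)
    k = max(1, int(len(tokens) * keep_ratio))
    kept = tokens[np.argsort(scores)[::-1][:k]]

    # 2) Similarity-based merging (sketch): greedily average pairs of kept
    #    tokens whose cosine similarity exceeds the threshold.
    unit = kept / np.clip(np.linalg.norm(kept, axis=1, keepdims=True), 1e-12, None)
    used = np.zeros(len(kept), dtype=bool)
    merged = []
    for i in range(len(kept)):
        if used[i]:
            continue
        group, used[i] = [kept[i]], True
        for j in range(i + 1, len(kept)):
            if not used[j] and unit[i] @ unit[j] > merge_threshold:
                group.append(kept[j])
                used[j] = True
        merged.append(np.mean(group, axis=0))  # average each merged group
    return np.stack(merged)
```

Because each layer step both drops and merges tokens, the number of tokens processed by later layers shrinks, which is where the reported savings (fewer tokens, lower energy) come from.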
Keywords
* Artificial intelligence
* Image classification
* Token