Summary of Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers, by Zhengang Li et al.
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang
First submitted to arXiv on: 25 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The Quasar-ViT framework develops hardware-oriented, quantization-aware architecture search to design efficient vision transformer (ViT) models for deployment on resource-limited edge devices. It first trains a supernet using mixed-precision quantization and scaling techniques, then applies an efficient search algorithm, guided by hardware latency and resource modeling, to select optimal subnets for different inference latency targets (see the sketch after this table). The searched models achieve 101.5, 159.6, and 251.6 frames per second (FPS) on the AMD/Xilinx ZCU102 FPGA while maintaining top-1 accuracies of 80.4%, 78.6%, and 74.9%, respectively, on ImageNet. |
| Low | GrooveSquid.com (original content) | This paper creates a way to make vision transformers work well on small devices. It develops a new approach called Quasar-ViT, which helps design efficient models that use less computing power while still being very accurate. The method starts by training one big model, then uses a search algorithm to find the best parts of it for different speed targets. This leads to faster and still accurate results on hardware like the AMD/Xilinx ZCU102 FPGA. |
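As a rough illustration of the latency-constrained subnet search described in the medium summary, the Python sketch below samples subnet configurations (a per-block weight bit-width and MLP expansion ratio) from a hypothetical supernet search space and keeps the best candidate that fits a latency budget. The search space, latency coefficients, and accuracy proxy are illustrative assumptions, not the paper's actual design.

```python
import random

# Hypothetical search space: for each transformer block, choose a weight
# bit-width and an MLP expansion ratio (values are illustrative only).
NUM_BLOCKS = 12
BIT_CHOICES = [4, 8]             # mixed-precision weight quantization
RATIO_CHOICES = [3.0, 3.5, 4.0]  # MLP expansion ratios

def sample_subnet():
    """Randomly sample one subnet configuration from the supernet space."""
    return [(random.choice(BIT_CHOICES), random.choice(RATIO_CHOICES))
            for _ in range(NUM_BLOCKS)]

def estimated_latency_ms(subnet):
    """Toy hardware latency model: lower bit-widths and smaller MLP ratios
    are assumed to be cheaper on the FPGA. Coefficients are made up."""
    return sum(0.15 * bits + 0.4 * ratio for bits, ratio in subnet)

def proxy_accuracy(subnet):
    """Stand-in for evaluating a weight-shared subnet of the trained
    supernet on a validation set; here it simply rewards capacity."""
    return sum(0.1 * bits + 0.2 * ratio for bits, ratio in subnet)

def search(latency_target_ms, num_samples=2000):
    """Random search: keep the best-scoring subnet under the latency target."""
    best, best_score = None, float("-inf")
    for _ in range(num_samples):
        cand = sample_subnet()
        if estimated_latency_ms(cand) > latency_target_ms:
            continue  # violates the hardware latency constraint
        score = proxy_accuracy(cand)
        if score > best_score:
            best, best_score = cand, score
    return best

if __name__ == "__main__":
    for target in (20.0, 25.0, 30.0):  # different inference latency targets
        print(target, search(target))
```

In the paper's setting, the accuracy signal would come from evaluating weight-shared subnets of the trained supernet, and the latency and resource models would be calibrated to the target FPGA rather than the toy coefficients used here.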
Keywords
* Artificial intelligence
* Inference
* Precision
* Quantization
* Vision transformer
* ViT