Summary of Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers, by Zhengang Li et al.
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang
First submitted to arXiv on: 25 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The Quasar-ViT framework develops hardware-oriented, quantization-aware architecture search to design efficient vision transformer (ViT) models for deployment on resource-limited edge devices. It first trains a supernet using mixed-precision quantization and scaling techniques, then applies an efficient search algorithm, guided by hardware latency and resource modeling, to select optimal subnets for different inference latency targets (see the sketch after this table). The searched models achieve 101.5, 159.6, and 251.6 frames per second (FPS) on the AMD/Xilinx ZCU102 FPGA while maintaining top-1 accuracies of 80.4%, 78.6%, and 74.9%, respectively, on ImageNet. |
| Low | GrooveSquid.com (original content) | This paper creates a way to make vision transformers work well on small devices. It develops a new approach called Quasar-ViT, which helps design efficient models that use less computing power while still being very accurate. The method starts by training one big model, then uses a search algorithm to find the best parts of it for different speed targets. This leads to faster and still accurate results on hardware like the AMD/Xilinx ZCU102 FPGA. |
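As a rough illustration of the latency-constrained subnet search described in the medium summary, the Python sketch below samples subnet configurations (a per-block weight bit-width and MLP expansion ratio) from a hypothetical supernet search space and keeps the best candidate that fits a latency budget. The search space, latency coefficients, and accuracy proxy are illustrative assumptions, not the paper's actual design.

```python
import random

# Hypothetical search space: for each transformer block, choose a weight
# bit-width and an MLP expansion ratio (values are illustrative only).
NUM_BLOCKS = 12
BIT_CHOICES = [4, 8]             # mixed-precision weight quantization
RATIO_CHOICES = [3.0, 3.5, 4.0]  # MLP expansion ratios

def sample_subnet():
    """Randomly sample one subnet configuration from the supernet space."""
    return [(random.choice(BIT_CHOICES), random.choice(RATIO_CHOICES))
            for _ in range(NUM_BLOCKS)]

def estimated_latency_ms(subnet):
    """Toy hardware latency model: lower bit-widths and smaller MLP ratios
    are assumed to be cheaper on the FPGA. Coefficients are made up."""
    return sum(0.15 * bits + 0.4 * ratio for bits, ratio in subnet)

def proxy_accuracy(subnet):
    """Stand-in for evaluating a weight-shared subnet of the trained
    supernet on a validation set; here it simply rewards capacity."""
    return sum(0.1 * bits + 0.2 * ratio for bits, ratio in subnet)

def search(latency_target_ms, num_samples=2000):
    """Random search: keep the best-scoring subnet under the latency target."""
    best, best_score = None, float("-inf")
    for _ in range(num_samples):
        cand = sample_subnet()
        if estimated_latency_ms(cand) > latency_target_ms:
            continue  # violates the hardware latency constraint
        score = proxy_accuracy(cand)
        if score > best_score:
            best, best_score = cand, score
    return best

if __name__ == "__main__":
    for target in (20.0, 25.0, 30.0):  # different inference latency targets
        print(target, search(target))
```

In the paper's setting, the accuracy signal would come from evaluating weight-shared subnets of the trained supernet, and the latency and resource models would be calibrated to the target FPGA rather than the toy coefficients used here.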
Keywords
* Artificial intelligence
* Inference
* Precision
* Quantization
* Vision transformer
* ViT