
Summary of CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference, by Mohammad Erfan Sadeghi et al.


CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

by Mohammad Erfan Sadeghi, Arash Fayyazi, Suhas Somashekar, Massoud Pedram

First submitted to arXiv on: 17 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Vision Transformers (ViTs) have revolutionized computer vision by applying the self-attention mechanisms used in NLP to analyze image patches. Despite their advantages, deploying ViTs on FPGAs is challenging because of their non-linear calculations and high computational and memory demands. This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges for optimal ViT deployment on FPGAs. The framework features a multi-kernel design, approximations of non-linear functions with minimal accuracy degradation (a toy sketch of this idea follows the summaries below), efficient use of logic blocks, and an innovative compiler that optimizes performance and memory efficiency through a novel algorithmic design-space exploration. Compared to state-of-the-art accelerators, CHOSEN achieves 1.5x and 1.42x throughput improvements on the DeiT-S and DeiT-B models.
Low Difficulty Summary (written by GrooveSquid.com, original content)
ViTs are a new way of doing computer vision that’s like a superpower for images. They work by looking at small pieces of the image and figuring out what they mean, kind of like how we read words to understand sentences. But when we try to use them on special computers called FPGAs, it gets tricky because these computers are really good at doing simple math, but ViTs need to do complicated math too. This paper shows a new way to make ViTs work better on FPGAs by breaking down the math into smaller pieces and using the computer’s strengths to speed things up. It even shows that this method can make ViTs run up to 1.5 times faster than other approaches!
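
One of the hurdles both summaries mention is that non-linear operations, such as the softmax inside self-attention, are expensive on FPGAs, so frameworks like CHOSEN approximate them with hardware-friendly arithmetic. The snippet below is a minimal, hypothetical sketch of that general idea, not the paper's actual kernels: it assumes a base-2 rewrite of the exponential with a linearized fractional term (a common FPGA-friendly trick) and compares it against an exact softmax on toy attention scores.

```python
# Hypothetical sketch only -- this is NOT the paper's actual kernel design.
# It illustrates replacing an exact non-linearity (softmax inside self-attention)
# with an FPGA-friendly approximation: the exponential is rewritten in base 2 and
# the fractional power of two is linearized, so the heavy lifting reduces to
# shifts and adds in fixed-point hardware.
import numpy as np

def softmax_exact(x):
    # Numerically stable reference softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_shift_approx(x):
    # e**z == 2**(z * log2(e)); split the exponent into integer and fractional
    # parts, approximate 2**frac by (1 + frac), and realize the integer part as
    # a binary shift (here via ldexp, i.e. (1 + frac) * 2**int_part).
    z = (x - x.max(axis=-1, keepdims=True)) * np.log2(np.e)
    int_part = np.floor(z)
    frac = z - int_part
    e = np.ldexp(1.0 + frac, int_part.astype(int))
    return e / e.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.standard_normal((4, 16, 16))  # toy attention scores: (batch, queries, keys)
    err = np.abs(softmax_exact(scores) - softmax_shift_approx(scores)).max()
    print(f"max abs deviation from exact softmax: {err:.3e}")
```

The point of the comparison is the trade-off such designs aim for: the approximate version needs only shifts, adds, and one divide per row, at the cost of a small deviation from the exact probabilities.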

Keywords

» Artificial intelligence  » NLP  » Optimization  » Self-attention  » ViT