Summary of VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA, by Sachini Wickramasinghe et al.
VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA
by Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart
First submitted to arXiv on: 6 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes VTR, a lightweight Vision Transformer (ViT) designed specifically for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). While ViT architectures have outperformed Convolutional Neural Networks (CNNs) in many computer vision applications, applying them to SAR ATR is difficult because training data is scarce and the models are computationally expensive. To address these issues, the authors build VTR around Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules, which let the model train directly on small datasets without pre-training. VTR is evaluated on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. The authors also propose a novel FPGA accelerator for VTR to enable real-time deployment in SAR ATR applications. |
| Low | GrooveSquid.com (original content) | The paper develops a new model for recognizing targets in radar images taken by satellites. This is important because it could help military systems quickly identify objects from far away. The problem is that such models usually need lots of training data and heavy computation, making them hard to run on small or low-power devices. To fix this, the authors create a version of the model that can be trained with less data and runs faster. They test it on three different datasets and show that it works well, and they also propose a way to run the model on devices that aren't very powerful. |
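The SPT and LSA modules mentioned in the medium-difficulty summary are the two ingredients that let the model train on small SAR datasets: SPT widens each patch token's receptive field by stacking diagonally shifted copies of the image, and LSA sharpens attention with a learnable temperature plus diagonal (self-token) masking. Below is a minimal NumPy sketch of the two ideas — the function names, the `np.roll` shifting, and the fixed temperature value are illustrative assumptions for exposition, not the authors' actual implementation:

```python
import numpy as np

def shifted_patch_tokenization(img, patch=4, shift=None):
    """Sketch of Shifted Patch Tokenization (SPT): concatenate the image
    with four diagonally shifted copies along the channel axis, then split
    the stack into flattened patch tokens. Shift defaults to half a patch."""
    H, W, C = img.shape
    s = shift if shift is not None else patch // 2
    views = [img]
    for dy, dx in [(s, s), (s, -s), (-s, s), (-s, -s)]:
        # np.roll wraps around; an illustrative stand-in for zero-padded shifts
        views.append(np.roll(img, (dy, dx), axis=(0, 1)))
    stack = np.concatenate(views, axis=-1)                  # (H, W, 5*C)
    tokens = stack.reshape(H // patch, patch, W // patch, patch, -1)
    tokens = tokens.transpose(0, 2, 1, 3, 4)                # group patch cells
    return tokens.reshape(-1, patch * patch * 5 * C)        # (n_patches, dim)

def locality_self_attention(q, k, v, tau=0.5):
    """Sketch of Locality Self-Attention (LSA): a (learnable) temperature
    tau replaces sqrt(d), and each token's attention to itself is masked
    out so the softmax concentrates on neighboring tokens."""
    scores = q @ k.T / tau
    np.fill_diagonal(scores, -1e9)                          # self-token masking
    scores -= scores.max(axis=-1, keepdims=True)            # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

For an 8x8 single-channel image with 4x4 patches, SPT yields 4 tokens of dimension 4*4*5 = 80 (five stacked views per pixel), versus 16 per token for plain patch embedding — the richer tokens are what help when training data is limited.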
Keywords
» Artificial intelligence » Self-attention » Tokenization » Vision transformer » ViT