Summary of VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA, by Sachini Wickramasinghe et al.
VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA
by Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart
First submitted to arXiv on: 6 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes VTR, a lightweight Vision Transformer (ViT) designed specifically for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). While ViT architectures have outperformed Convolutional Neural Networks (CNNs) in many computer vision applications, applying them to SAR ATR is difficult because training data is scarce and the models are computationally expensive. To address these issues, the authors build VTR around Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules, which let the model train directly on small datasets without pre-training. VTR is evaluated on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. The authors also propose a novel FPGA accelerator for VTR to enable real-time deployment in SAR ATR applications. |
| Low | GrooveSquid.com (original content) | The paper develops a new model for recognizing targets in radar images taken by satellites. This is important because it could help military systems quickly identify objects from far away. The problem is that such models usually need lots of training data and heavy computation, making them hard to run on small or low-power devices. To fix this, the authors create a version of the model that can be trained with less data and runs faster. They test it on three different datasets and show that it works well, and they also propose a way to run the model on devices that aren't very powerful. |
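The SPT and LSA modules mentioned in the medium-difficulty summary are the two ingredients that let the model train on small SAR datasets: SPT widens each patch token's receptive field by stacking diagonally shifted copies of the image, and LSA sharpens attention with a learnable temperature plus diagonal (self-token) masking. Below is a minimal NumPy sketch of the two ideas — the function names, the `np.roll` shifting, and the fixed temperature value are illustrative assumptions for exposition, not the authors' actual implementation:

```python
import numpy as np

def shifted_patch_tokenization(img, patch=4, shift=None):
    """Sketch of Shifted Patch Tokenization (SPT): concatenate the image
    with four diagonally shifted copies along the channel axis, then split
    the stack into flattened patch tokens. Shift defaults to half a patch."""
    H, W, C = img.shape
    s = shift if shift is not None else patch // 2
    views = [img]
    for dy, dx in [(s, s), (s, -s), (-s, s), (-s, -s)]:
        # np.roll wraps around; an illustrative stand-in for zero-padded shifts
        views.append(np.roll(img, (dy, dx), axis=(0, 1)))
    stack = np.concatenate(views, axis=-1)                  # (H, W, 5*C)
    tokens = stack.reshape(H // patch, patch, W // patch, patch, -1)
    tokens = tokens.transpose(0, 2, 1, 3, 4)                # group patch cells
    return tokens.reshape(-1, patch * patch * 5 * C)        # (n_patches, dim)

def locality_self_attention(q, k, v, tau=0.5):
    """Sketch of Locality Self-Attention (LSA): a (learnable) temperature
    tau replaces sqrt(d), and each token's attention to itself is masked
    out so the softmax concentrates on neighboring tokens."""
    scores = q @ k.T / tau
    np.fill_diagonal(scores, -1e9)                          # self-token masking
    scores -= scores.max(axis=-1, keepdims=True)            # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

For an 8x8 single-channel image with 4x4 patches, SPT yields 4 tokens of dimension 4*4*5 = 80 (five stacked views per pixel), versus 16 per token for plain patch embedding — the richer tokens are what help when training data is limited.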
Keywords
» Artificial intelligence » Self-attention » Tokenization » Vision transformer » ViT