Summary of MaskVD: Region Masking for Efficient Video Object Detection, by Sreetama Sarkar et al.


MaskVD: Region Masking for Efficient Video Object Detection

by Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel

First submitted to arXiv on: 16 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper but is written at a different level of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper presents a novel strategy for reducing the computation of video tasks by leveraging semantic information in images and temporal correlation between frames, achieving significant FLOPs and latency reductions with little loss in performance. The proposed approach, region masking, reuses features extracted from previous frames to skip processing of up to 80% of input regions, improving FLOPs and latency by 3.14x and 1.5x respectively while maintaining detection performance comparable to baseline models. The paper demonstrates promising results on Vision Transformers (ViTs) and convolutional neural networks (CNNs), providing latency improvements of up to 1.3x using specialized computational kernels.
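To make the region-masking idea concrete, below is a minimal sketch, not the authors' code: it scores each image patch by its frame-to-frame change, keeps only the most-changed fraction, and reuses cached tokens for the rest. The patch-difference criterion, the 16-pixel patch size, the 20% keep ratio, and the stand-in for the backbone output are all illustrative assumptions.

```python
import torch

def select_active_patches(prev_frame, curr_frame, patch=16, keep_ratio=0.2):
    """Score each non-overlapping patch by its mean absolute change between
    frames and keep only the top fraction; the rest can reuse cached features.
    (Illustrative criterion only; the paper's masking strategy may differ.)
    """
    diff = (curr_frame - prev_frame).abs()                    # (C, H, W)
    p = diff.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    scores = p.mean(dim=(0, 3, 4)).flatten()                  # one score per patch
    k = max(1, int(keep_ratio * scores.numel()))
    return scores.topk(k).indices                             # patches to recompute

# Toy usage: refresh only the changed patches, keep cached tokens elsewhere.
C, H, W, D = 3, 224, 224, 768
prev_frame, curr_frame = torch.rand(C, H, W), torch.rand(C, H, W)
cached_tokens = torch.zeros((H // 16) * (W // 16), D)   # features from the previous frame
active = select_active_patches(prev_frame, curr_frame)
new_tokens = torch.rand(active.numel(), D)              # stand-in for running the backbone on active patches
tokens = cached_tokens.clone()
tokens[active] = new_tokens                             # only ~20% of tokens are refreshed
```

Because only the selected tokens pass through the backbone while the remainder are read from the cache, the per-frame compute scales roughly with the fraction of regions that actually changed.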
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper makes it possible to run state-of-the-art video object detection in real-time applications by reducing the amount of computation needed. This is done by identifying the parts of the image that don’t change between frames and skipping them. The approach works well with Vision Transformers (ViTs) and other types of neural networks, providing a big speedup while keeping roughly the same level of accuracy.

Keywords

* Artificial intelligence