
Summary of VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models, by Muchao Ye et al.


VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

by Muchao Ye, Weiyang Liu, Pan He

First submitted to arxiv on: 2 Dec 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Each summary below is labeled by difficulty and author.

High Difficulty Summary (written by the paper authors): read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper proposes VERA, a verbalized learning framework that enables vision-language models (VLMs) to perform video anomaly detection (VAD) without modifying their model parameters. Existing approaches often rely on specialized reasoning modules or instruction-tuning datasets, which incur additional computational cost or data-annotation overhead. Instead, VERA decomposes the complex reasoning required for VAD into simpler guiding questions and optimizes them through verbal interactions between a learner VLM and an optimizer VLM, using only coarsely labeled training data. At inference time, the framework refines segment-level anomaly scores into frame-level scores by fusing scene and temporal contexts. Experimental results show that the learned questions improve both the detection performance and the explainability of VLMs for VAD.
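The segment-to-frame refinement can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes segment scores are simply expanded to their frames and then smoothed with a Gaussian window as a stand-in for fusing temporal context; the function name and parameters are hypothetical.

```python
import numpy as np

def frame_scores_from_segments(segment_scores, frames_per_segment, sigma=2.0):
    """Expand segment-level anomaly scores to frame level and apply
    Gaussian temporal smoothing. A hypothetical stand-in for VERA's
    temporal-context fusion; the paper's exact procedure may differ."""
    # Repeat each segment's score for all frames in that segment.
    frame_scores = np.repeat(np.asarray(segment_scores, dtype=float),
                             frames_per_segment)
    # Build a normalized Gaussian kernel over a window of +/- 3 sigma.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Convolve with edge padding so output length matches the frame count.
    padded = np.pad(frame_scores, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

Because the kernel is a convex combination of neighboring scores, the smoothed frame scores stay within the range of the original segment scores while transitions between segments become gradual.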
Low Difficulty Summary (original content by GrooveSquid.com)
VERA is a new way to use special computer models called vision-language models (VLMs) to find problems in videos, meaning things that don't fit with what's normal in the video. Right now, people have to add extra information or training data to these models to make them good at finding these problems. VERA is different: it makes the model ask itself simple questions about what it sees and uses the answers to find the problems. This makes the model better at spotting problems, and it also helps us understand why the model flagged something. The results show that VERA works well.
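The "model asks itself questions" idea can be sketched in a few lines. This is an illustrative simplification, not VERA's actual scoring rule: it assumes each learned guiding question gets a yes/no answer from the VLM and that the anomaly score is the fraction of "yes" answers; `answer_fn` is a hypothetical placeholder for a real VLM query.

```python
from typing import Callable, List

def anomaly_score(answer_fn: Callable[[str], bool],
                  questions: List[str]) -> float:
    """Score a video segment as the fraction of guiding questions the
    VLM answers 'yes' to. `answer_fn` stands in for a VLM call; the
    paper's actual aggregation may differ."""
    votes = [answer_fn(q) for q in questions]
    return sum(votes) / len(votes)
```

For example, with the questions `["Is anyone fighting in the scene?", "Is there smoke or fire?"]` and a VLM that answers "yes" only to the second, the segment's score would be 0.5.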

Keywords

» Artificial intelligence  » Anomaly detection  » Instruction tuning