
Summary of VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models, by Muchao Ye et al.


VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models

by Muchao Ye, Weiyang Liu, Pan He

First submitted to arxiv on: 2 Dec 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Each summary below is labeled by difficulty and author.

High Difficulty Summary (written by the paper authors): read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper proposes VERA, a verbalized learning framework that enables vision-language models (VLMs) to perform video anomaly detection (VAD) without modifying their model parameters. Existing approaches often rely on specialized reasoning modules or instruction-tuning datasets, which incur additional computational cost or data-annotation overhead. Instead, VERA decomposes the complex reasoning required for VAD into simpler guiding questions and optimizes them through verbal interactions between a learner VLM and an optimizer VLM, using only coarsely labeled training data. At inference time, the framework refines segment-level anomaly scores into frame-level scores by fusing scene and temporal contexts. Experimental results show that the learned questions improve both the detection performance and the explainability of VLMs for VAD.
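The segment-to-frame refinement can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes segment scores are simply expanded to their frames and then smoothed with a Gaussian window as a stand-in for fusing temporal context; the function name and parameters are hypothetical.

```python
import numpy as np

def frame_scores_from_segments(segment_scores, frames_per_segment, sigma=2.0):
    """Expand segment-level anomaly scores to frame level and apply
    Gaussian temporal smoothing. A hypothetical stand-in for VERA's
    temporal-context fusion; the paper's exact procedure may differ."""
    # Repeat each segment's score for all frames in that segment.
    frame_scores = np.repeat(np.asarray(segment_scores, dtype=float),
                             frames_per_segment)
    # Build a normalized Gaussian kernel over a window of +/- 3 sigma.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Convolve with edge padding so output length matches the frame count.
    padded = np.pad(frame_scores, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

Because the kernel is a convex combination of neighboring scores, the smoothed frame scores stay within the range of the original segment scores while transitions between segments become gradual.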
Low Difficulty Summary (original content by GrooveSquid.com)
VERA is a new way to use special computer models called vision-language models (VLMs) to find problems in videos, meaning things that don't fit with what's normal in the video. Right now, people have to add extra information or training data to these models to make them good at finding these problems. VERA is different: it makes the model ask itself simple questions about what it sees and uses the answers to find the problems. This makes the model better at spotting problems, and it also helps us understand why the model flagged something. The results show that VERA works well.
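The "model asks itself questions" idea can be sketched in a few lines. This is an illustrative simplification, not VERA's actual scoring rule: it assumes each learned guiding question gets a yes/no answer from the VLM and that the anomaly score is the fraction of "yes" answers; `answer_fn` is a hypothetical placeholder for a real VLM query.

```python
from typing import Callable, List

def anomaly_score(answer_fn: Callable[[str], bool],
                  questions: List[str]) -> float:
    """Score a video segment as the fraction of guiding questions the
    VLM answers 'yes' to. `answer_fn` stands in for a VLM call; the
    paper's actual aggregation may differ."""
    votes = [answer_fn(q) for q in questions]
    return sum(votes) / len(votes)
```

For example, with the questions `["Is anyone fighting in the scene?", "Is there smoke or fire?"]` and a VLM that answers "yes" only to the second, the segment's score would be 0.5.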

Keywords

» Artificial intelligence  » Anomaly detection  » Instruction tuning