Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization
by Yuanpeng He, Lijian Li, Tianxiang Zhan, Wenpin Jiao, Chi-Man Pun
First submitted to arXiv on: 27 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper tackles the challenge of weakly supervised temporal action localization (WS-TAL), the task of identifying and categorizing complete action instances in videos using only video-level labels. The main issue is action-background ambiguity, which arises from background noise and intra-action variation. To address this problem, the authors introduce two novel modules: hybrid multi-head attention (HMHA) and generalized uncertainty-based evidential fusion (GUEF). HMHA enhances RGB and optical flow features by filtering redundant information and adjusting their distribution to better align with the WS-TAL task. GUEF eliminates background noise interference by fusing snippet-level evidence, refining the uncertainty measurement, and selecting superior foreground feature information. This enables the model to focus on integral action instances for improved localization and classification performance. Experimental results on the THUMOS14 dataset show that the proposed method outperforms state-of-the-art methods. |
Low | GrooveSquid.com (original content) | This paper solves a problem in video analysis where we want to identify specific actions happening in videos, like people walking or running. The challenge is that these actions can be mixed with background noise and other movements. To overcome this issue, the authors invent new techniques called HMHA and GUEF. HMHA helps by getting rid of unwanted information and making sure features match what we're looking for. GUEF removes background noise interference by combining small pieces of evidence to make better decisions. This allows the model to focus on actual actions instead of distractions. The results show that this method works better than other methods on a popular benchmark dataset (THUMOS14). |
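To give a flavor of what "uncertainty-based evidential fusion" means, here is a minimal toy sketch in Python. It uses the generic subjective-logic rule, where a snippet's uncertainty over K classes is u = K / (K + total evidence), and fuses RGB and optical-flow evidence by simple addition. The function names and the additive fusion rule are illustrative assumptions; the paper's actual GUEF module uses a generalized uncertainty measure and foreground-snippet selection that this sketch does not reproduce.

```python
import numpy as np

def snippet_uncertainty(evidence: np.ndarray) -> np.ndarray:
    """Per-snippet uncertainty from non-negative class evidence of shape (T, K).

    Subjective-logic style: u = K / (K + sum of evidence); more evidence
    means lower uncertainty.
    """
    K = evidence.shape[-1]
    return K / (K + evidence.sum(axis=-1))

def fuse_evidence(e_rgb: np.ndarray, e_flow: np.ndarray) -> np.ndarray:
    """Toy additive fusion of snippet-level evidence from two modalities."""
    return e_rgb + e_flow

rng = np.random.default_rng(0)
T, K = 5, 3                              # 5 video snippets, 3 action classes
e_rgb = rng.gamma(2.0, 1.0, (T, K))      # mock non-negative evidence (RGB)
e_flow = rng.gamma(2.0, 1.0, (T, K))     # mock non-negative evidence (flow)

u_rgb = snippet_uncertainty(e_rgb)
u_fused = snippet_uncertainty(fuse_evidence(e_rgb, e_flow))

# Pooling evidence from both modalities can only lower uncertainty here;
# intuitively, this is what lets a model trust confident foreground
# snippets and down-weight uncertain background ones.
assert np.all(u_fused <= u_rgb)
```

The key design point is that evidence is non-negative, so fusing modalities monotonically shrinks uncertainty; a real WS-TAL model would then use these per-snippet uncertainties to suppress background snippets before localization.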
Keywords
» Artificial intelligence » Classification » Multi head attention » Optical flow » Supervised