Summary of An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition, by Song-Jiang Lai et al.
An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition
by Song-Jiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Tian-Shan Liu, Kin-Man Lam
First submitted to arXiv on: 27 Nov 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper but is written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | In this paper, researchers tackle the problem of efficient video-based action recognition using two-stream neural networks. They propose replacing the optical flow branch of traditional models with a representation flow algorithm, reducing computational cost and prediction time. The new approach is designed for egocentric action recognition and incorporates class activation maps (CAMs) to improve accuracy and a ConvLSTM with spatial attention for spatio-temporal encoding. Evaluation on three datasets – GTEA61, EGTEA GAZE+, and HMDB – shows the model matches or exceeds the original’s performance while reducing prediction times by up to 99%. Ablation studies also investigate the impact of different parameters on model performance.
Low | GrooveSquid.com (original content) | This paper improves video-based action recognition using a new algorithm that makes two-stream neural networks more efficient. The old way of doing things was slow and used a lot of computer power, but this new approach is faster and uses less power. It’s good for recognizing actions in videos taken from a person’s point of view (like wearing a camera). The results show it works just as well as or even better than the old method, and it does it much faster.
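To make the CAM idea mentioned in the medium summary concrete, here is a minimal NumPy sketch (not the authors' code; shapes, function names, and the attention scheme are illustrative assumptions) of how a class activation map can be computed from CNN feature maps and reused as a soft spatial-attention mask:

```python
# Illustrative sketch of CAM-based spatial attention; all names and
# shapes are assumptions, not the paper's implementation.
import numpy as np

def cam_spatial_attention(features, class_weights):
    """features: (C, H, W) CNN feature maps for one frame;
    class_weights: (C,) final-layer classifier weights for one class.
    Returns feature maps re-weighted by the normalized CAM."""
    # CAM: channel-weighted sum of feature maps -> (H, W)
    cam = np.tensordot(class_weights, features, axes=([0], [0]))
    # Normalize to [0, 1] so the CAM acts as a soft spatial mask
    cam = cam - cam.min()
    if cam.max() > 0:
        cam = cam / cam.max()
    # Broadcast the mask across all channels
    return features * cam[None, :, :]

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 7, 7))  # toy 64-channel 7x7 feature maps
w = rng.standard_normal(64)              # toy classifier weights
out = cam_spatial_attention(feats, w)
print(out.shape)  # (64, 7, 7)
```

In the paper's pipeline, such attention-weighted features would then feed a ConvLSTM for spatio-temporal encoding; this sketch only covers the per-frame attention step.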
Keywords
» Artificial intelligence » Attention » Optical flow