HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition
by Jinfu Liu, Baiqiao Yin, Jiaying Lin, Jiajun Wen, Yue Li, Mengyuan Liu
First submitted to arXiv on: 24 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract; read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The proposed Hybrid Dual-Branch Network (HDBN) is a novel approach to skeleton-based action recognition that leverages the complementary strengths of graph convolutional networks (GCNs) and Transformers. The network consists of two trunk branches: MixGCN, which models 2D skeletal modalities with GCNs, and MixFormer, which uses Transformers to capture global information. This architecture enables more robust and accurate recognition of actions in video sequences. In experiments, HDBN outperformed most existing methods on the two benchmarks of the UAV-Human dataset, reaching accuracies of 47.95% and 75.36%. The approach has implications for applications such as human-computer interaction, surveillance, and sports analysis. (A simplified code sketch of the dual-branch idea follows this table.) |
| Low | GrooveSquid.com (original content) | The paper proposes a new way to recognize actions in videos using skeletons. It uses two different networks: one that is good at handling graphs and another that is good at processing global information. These networks work together to recognize actions more accurately than before. The approach was tested on a benchmark video dataset and did better than most other methods. |
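To make the dual-branch idea more concrete, below is a minimal PyTorch sketch of a skeleton classifier with a GCN branch for local joint structure and a Transformer branch for global temporal context, fused by summing their class scores. This is not the authors' implementation: the module names (`SimpleGraphConv`, `GCNBranch`, `TransformerBranch`, `DualBranchNet`), layer sizes, the self-loop-only adjacency, and the 155-class output are illustrative assumptions.

```python
# Minimal dual-branch sketch in the spirit of HDBN (illustrative, not the authors' code).
import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    """One graph-convolution step: mix joint features over a fixed adjacency, then project."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)            # (joints, joints) adjacency matrix
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                               # x: (batch, frames, joints, in_dim)
        x = torch.einsum("jk,btkc->btjc", self.A, x)    # aggregate over neighboring joints
        return torch.relu(self.linear(x))


class GCNBranch(nn.Module):
    """GCN trunk: models the local, joint-level structure of the skeleton sequence."""
    def __init__(self, adjacency, in_dim=3, hidden=64, num_classes=155):
        super().__init__()
        self.gc1 = SimpleGraphConv(in_dim, hidden, adjacency)
        self.gc2 = SimpleGraphConv(hidden, hidden, adjacency)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                               # x: (batch, frames, joints, in_dim)
        x = self.gc2(self.gc1(x))
        return self.head(x.mean(dim=(1, 2)))            # pool over time and joints -> class scores


class TransformerBranch(nn.Module):
    """Transformer trunk: models global, long-range dependencies across frames."""
    def __init__(self, num_joints=17, in_dim=3, hidden=64, num_classes=155):
        super().__init__()
        self.embed = nn.Linear(num_joints * in_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                               # x: (batch, frames, joints, in_dim)
        b, t, j, c = x.shape
        tokens = self.embed(x.reshape(b, t, j * c))     # one token per frame
        return self.head(self.encoder(tokens).mean(dim=1))


class DualBranchNet(nn.Module):
    """Dual-branch classifier: fuse the two branches' class scores by simple addition."""
    def __init__(self, adjacency, num_classes=155):
        super().__init__()
        self.gcn_branch = GCNBranch(adjacency, num_classes=num_classes)
        self.transformer_branch = TransformerBranch(num_classes=num_classes)

    def forward(self, x):
        return self.gcn_branch(x) + self.transformer_branch(x)


if __name__ == "__main__":
    joints = 17
    A = torch.eye(joints)                               # placeholder adjacency (self-loops only)
    model = DualBranchNet(A, num_classes=155)
    clip = torch.randn(2, 30, joints, 3)                # (batch, frames, joints, xyz)
    print(model(clip).shape)                            # torch.Size([2, 155])
```

Summing class scores is one common late-fusion choice for combining branches; the paper's own fusion of its MixGCN and MixFormer outputs across skeletal modalities may differ in detail.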