Summary of Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action Recognition, by Lei Wang and Piotr Koniusz
First submitted to arXiv on: 30 Oct 2022
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes a few-shot learning pipeline for 3D skeleton-based action recognition called JEANIE. The approach factors out misalignment between query and support sequences of 3D body joints using an advanced Dynamic Time Warping (DTW) algorithm that models smooth paths jointly in the temporal space and the space of simulated camera viewpoints. Sequences are encoded with a temporal block encoder based on Simple Spectral Graph Convolution, a lightweight linear Graph Neural Network backbone; a transformer-based variant is also evaluated. To ensure proper alignment, a similarity-based loss is proposed that encourages alignment of sequences from the same class while discouraging alignment of unrelated sequences. The approach achieves state-of-the-art results on several benchmark datasets, including NTU-60, NTU-120, Kinetics-skeleton, and UWA3D Multiview Activity II. |
| Low | GrooveSquid.com (original content) | This paper helps computers recognize actions in videos with a new way of matching 3D body joints. The approach is called JEANIE. It is like solving a puzzle: the 3D body joints in two different sequences must be aligned before the action can be recognized correctly. The paper uses specialized algorithms and models, such as Graph Neural Networks and transformers, to do this, and achieves top results on several benchmark datasets. |
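The core idea in the medium summary can be illustrated with a toy dynamic program. The sketch below is not the paper's implementation: it assumes per-block feature vectors are already extracted, uses plain cosine distance, allows the standard DTW moves in time plus a ±1 step across simulated viewpoints (the "smooth path" constraint), and takes a hard minimum rather than the soft aggregation a trainable pipeline would use. All function and variable names here are hypothetical.

```python
import numpy as np

def cosine_dist(a, b):
    # distance = 1 - cosine similarity between two feature vectors
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def joint_alignment_cost(query, support_views):
    """Toy DTW-style alignment over a joint (time, viewpoint) grid.

    query:         (T_q, d) per-block features of the query sequence.
    support_views: (V, T_s, d) the support sequence encoded under V
                   simulated camera viewpoints.
    Returns the minimal accumulated distance over paths that advance in
    query time and support time while changing viewpoint by at most one
    step at a time (a smoothness constraint).
    """
    V, T_s, _ = support_views.shape
    T_q = query.shape[0]
    # D[v, i, j]: distance between query block i and support block j at viewpoint v
    D = np.empty((V, T_q, T_s))
    for v in range(V):
        for i in range(T_q):
            for j in range(T_s):
                D[v, i, j] = cosine_dist(query[i], support_views[v, j])
    # R: accumulated cost table, initialized at the (0, 0) corner of every viewpoint
    R = np.full((V, T_q, T_s), np.inf)
    R[:, 0, 0] = D[:, 0, 0]
    for i in range(T_q):
        for j in range(T_s):
            if i == 0 and j == 0:
                continue
            for v in range(V):
                best = np.inf
                for dv in (-1, 0, 1):  # small viewpoint step keeps the path smooth
                    pv = v + dv
                    if not 0 <= pv < V:
                        continue
                    for di, dj in ((1, 0), (0, 1), (1, 1)):  # standard DTW moves
                        pi, pj = i - di, j - dj
                        if pi >= 0 and pj >= 0:
                            best = min(best, R[pv, pi, pj])
                R[v, i, j] = D[v, i, j] + best
    return R[:, -1, -1].min()  # best endpoint over viewpoints
```

If the support sequence under one of its simulated viewpoints matches the query exactly, the path that stays at that viewpoint accumulates near-zero cost, which is the behavior the joint temporal-viewpoint alignment is meant to capture.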
Keywords
- Artificial intelligence
- Alignment
- Encoder
- Few shot
- Graph neural network
- Transformer