Summary of Stitch Contrast and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos, by Haitao Tian et al.
Stitch Contrast and Segment: Learning a Human Action Segmentation Model Using Trimmed Skeleton Videos
by Haitao Tian, Pierre Payeur
First submitted to arXiv on: 19 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a three-step framework, comprising Stitch, Contrast, and Segment, that trains a skeleton-based action segmentation model on short trimmed videos yet can run on longer untrimmed ones. The Stitch step treats trimmed skeleton videos as elementary human motions and composes them into multi-action stitched sequences. The Contrast step learns contrastive frame representations from these sequences through a novel discrimination pretext task, so that the skeleton encoder captures meaningful action-temporal contexts. The Segment step connects the learned representations to action segmentation by training a segmentation layer while handling the particular data availability. The framework is evaluated on a trimmed source dataset and an untrimmed target dataset in an adaptation formulation for real-world skeleton-based human action segmentation. An illustrative sketch of the pipeline appears after this table. |
Low | GrooveSquid.com (original content) | This research develops a new way to recognize human actions from skeleton video recordings. Most current methods rely on short videos containing a single clear action, which limits their use in real-life situations where videos are longer and more complicated. The solution involves three steps: first, the “Stitch” step connects shorter videos into longer ones that include multiple actions. Next, the “Contrast” step teaches a computer to recognize patterns and learn from these stitched sequences. Finally, the “Segment” step uses what was learned to separate the individual actions within a longer video. The method is tested on both short trimmed and long untrimmed videos and shows promising results for recognizing human actions in real-world scenarios. |
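To make the three steps concrete, below is a minimal, illustrative sketch in Python/PyTorch. It is not the authors’ implementation: the encoder (a GRU standing in for the paper’s skeleton backbone), the InfoNCE-style frame-discrimination loss, and all names (`stitch`, `SkeletonEncoder`, `frame_contrastive_loss`, `segment_head`) are assumptions made for illustration.

```python
# Hypothetical sketch of the Stitch-Contrast-Segment pipeline; shapes and
# components are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def stitch(clips):
    """Stitch: concatenate trimmed skeleton clips, each of shape (T_i, J, C),
    into one multi-action sequence, keeping the clip index of every frame."""
    seq = torch.cat(clips, dim=0)                        # (sum T_i, J, C)
    labels = torch.cat([torch.full((c.shape[0],), i)     # frame -> clip id
                        for i, c in enumerate(clips)])
    return seq, labels

class SkeletonEncoder(nn.Module):
    """Per-frame skeleton encoder. The paper uses a dedicated skeleton
    backbone; a GRU over flattened joints stands in for it here."""
    def __init__(self, joints=25, channels=3, dim=128):
        super().__init__()
        self.rnn = nn.GRU(joints * channels, dim, batch_first=True)

    def forward(self, seq):                              # (B, T, J, C)
        b, t, j, c = seq.shape
        out, _ = self.rnn(seq.reshape(b, t, j * c))
        return F.normalize(out, dim=-1)                  # (B, T, dim)

def frame_contrastive_loss(feats, labels, tau=0.1):
    """Contrast: InfoNCE-style frame discrimination. Frames stitched from
    the same clip are positives; frames from other clips are negatives."""
    sim = feats @ feats.T / tau                          # (T, T) similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = same & ~torch.eye(len(labels), dtype=torch.bool)
    return -log_prob[pos].mean()

# Segment: a per-frame linear classifier trained on top of the encoder.
num_actions = 60                                         # assumed label-set size
segment_head = nn.Linear(128, num_actions)

# Tiny usage example with random skeleton clips (25 joints, 3D coordinates).
clips = [torch.randn(40, 25, 3), torch.randn(60, 25, 3), torch.randn(50, 25, 3)]
seq, labels = stitch(clips)
encoder = SkeletonEncoder()
feats = encoder(seq.unsqueeze(0))[0]                     # (150, 128)
loss = frame_contrastive_loss(feats, labels)
frame_logits = segment_head(feats)                       # (150, 60) per-frame scores
```

In this toy version, the stitched clip indices serve as the discrimination signal for the pretext task, and the segmentation head simply classifies each encoded frame; the paper’s actual loss, backbone, and adaptation procedure may differ.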
Keywords
» Artificial intelligence » Encoder