Summary of MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions, by Sheng Yan et al.
MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions
by Sheng Yan, Mengyuan Liu, Yong Wang, Yang Liu, Chen Chen, Hong Liu
First submitted to arXiv on: 21 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper proposes a method for temporal sentence localization in human motions (TSLM): locating the moment within an untrimmed 3D human motion sequence that corresponds to a given text query. The authors observe that current video localization frameworks, when extended to TSLM, yield only rough predictions because motion frames offer little contextual richness and are semantically ambiguous. To improve performance, they devise two label-prior-assisted training schemes: one incorporates prior knowledge of foreground and background to highlight target moments, and the other forces the original predictions to overlap with the more accurate predictions obtained from flipped start/end prior label sequences during recovery training (illustrative sketches follow this table). The resulting model, termed MLP, outperforms prior work on the BABEL dataset (44.13 recall at IoU@0.7) and on HumanML3D (Restore) (71.17 recall), and also shows potential for corpus-level moment retrieval.
Low | GrooveSquid.com (original content) | This paper tackles a tricky problem: finding the exact moment in a recording of 3D human motion that matches a text description. Existing methods, borrowed from video localization, give only rough answers because individual motion frames carry little context and can be ambiguous. To fix this, the authors come up with two new ways to train their model: one uses prior knowledge about which frames belong to the target moment, and the other flips the start/end labels during training to sharpen predictions. Their approach beats earlier methods on the BABEL and HumanML3D datasets and shows promise for finding moments across a whole collection of motions.
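To make the first training scheme concrete, here is a minimal sketch of what a per-frame label prior could look like: frames inside the annotated moment are marked as foreground and the rest as background, giving the model an explicit signal about the target moment. The function names, the binary encoding, and the sequence-reversal reading of "flipped" are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch (not the MLP authors' code): build a per-frame
# foreground/background label sequence from an annotated (start, end) span.
def label_prior(num_frames: int, start: int, end: int) -> np.ndarray:
    """Return 1.0 for frames inside the target moment, 0.0 for background."""
    labels = np.zeros(num_frames, dtype=np.float32)
    labels[start:end] = 1.0  # highlight the target moment
    return labels

# One plausible reading of a "flipped" start/end label sequence: reverse
# the temporal order, so a start prior becomes an end prior and vice versa.
# The paper's exact flipping scheme may differ.
def flipped_prior(labels: np.ndarray) -> np.ndarray:
    return labels[::-1].copy()

prior = label_prior(num_frames=120, start=30, end=70)
print(int(prior.sum()))  # 40 foreground frames
```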
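The recall figures quoted in the summaries follow the standard Recall@IoU protocol for moment localization: a prediction counts as a hit when its temporal intersection-over-union with the ground-truth span reaches the threshold (0.7 in the BABEL result above). A small self-contained sketch of that metric, with illustrative names:

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) temporal intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_iou(predictions, ground_truths, threshold=0.7):
    """Percentage of queries whose prediction overlaps the ground truth
    by at least `threshold` IoU."""
    hits = sum(
        temporal_iou(pred, gt) >= threshold
        for pred, gt in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(ground_truths)

# Example: only the first prediction clears the 0.7 threshold -> 50.0 recall.
preds = [(10.0, 30.0), (5.0, 12.0)]
gts = [(12.0, 32.0), (20.0, 40.0)]
print(recall_at_iou(preds, gts))
```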
Keywords
» Artificial intelligence » Recall