Summary of RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos, by Ali Zare et al.
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
by Ali Zare, Yulei Niu, Hammad Ayyubi, Shih-fu Chang
First submitted to arXiv on: 27 Mar 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract (linked from the source page) |
Medium | GrooveSquid.com (original content) | This paper tackles procedure planning in instructional videos by addressing three issues: adaptive procedure length, temporal relations between steps, and annotation cost. The authors propose a new setting, adaptive procedure planning, in which the procedure length is not fixed or predetermined. They introduce the Retrieval-Augmented Planner (RAP), which uses an auto-regressive architecture to decide adaptively when a procedure should end. RAP also uses an external memory module to retrieve relevant state-action pairs from training videos and revise the generated procedures. To reduce annotation cost, RAP uses weakly supervised learning, generating pseudo labels for action steps to expand the training dataset. The authors report that RAP outperforms traditional fixed-length models on the CrossTask and COIN benchmarks. |
Low | GrooveSquid.com (original content) | This paper helps computers plan the steps needed to finish a task shown in an instructional video. Older methods assume every plan has a fixed number of steps, and training them takes a lot of work to label what is happening in the videos. The authors describe a more realistic setting called adaptive procedure planning, where the number of steps can change from one task to another. They also built a model called RAP that decides when a plan is finished and looks up similar examples from the videos it was trained on to improve its plan. According to the authors, this works better than earlier fixed-length methods. |
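
To make the two ideas in the medium summary more concrete (stopping adaptively and revising with retrieved state-action pairs), here is a minimal toy sketch in Python. It is not the authors' implementation: the `END` token, the `propose` callback, the confidence threshold `min_conf`, and the rule "replace a low-confidence action with the nearest memory entry's action" are all assumptions made for illustration, and the state vectors and action names are made up.

```python
import math

END = "<end>"  # hypothetical stop token; emitting it ends the plan


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memory, state, k=1):
    """Return the actions of the k memory entries whose states are most similar to `state`."""
    ranked = sorted(memory, key=lambda pair: cosine(pair[0], state), reverse=True)
    return [action for _, action in ranked[:k]]


def plan(start, goal, propose, memory, max_steps=10, min_conf=0.5):
    """Emit actions one at a time until `propose` returns END or max_steps is reached.

    `propose(state, goal, history)` stands in for a learned planner and must
    return (action, confidence, next_state). Low-confidence actions are
    replaced by the action retrieved from memory (an illustrative rule).
    """
    state, actions = start, []
    for _ in range(max_steps):
        action, conf, next_state = propose(state, goal, actions)
        if action == END:  # the plan length is decided here, not fixed in advance
            break
        if conf < min_conf:
            action = retrieve(memory, state, k=1)[0]
        actions.append(action)
        state = next_state
    return actions


# Toy memory of (state embedding, action) pairs, as if collected from training videos.
memory = [((1.0, 0.0), "pour water"), ((0.0, 1.0), "stir mixture")]


def toy_propose(state, goal, history):
    """Scripted stand-in for a planner: confident first step, unsure second step."""
    if state == goal:
        return END, 1.0, state
    script = [("pour water", 0.9, (0.2, 1.0)), ("unknown", 0.1, (0.0, 1.0))]
    return script[len(history)]


print(plan((1.0, 0.2), (0.0, 1.0), toy_propose, memory))
```

In this toy run the second proposed action has low confidence, so it is swapped for the action stored with the most similar memory state, and the loop stops once the goal state is reached rather than after a preset number of steps.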
Keywords
» Artificial intelligence » Supervised