Summary of ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos, by Zerui Chen et al.
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
by Zerui Chen, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid
First submitted to arXiv on: 24 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The paper proposes a unified vision-based policy for multi-fingered robot hands to manipulate various objects in diverse poses, addressing limitations of previous work. The proposed framework, ViViDex, uses reinforcement learning with trajectory-guided rewards to train state-based policies from human videos, obtaining natural and physically plausible trajectories. It then trains a unified visual policy without privileged information, using a coordinate transformation and comparing behavior cloning with diffusion policy training. Experiments show that ViViDex outperforms state-of-the-art approaches on three dexterous manipulation tasks, both in simulation and on real robots. (A minimal illustrative sketch of the trajectory-guided reward follows the table.)
Low | GrooveSquid.com (original content) | The paper helps robots learn how to manipulate objects by using videos of humans doing the same thing. The goal is to make the robot’s movements look more natural, but this can be tricky because videos can contain mistakes or be unclear. To address this, the researchers created a new framework called ViViDex. It first trains separate “state-based policies” for each video and then uses those policies to train a single “visual policy”. This visual policy does not rely on knowing object details beforehand, which is helpful because real robots don’t have perfect information either. The results show that ViViDex works better than other methods both in simulation and on real robots.
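To make the first stage of the pipeline concrete, below is a minimal, hypothetical sketch of a trajectory-guided reward: the RL agent earns more reward the closer its hand and object positions stay to the reference trajectory recovered from a human video. The function name, weights, and exponential shaping are illustrative assumptions, not the paper’s exact formulation.

```python
# A minimal sketch (not the authors' released code) of a trajectory-guided
# reward: the state-based RL policy is rewarded for tracking the hand and
# object positions of a trajectory recovered from a human video.
# All names, weights, and the exact shaping are illustrative assumptions.
import numpy as np

def trajectory_guided_reward(hand_pos, obj_pos, ref_hand_pos, ref_obj_pos,
                             w_hand=1.0, w_obj=2.0, scale=10.0):
    """Return a bounded reward that is largest when the current hand and
    object positions stay close to the reference positions at this step."""
    hand_err = np.linalg.norm(np.asarray(hand_pos) - np.asarray(ref_hand_pos))
    obj_err = np.linalg.norm(np.asarray(obj_pos) - np.asarray(ref_obj_pos))
    # Exponential shaping keeps each term in (0, 1] and decays with distance.
    return w_hand * np.exp(-scale * hand_err) + w_obj * np.exp(-scale * obj_err)

# Example: perfect tracking yields the maximum reward w_hand + w_obj = 3.0.
print(trajectory_guided_reward([0.1, 0.2, 0.3], [0.0, 0.0, 0.1],
                               [0.1, 0.2, 0.3], [0.0, 0.0, 0.1]))
```

In the second stage described in the summaries, rollouts from these state-based policies would serve as demonstrations for the unified visual policy, trained without privileged object state via behavior cloning or a diffusion policy.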
Keywords
» Artificial intelligence » Diffusion » Reinforcement learning