Summary of IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos, by Yunong Liu et al.


IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

by Yunong Liu, Cristobal Eyzaguirre, Manling Li, Shubh Khanna, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Weiyu Liu, Jiajun Wu

First submitted to arXiv on: 18 Nov 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
IKEA Video Manuals is a dataset for shape assembly that addresses the 4D grounding of assembly instructions in videos, meaning that instructions are aligned with what happens in 3D space over time. The dataset features 3D models of furniture parts, instructional manuals, and assembly videos from the Internet, along with annotations of dense spatio-temporal alignments between these modalities. It is intended to support the development of autonomous agents that can construct complex 3D structures such as IKEA furniture. To demonstrate the dataset's utility, the authors present five applications: assembly plan generation, part-conditioned segmentation, part-conditioned pose estimation, video object segmentation, and furniture assembly based on instructional video manuals. For each application, they provide evaluation metrics and baseline methods.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Shape assembly is a common task in daily life, and doing it well requires understanding how parts come together in 3D space over time. A new dataset called IKEA Video Manuals tackles this challenge by grounding assembly instructions in videos. The dataset features 3D models of furniture parts, instructional manuals, and assembly videos from the Internet, along with annotations that show how these data modalities are connected. It is meant to help develop machines that can assemble complex structures like furniture. Five applications demonstrate the dataset's usefulness: planning how to assemble something, segmenting parts, estimating the poses of parts, segmenting objects in a video, and assembling furniture based on instructional videos.

Keywords

» Artificial intelligence  » Grounding  » Pose estimation