Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
by Heng Li, Minghan Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann
First submitted to arXiv on: 27 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract (available on arXiv) |
Medium | GrooveSquid.com (original content) | This paper presents Human-Aware Vision-and-Language Navigation (HA-VLN), a framework that extends traditional VLN by incorporating dynamic human activities and relaxing key assumptions. HA-VLN aims to develop embodied agents that navigate based on human instructions, which is crucial for real-world applicability. To tackle this challenge, the authors propose the Human-Aware 3D (HA3D) simulator and the Human-Aware Room-to-Room (HA-R2R) dataset, which combine dynamic human activities with 3D environments and provide more realistic navigation scenarios. The authors also introduce two agents: the Expert-Supervised Cross-Modal agent (VLN-CM) and the Non-Expert-Supervised Decision Transformer agent (VLN-DT), which utilize cross-modal fusion and diverse training strategies for effective navigation in dynamic human environments. |
Low | GrooveSquid.com (original content) | This paper is about creating AI agents that can navigate through buildings based on human instructions. This is important because it could help us create robots or virtual assistants that assist humans in their daily lives. The authors created new ways to test these agents, called the Human-Aware 3D (HA3D) simulator and the Human-Aware Room-to-Room (HA-R2R) dataset, which make the tasks more realistic and challenging. They also developed two special types of AI agents that can learn from human instructions. |
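The medium summary mentions that both agents rely on cross-modal fusion, i.e., combining the language instruction with visual observations to decide where to move. The paper does not spell out the mechanism here, so the following is only an illustrative sketch of one common approach (dot-product attention over candidate views), not the authors' actual implementation; the function names and toy embeddings are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attend(instr_vec, view_vecs):
    """Toy cross-modal fusion: score each candidate view embedding
    against the instruction embedding with a dot product, normalize
    with softmax, and return (attention weights, pooled visual context).
    Illustrative only -- not the HA-VLN agents' architecture."""
    scores = [sum(a * b for a, b in zip(instr_vec, v)) for v in view_vecs]
    weights = softmax(scores)
    dim = len(view_vecs[0])
    context = [
        sum(w * v[i] for w, v in zip(weights, view_vecs))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical 2-D embeddings: the instruction aligns with the first view.
weights, context = cross_modal_attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In a real agent the weighted context would feed a policy head (e.g., a Decision Transformer, as in VLN-DT) that predicts the next navigation action; here the example only shows the fusion step.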
Keywords
» Artificial intelligence » Supervised » Transformer