
Summary of DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences, by Yidong Huang et al.


DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

by Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai

First submitted to arXiv on: 5 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The recent advancements in foundation models (FMs) have opened up new possibilities for autonomous driving, but the experimental settings of these studies are oversimplified and fail to capture the complexity of real-world scenarios. This paper explores whether FM agents can handle long-horizon navigation tasks with free-form dialogue and unexpected situations caused by environmental dynamics or task changes. To address this, we introduce DriVLMe, a video-language-model-based agent that enables natural communication between humans and autonomous vehicles perceiving their environment. DriVLMe is developed from both embodied experiences in simulated environments and social experiences from real human dialogue. The results show competitive performance in open-loop benchmarks and closed-loop human studies, but also reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, and difficulties handling unexpected situations.

Low Difficulty Summary (original content by GrooveSquid.com)
Autonomous vehicles are getting smarter, thanks to foundation models (FMs). But can these FMs really handle real-world driving? Right now, the experimental settings are too simple. What if a driver asks for directions or an unexpected obstacle appears? To find out, researchers created DriVLMe, a special agent that talks to humans and understands its environment. They tested it in both simulated and real-world scenarios. While it did well in some areas, there’s still room for improvement.

Keywords

  • Artificial intelligence
  • Inference
  • Language model