
Summary of Vision-and-Language Navigation via Causal Learning, by Liuyi Wang et al.


Vision-and-Language Navigation via Causal Learning

by Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen

First submitted to arXiv on: 16 Apr 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The original abstract is available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces the Generalized Cross-Modal Causal Transformer (GOAT), an approach to dataset bias in Vision-and-Language Navigation (VLN) agents, a bias that hurts their performance in unseen environments. GOAT uses causal inference to mitigate spurious correlations and promote unbiased learning. Its Back-Door and Front-Door Adjustment Causal Learning modules (BACL and FACL) target confounders in vision, language, and navigation history, covering both observable and unobservable ones. A Cross-Modal Feature Pooling (CFP) module, supervised by contrastive learning, captures global confounder features and also improves cross-modal representations during pre-training. The method outperforms previous state-of-the-art approaches on four VLN datasets: R2R, REVERIE, RxR, and SOON.
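The summary names back-door and front-door adjustment without showing what they compute. The sketch below is a minimal illustration of the two textbook adjustment formulas from Pearl's causal inference, evaluated on made-up discrete models with binary variables. It is not GOAT's implementation, which applies these ideas to learned vision, language, and history features inside a transformer; every function name, variable index, and probability in it is a hypothetical choice for illustration. Back-door adjustment assumes an observed confounder Z and computes P(y | do(x)) = Σ_z P(y | x, z) P(z). Front-door adjustment assumes the confounder U is never observed but a mediator M carries the whole effect of X on Y, and computes P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x').

```python
from itertools import product

# Hypothetical toy models with binary variables. Every probability below is
# invented for illustration; none of it comes from the GOAT paper.

def prob(joint, event):
    """P(event): sum the joint over cells matching event (index -> value)."""
    return sum(p for cell, p in joint.items()
               if all(cell[i] == v for i, v in event.items()))

def cond_prob(joint, event, given):
    """P(event | given), computed directly from the joint distribution."""
    return prob(joint, {**event, **given}) / prob(joint, given)

def bern(p1, value):
    """Probability of a binary value under a Bernoulli(p1) distribution."""
    return p1 if value else 1.0 - p1

# Back-door setting: observed confounder Z -> X and Z -> Y; X has no effect on Y.
Z, X, Y = 0, 1, 2  # positions of each variable in a joint-table key

def backdoor_joint():
    joint = {}
    for z, x, y in product((0, 1), repeat=3):
        joint[(z, x, y)] = (bern(0.5, z)
                            * bern(0.8 if z else 0.2, x)   # P(X | Z)
                            * bern(0.9 if z else 0.1, y))  # P(Y | Z), ignores X
    return joint

def backdoor(joint, x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z)."""
    return sum(cond_prob(joint, {Y: 1}, {X: x, Z: z}) * prob(joint, {Z: z})
               for z in (0, 1))

# Front-door setting: unobserved U -> X and U -> Y; X -> M -> Y.
FX, FM, FY = 0, 1, 2  # positions in the observational (X, M, Y) table

def frontdoor_joint():
    """Observational joint over (X, M, Y); U is summed out because it is unobserved."""
    joint = {}
    for u, x, m, y in product((0, 1), repeat=4):
        p = (bern(0.5, u)
             * bern(0.8 if u else 0.2, x)          # P(X | U)
             * bern(0.9 if x else 0.1, m)          # P(M | X)
             * bern(0.1 + 0.5 * m + 0.3 * u, y))   # P(Y | M, U)
        joint[(x, m, y)] = joint.get((x, m, y), 0.0) + p
    return joint

def frontdoor(joint, x):
    """P(Y=1 | do(X=x)) = sum_m P(m | x) sum_x' P(Y=1 | m, x') P(x')."""
    total = 0.0
    for m in (0, 1):
        inner = sum(cond_prob(joint, {FY: 1}, {FM: m, FX: xp}) * prob(joint, {FX: xp})
                    for xp in (0, 1))
        total += cond_prob(joint, {FM: m}, {FX: x}) * inner
    return total

if __name__ == "__main__":
    bd, fd = backdoor_joint(), frontdoor_joint()
    for x in (0, 1):
        # Naive conditionals absorb the confounder's influence; adjusted values do not.
        bd_naive, bd_adj = cond_prob(bd, {Y: 1}, {X: x}), backdoor(bd, x)
        fd_naive, fd_adj = cond_prob(fd, {FY: 1}, {FX: x}), frontdoor(fd, x)
        print(f"x={x} back-door: naive={bd_naive:.2f} adjusted={bd_adj:.2f}")
        print(f"x={x} front-door: naive={fd_naive:.2f} adjusted={fd_adj:.2f}")
```

In the back-door model the naive conditionals are 0.26 and 0.74 even though X has no effect on Y, while the adjusted estimate is 0.50 for both values of X. In the front-door model the naive conditionals are 0.21 and 0.79, and the adjustment returns 0.30 and 0.70, the interventional probabilities implied by the made-up mechanisms. The front-door route only needs the observational table over X, M, and Y, which is why it applies when the confounder cannot be measured.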

Low Difficulty Summary (written by GrooveSquid.com, original content)
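This paper helps vision-and-language navigation (VLN) agents, AI systems that follow written instructions to move through a 3D environment, understand their surroundings better by reducing the effect of bias in their training data. It draws on causal inference, a set of methods for telling real cause-and-effect relationships apart from patterns that merely happen to appear together, so that the agents aren’t misled by details that don’t actually matter. Tested on four navigation datasets, the approach outperforms previous methods.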

Keywords

» Artificial intelligence  » Inference  » Transformer