Summary of Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding, by Bram Willemsen et al.

Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding

by Bram Willemsen, Gabriel Skantze

First submitted to arXiv on: 9 Sep 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	See the paper's original abstract on arXiv.
Medium	GrooveSquid.com (original content)	The paper proposes a two-stage method for generating referring expressions (REs) in visually grounded dialogue. First, it models referring expression generation (REG) as a next-token prediction task conditioned on the preceding linguistic context and an image representation of the referent. Second, it uses discourse-aware comprehension guiding to rerank candidate REs by their discriminatory power. The results show that the method produces discriminative REs: reranked expressions achieve higher text-image retrieval accuracy than expressions generated with greedy decoding.
Low	GrooveSquid.com (original content)	The paper aims to make computers better at referring to things in pictures during a conversation. It does this with a language model that looks at both the words used so far and the picture being described. The goal is to come up with phrases (called referring expressions) that accurately point to the correct thing in the picture. The paper shows that the approach works well, making it easier to figure out which thing in the picture is being talked about.
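
The generate-and-rerank idea in the medium summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `rerank` and `toy_score`, the tag-set "images", and the word-overlap scorer are all hypothetical stand-ins for a learned comprehension model. A discourse-aware scorer would also condition on the preceding dialogue, which this toy version leaves out.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rerank(
    candidates: Sequence[str],
    images: Sequence[set[str]],
    target_index: int,
    score: Callable[[str, set[str]], float],
) -> list[tuple[str, float]]:
    """Sort candidate expressions by how strongly they pick out the target.

    For each candidate, the scorer rates every image; the candidate's
    discriminatory power is the softmax probability given to the target.
    """
    ranked = []
    for cand in candidates:
        probs = softmax([score(cand, img) for img in images])
        ranked.append((cand, probs[target_index]))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def toy_score(expression: str, image_tags: set[str]) -> float:
    """Hypothetical scorer: number of words shared with the image's tags."""
    return float(len(set(expression.lower().split()) & image_tags))


# Toy data (assumed for illustration): each image is a set of tags.
images = [
    {"red", "mug", "table"},   # index 0: the intended referent
    {"blue", "mug", "table"},  # distractor
    {"red", "book", "shelf"},  # distractor
]
# Candidates as they might come out of a generation model.
candidates = ["the mug", "the red mug", "the red one"]

best, prob = rerank(candidates, images, target_index=0, score=toy_score)[0]
print(best, round(prob, 3))
```

Here "the mug" and "the red one" each fit a distractor as well as the target, so the reranker prefers "the red mug", the only candidate that separates the referent from both distractors.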

Keywords

* Artificial intelligence
* Discourse
* Language model
* Token
