DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning

by Kazuki Matsuda, Yuiga Wada, Komei Sugiura

First submitted to arXiv on: 28 Sep 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)

A novel supervised automatic evaluation metric called DENEB is proposed to address the challenge of developing robust metrics for image captioning that can handle hallucinations. Existing metrics are inadequate due to their limited ability to compare candidate captions with multifaceted reference captions. DENEB incorporates the Sim-Vec Transformer, which processes multiple references simultaneously, efficiently capturing the similarity between an image, a candidate caption, and reference captions. The metric is trained on the Nebula dataset, comprising 32,978 images paired with human judgments from 805 annotators. DENEB achieves state-of-the-art performance among existing LLM-free metrics on several datasets, including FOIL, Composite, Flickr8K-Expert, Flickr8K-CF, Nebula, and PASCAL-50S.

Low Difficulty Summary (written by GrooveSquid.com; original content)

Image captioning is a challenge in artificial intelligence that involves generating natural language captions for images. This paper proposes a new way to evaluate how well these captions match the actual image. The current methods have some limitations, so this research aims to improve them by developing a better metric called DENEB. DENEB uses a special type of transformer to compare the caption with multiple references to the image. This helps it accurately capture the relationship between the image and the caption. To train DENEB, the researchers created a large dataset of images and captions that were judged by humans. The results show that DENEB performs better than other methods on several different datasets.

Keywords

» Artificial intelligence  » Image captioning  » Supervised  » Transformer