DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning

by Kazuki Matsuda, Yuiga Wada, Komei Sugiura

First submitted to arXiv on: 28 Sep 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)

A novel supervised automatic evaluation metric called DENEB is proposed to address the challenge of developing robust metrics for image captioning that can handle hallucinations. Existing metrics are inadequate due to their limited ability to compare candidate captions with multifaceted reference captions. DENEB incorporates the Sim-Vec Transformer, which processes multiple references simultaneously, efficiently capturing the similarity between an image, a candidate caption, and reference captions. The metric is trained on the Nebula dataset, comprising 32,978 images paired with human judgments from 805 annotators. DENEB achieves state-of-the-art performance among existing LLM-free metrics on several datasets, including FOIL, Composite, Flickr8K-Expert, Flickr8K-CF, Nebula, and PASCAL-50S.

Low Difficulty Summary (written by GrooveSquid.com; original content)

Image captioning is a challenge in artificial intelligence that involves generating natural language captions for images. This paper proposes a new way to evaluate how well these captions match the actual image. The current methods have some limitations, so this research aims to improve them by developing a better metric called DENEB. DENEB uses a special type of transformer to compare the caption with multiple references to the image. This helps it accurately capture the relationship between the image and the caption. To train DENEB, the researchers created a large dataset of images and captions that were judged by humans. The results show that DENEB performs better than other methods on several different datasets.

Keywords

» Artificial intelligence  » Image captioning  » Supervised  » Transformer