
Multi-Object Hallucination in Vision-Language Models

by Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

First submitted to arXiv on: 8 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty summary is the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large vision-language models (LVLMs) are prone to object hallucination: they invent objects that are not present in the image. While current benchmarks focus on a single object class, this study investigates multi-object hallucination, examining how LVLMs misbehave when asked to recognize multiple objects simultaneously. To evaluate this, the authors introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within an image and uses visual referring prompts to eliminate ambiguity. The empirical studies reveal that LVLMs hallucinate more when focusing on multiple objects than on a single object, and that the distribution of tested object classes affects hallucination behavior. Data-specific factors such as object salience and frequency, as well as the models' intrinsic behaviors, also influence hallucination. The study aims to enable LVLMs to recognize and reason about multiple objects in realistic visual scenes.
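
To make the evaluation setup concrete, here is a minimal sketch in Python of what a ROPE-style multi-object probing loop might look like. The names (ProbeItem, query_lvlm, hallucination_rate) are hypothetical illustrations, not the paper's released code; ROPE's actual prompts, object-class distributions, and scoring are specified in the paper.

    # Hypothetical sketch of a ROPE-style multi-object probing loop.
    # All names here (ProbeItem, query_lvlm, ...) are illustrative
    # assumptions, not the paper's released code.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ProbeItem:
        image_path: str               # image containing several objects
        referring_prompts: List[str]  # one visual referring prompt per object
        gold_classes: List[str]       # ground-truth class of each object

    def query_lvlm(image_path: str, referring_prompts: List[str]) -> List[str]:
        """Placeholder: ask a vision-language model to name every
        referred object in a single multi-object query."""
        raise NotImplementedError

    def hallucination_rate(items: List[ProbeItem]) -> float:
        """Fraction of probed objects whose predicted class does not
        match the ground truth when all objects are queried at once."""
        wrong = total = 0
        for item in items:
            preds = query_lvlm(item.image_path, item.referring_prompts)
            for pred, gold in zip(preds, item.gold_classes):
                total += 1
                if pred != gold:  # misrecognized or hallucinated class
                    wrong += 1
        return wrong / max(total, 1)

Running the same loop with a single referring prompt per query gives the single-object baseline against which the paper's multi-object results are contrasted.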
Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine trying to teach a computer to recognize objects in pictures. Sometimes these computers see objects that aren't even there! This paper looks at what happens when computers are asked to find many objects at once. It turns out they make mistakes more often than when they're looking for just one object. The researchers also discovered that how a computer is trained affects its ability to recognize objects correctly. They hope their findings will help computers become better at recognizing and understanding what's in pictures.

Keywords

  • Artificial intelligence
  • Hallucination