
Multi-Object Hallucination in Vision-Language Models

by Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

First submitted to arXiv on: 8 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty summary is the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large vision-language models (LVLMs) are prone to object hallucination: they invent objects that are not present in the image. While current benchmarks focus on a single object class, this study investigates multi-object hallucination, examining how LVLMs misbehave when asked to recognize multiple objects simultaneously. To evaluate this, the authors introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within an image and uses visual referring prompts to eliminate ambiguity. The empirical studies reveal that LVLMs hallucinate more when focusing on multiple objects than on a single object, and that the distribution of tested object classes affects hallucination behavior. Data-specific factors such as object salience and frequency, as well as the models' intrinsic behaviors, also influence hallucination. The study aims to enable LVLMs to recognize and reason about multiple objects in realistic visual scenes.
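
To make the evaluation setup concrete, here is a minimal sketch in Python of what a ROPE-style multi-object probing loop might look like. The names (ProbeItem, query_lvlm, hallucination_rate) are hypothetical illustrations, not the paper's released code; ROPE's actual prompts, object-class distributions, and scoring are specified in the paper.

    # Hypothetical sketch of a ROPE-style multi-object probing loop.
    # All names here (ProbeItem, query_lvlm, ...) are illustrative
    # assumptions, not the paper's released code.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ProbeItem:
        image_path: str               # image containing several objects
        referring_prompts: List[str]  # one visual referring prompt per object
        gold_classes: List[str]       # ground-truth class of each object

    def query_lvlm(image_path: str, referring_prompts: List[str]) -> List[str]:
        """Placeholder: ask a vision-language model to name every
        referred object in a single multi-object query."""
        raise NotImplementedError

    def hallucination_rate(items: List[ProbeItem]) -> float:
        """Fraction of probed objects whose predicted class does not
        match the ground truth when all objects are queried at once."""
        wrong = total = 0
        for item in items:
            preds = query_lvlm(item.image_path, item.referring_prompts)
            for pred, gold in zip(preds, item.gold_classes):
                total += 1
                if pred != gold:  # misrecognized or hallucinated class
                    wrong += 1
        return wrong / max(total, 1)

Running the same loop with a single referring prompt per query gives the single-object baseline against which the paper's multi-object results are contrasted.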
Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine trying to teach a computer to recognize objects in pictures. Sometimes these computers see objects that aren't even there! This paper looks at what happens when computers are asked to find many objects at once. It turns out they make mistakes more often than when they're looking for just one object. The researchers also discovered that how a computer is trained affects its ability to recognize objects correctly. They hope their findings will help computers become better at recognizing and understanding what's in pictures.

Keywords

  • Artificial intelligence
  • Hallucination