A Survey on Hallucination in Large Vision-Language Models
by Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, Wei Peng
First submitted to arXiv on: 1 Feb 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Computation and Language (cs.CL); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper but is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract, available on its arXiv page. |
| Medium | GrooveSquid.com (original content) | The recent advancements in Large Vision-Language Models (LVLMs) have sparked significant interest in the AI community, but their practical deployment is hindered by the issue of “hallucination”: a misalignment between factual visual content and the corresponding generated text. This comprehensive survey dissects LVLM-related hallucinations to provide an overview and to facilitate future mitigation efforts. It clarifies the concept of hallucination in LVLMs, highlighting its various symptoms and the challenges it poses. It then outlines benchmarks and methodologies for evaluating these hallucinations (an illustrative sketch follows this table) and investigates their root causes, drawing on insights from training data and model components. Finally, the survey critically reviews existing methods for mitigating hallucinations. |
| Low | GrooveSquid.com (original content) | Large Vision-Language Models have made big progress, but they can get confused about what’s real and what’s not. This is called “hallucination.” LVLMs are really good at generating text about what they see, but sometimes they make things up! This survey looks at why this happens and how to fix it. |
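To make the evaluation idea above concrete, here is a minimal, hypothetical sketch of an object-level hallucination check of the kind such benchmarks typically build on: it compares the objects mentioned in a model’s generated caption against the image’s ground-truth annotations. The function name, inputs, and metric are illustrative assumptions, not the survey’s own benchmark.

```python
# Hypothetical illustration (not taken from the survey): a minimal object-level
# hallucination check that compares objects mentioned in a generated caption
# against ground-truth annotations for the image.

def object_hallucination_rate(mentioned_objects, ground_truth_objects):
    """Return the fraction of mentioned objects absent from the image annotations."""
    mentioned = {obj.lower() for obj in mentioned_objects}
    truth = {obj.lower() for obj in ground_truth_objects}
    if not mentioned:
        return 0.0
    hallucinated = mentioned - truth  # objects the model invented
    return len(hallucinated) / len(mentioned)

# Example: the caption mentions a "dog" that the annotations do not contain.
rate = object_hallucination_rate(["cat", "sofa", "dog"], ["cat", "sofa", "lamp"])
print(f"hallucination rate: {rate:.2f}")  # 0.33
```

Actual benchmarks are of course more involved; this sketch is only meant to show the flavor of comparing generated text against visual ground truth.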
Keywords
* Artificial intelligence
* Hallucination