Summary of Visual Hallucination: Definition, Quantification, and Prescriptive Remediations, by Anku Rani et al.
Visual Hallucination: Definition, Quantification, and Prescriptive Remediations
by Anku Rani, Vipula Rawte, Harshad Sharma, Neeraj Anand, Krishnav Rajbangshi, Amit Sheth, Amitava Das
First submitted to arXiv on: 26 Mar 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper investigates hallucination in Vision-Language Models (VLMs) by profiling it across two tasks: image captioning and Visual Question Answering (VQA). It identifies eight fine-grained orientations of visual hallucination, including contextual guessing, identity incongruity, geographical erratum, and more. To study this phenomenon, the authors release a publicly available dataset called VHILT, comprising 2,000 samples generated by eight VLMs across both tasks, each with human annotations for the relevant category. (An illustrative code sketch for exploring such a dataset follows the table.) |
| Low | GrooveSquid.com (original content) | Hallucination in AI is a big problem that makes it hard to trust machines. Researchers have been trying to solve this issue in language models, but vision-language models have received far less attention. This paper looks at how these models hallucinate and what kinds of mistakes they make. It finds eight different types of mistakes, like guessing wrong or describing things that aren't there. To help others study this problem, the authors create a big dataset with many examples of these mistakes. |
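As a rough illustration of how a reader might explore a dataset organized the way VHILT is described here (2,000 samples, eight VLMs, two tasks, one annotated hallucination category per sample), the minimal Python sketch below tallies categories and models from an annotations file. The file name and column names are assumptions for illustration only; the summary does not specify the actual release format.

```python
# Illustrative only: the real VHILT file layout is not described in this summary,
# so the file name and column names below are hypothetical.
import csv
from collections import Counter

def tally_hallucinations(path: str) -> tuple[Counter, Counter]:
    """Count annotated hallucination categories overall and per generating VLM."""
    by_category: Counter = Counter()
    by_model: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Hypothetical columns: 'task' (captioning / VQA), 'model' (the VLM),
            # and 'category' (one of the eight annotated orientations).
            by_category[row["category"]] += 1
            by_model[row["model"]] += 1
    return by_category, by_model

if __name__ == "__main__":
    categories, models = tally_hallucinations("vhilt_annotations.csv")  # hypothetical file name
    for name, count in categories.most_common():
        print(f"{name}: {count}")
```

Such a tally would show how the eight hallucination orientations are distributed across the contributing models and tasks, which is the kind of profiling the paper reports.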
Keywords
» Artificial intelligence » Hallucination » Image captioning » Question answering