
Summary of Visual Hallucinations of Multi-modal Large Language Models, by Wen Huang et al.


Visual Hallucinations of Multi-modal Large Language Models

by Wen Huang, Hongbin Liu, Minxin Guo, Neil Zhenqiang Gong

First submitted to arXiv on: 22 Feb 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed tool, VHTest, generates a diverse set of visual hallucination (VH) instances for evaluating multi-modal large language models (MLLMs). Prior studies drew VH instances only from existing image datasets, whose limited diversity gave a biased picture of how MLLMs behave under VH. VHTest first finds initial VH instances, writes a text description for each, and then uses a text-to-image generative model such as DALL-E-3 to synthesize further VH images. The resulting benchmark collects 1,200 VH instances across 8 VH modes and shows that popular MLLMs such as GPT-4V, LLaVA-1.5, and MiniGPT-v2 hallucinate frequently on them. Fine-tuning MLLMs on the VHTest benchmark reduces their hallucination rates without compromising performance on other benchmarks.
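
To make that pipeline concrete, here is a minimal sketch of a VHTest-style image-synthesis and evaluation loop. It assumes the OpenAI Python SDK for DALL-E-3 image generation; the `VHInstance` fields, the helper functions, and the exact-match scoring are illustrative assumptions, not the authors' released implementation.

```python
# Minimal, illustrative sketch of a VHTest-style image-synthesis and
# evaluation loop. This is NOT the authors' released code: the data class,
# helper names, and exact-match scoring are assumptions for illustration.

import base64
from dataclasses import dataclass
from typing import Callable

from openai import OpenAI  # assumed dependency: the OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@dataclass
class VHInstance:
    mode: str         # one of the benchmark's 8 VH modes
    description: str  # text description used to synthesize the VH image
    question: str     # question posed to the MLLM about the image
    reference: str    # ground-truth answer to that question


def synthesize_image(instance: VHInstance) -> bytes:
    """Turn a VH text description into an image with a text-to-image model."""
    result = client.images.generate(
        model="dall-e-3",
        prompt=instance.description,
        n=1,
        size="1024x1024",
        response_format="b64_json",
    )
    return base64.b64decode(result.data[0].b64_json)


def hallucination_rate(
    answer_fn: Callable[[bytes, str], str],  # wraps the MLLM under test
    instances: list[VHInstance],
) -> float:
    """Fraction of VH instances on which the MLLM answers incorrectly."""
    wrong = 0
    for inst in instances:
        image = synthesize_image(inst)
        answer = answer_fn(image, inst.question)
        wrong += int(answer.strip().lower() != inst.reference.strip().lower())
    return wrong / max(len(instances), 1)
```

In practice, `answer_fn` would wrap whichever MLLM is being evaluated (e.g., GPT-4V or LLaVA-1.5), and a more forgiving answer checker would replace the exact string match.
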
Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine you’re trying to understand pictures, but your computer program keeps adding fake details! This problem, called visual hallucination (VH), makes it hard to trust what AI sees. Right now, we don’t know how well these programs really work, because we only have a few examples of them getting things wrong. To fix this, we made a special tool that creates lots of different fake-picture scenarios. We tested some popular AI models and found that they often make these mistakes! But if we teach them to correct these mistakes, they can still do their job well.

Keywords

* Artificial intelligence
* Fine-tuning
* Generative model
* GPT
* Hallucination