
Explore the Hallucination on Low-level Perception for MLLMs

by Yinan Sun, Zicheng Zhang, Haoning Wu, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min

First submitted to arXiv on: 15 Sep 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the self-awareness of Multi-modality Large Language Models (MLLMs) in low-level visual perception and understanding tasks. The authors propose QL-Bench, a benchmark setting that simulates human responses to low-level vision, and construct the LLSAVisionQA dataset, which contains 2,990 single images and 1,999 image pairs, each accompanied by an open-ended question about its low-level features. Evaluating 15 MLLMs, they find that while some models show strong low-level visual capabilities, their self-awareness remains underdeveloped. Notably, simpler questions are answered more accurately than complex ones, yet self-awareness tends to improve when models tackle the more challenging questions. The study aims to pave the way for MLLMs with stronger self-awareness in low-level visual perception and understanding.
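To make the evaluation protocol concrete, below is a minimal, hypothetical sketch of how one might score an MLLM on LLSAVisionQA-style questions, tracking plain accuracy by question difficulty alongside a simple self-awareness proxy (correctly declining questions that cannot be answered from the image). The dataset fields, the answer_question() stub, and the "I don't know" convention are illustrative assumptions, not the authors' implementation.

```python
"""Hypothetical sketch (not the authors' code) of a QL-Bench-style
evaluation loop over low-level vision questions."""

from collections import defaultdict

# Toy stand-ins for LLSAVisionQA-style records: an image reference, an
# open-ended question about low-level features, a difficulty tag, and a
# ground-truth answer (None marks a question the model should decline).
DATASET = [
    {"image": "img_0001.png", "question": "Is the image blurry?",
     "difficulty": "simple", "answer": "yes"},
    {"image": "img_0002.png", "question": "Which region is noisiest?",
     "difficulty": "complex", "answer": "top-left"},
    {"image": "img_0003.png", "question": "What ISO setting was used?",
     "difficulty": "complex", "answer": None},
]


def answer_question(image: str, question: str) -> str:
    """Placeholder for a real MLLM call; a real harness would send the
    image and question to the model and return its text answer."""
    return "i don't know"


def evaluate(dataset):
    """Group results by difficulty: accuracy on answerable questions,
    plus how often the model correctly declines unanswerable ones
    (a crude proxy for self-awareness)."""
    stats = defaultdict(lambda: {"correct": 0, "declined_ok": 0, "total": 0})
    for item in dataset:
        pred = answer_question(item["image"], item["question"]).strip().lower()
        bucket = stats[item["difficulty"]]
        bucket["total"] += 1
        if item["answer"] is None:
            bucket["declined_ok"] += int(pred == "i don't know")
        else:
            bucket["correct"] += int(pred == item["answer"])
    return stats


if __name__ == "__main__":
    for difficulty, s in sorted(evaluate(DATASET).items()):
        print(f"{difficulty}: {s}")
```

Comparing the per-difficulty buckets is what would surface the paper's headline pattern: higher accuracy on simple questions, but better-calibrated declining on complex ones.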
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how well Artificial Intelligence (AI) systems called Multi-modality Large Language Models understand pictures. These models are good at describing what is in a picture, but they are less reliable when asked about basic visual qualities like blur or noise, and they often don't realize when they might be wrong. The authors created a special test to check whether these models know the limits of their own understanding. They found that the models answer simple questions more accurately than hard ones, but harder questions actually seem to make the models more aware of what they don't know.

Keywords

» Artificial intelligence