Summary of Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs, by Sihang Zhao et al.
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
by Sihang Zhao, Youliang Yuan, Xiaoying Tang, Pinjia He
First submitted to arXiv on: 15 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper investigates why multimodal large language models (MLLMs) fail on simple visual question-answering (VQA) problems despite performing well on complex tasks. It proposes a benchmark, LazyBench, to systematically analyze this behavior and finds that today’s advanced MLLMs exhibit “model laziness”: they answer easy questions about an image incorrectly even when they can describe that same image accurately. The study also finds that stronger models are more prone to this issue and that chain-of-thought (CoT) prompting can partially mitigate it (a minimal code sketch of this probe follows the table). |
| Low | GrooveSquid.com (original content) | The paper looks at why big language models have trouble answering simple questions about pictures, even when they are good at explaining what is in the picture. It builds a special test called LazyBench to figure out what is going on and finds that these models often get easy questions wrong while getting harder tasks right. The study says that stronger models are more likely to make this mistake, and that one way to help fix it is something called chain of thought. |
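To make the laziness probe concrete, here is a minimal sketch of the kind of contrast the paper draws: the same image gets a hard task (a full description) and an easy one (a simple question), with and without a chain-of-thought nudge. This is an illustrative sketch, not the paper’s released benchmark code; the `query_mllm` helper is a hypothetical placeholder for whatever multimodal LLM API you use.

```python
# A minimal sketch of the laziness probe, not the paper's released code.
# `query_mllm` is a hypothetical helper standing in for any multimodal
# LLM API (GPT-4V, LLaVA, etc.).

def query_mllm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a multimodal LLM call.

    Swap the body for a real vision-language API call; the canned reply
    below only keeps the sketch self-contained and runnable.
    """
    return f"[model reply to {prompt!r} about {image_path}]"


def probe_laziness(image_path: str, simple_question: str) -> dict:
    """Contrast a hard task (full description) with an easy one (simple VQA),
    with and without a chain-of-thought nudge."""
    return {
        # Difficult task: advanced MLLMs usually describe the image well.
        "description": query_mllm(image_path, "Describe this image in detail."),
        # Simple task: the same models can still get this wrong ("laziness").
        "direct_answer": query_mllm(image_path, simple_question),
        # CoT nudge: the mitigation the paper reports can help.
        "cot_answer": query_mllm(
            image_path,
            f"{simple_question} Think step by step, then give a final answer.",
        ),
    }


if __name__ == "__main__":
    results = probe_laziness("scene.jpg", "Is there a dog in this image?")
    for name, reply in results.items():
        print(f"{name}: {reply}")
```

Comparing the three replies per image is the basic idea: a model that describes the scene correctly but fumbles the direct answer, while recovering under the CoT prompt, matches the laziness pattern the paper describes.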
Keywords
» Artificial intelligence » Question answering