
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

by Sihang Zhao, Youliang Yuan, Xiaoying Tang, Pinjia He

First submitted to arXiv on: 15 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper investigates why multimodal large language models (MLLMs) fail on simple visual question-answering (VQA) problems despite performing well on complex tasks. It proposes a benchmark, LazyBench, to systematically analyze this behavior and finds that current advanced MLLMs exhibit "model laziness": they answer easy questions incorrectly even when they can describe the same images accurately. The study also finds that stronger models are more prone to this failure and that chain-of-thought (CoT) prompting can help mitigate it (a rough sketch of this comparison follows the summaries below).

Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper looks at why big language models have trouble answering simple questions about pictures, even when they're good at explaining what's in the picture. It builds a special test called LazyBench to figure out what's going on and finds that these models often get easy questions wrong while getting harder ones right. The study says that better models are more likely to make this mistake, and that one way to fix it is with something called chain of thought.
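
Neither summary shows what that direct-versus-CoT comparison looks like in practice, so here is a minimal Python sketch of it. Everything in the sketch is an assumption for illustration: query_model is a hypothetical stand-in for whatever MLLM API you use, the prompt wording is invented, and the answer check is a loose string match rather than the paper's actual LazyBench scoring.

    # Hypothetical sketch, not the paper's code: probe "laziness" by comparing
    # direct answers with chain-of-thought answers on easy VQA items.
    from typing import Callable

    DIRECT_PROMPT = "{question} Answer with a single word or phrase."
    COT_PROMPT = (
        "{question} First describe the relevant part of the image step by step, "
        "then state your final answer on the last line."
    )

    def laziness_probe(
        query_model: Callable[[str, str], str],  # (image_path, prompt) -> model reply
        samples: list[dict],  # each: {"image": path, "question": str, "answer": str}
    ) -> dict:
        """Return direct vs. chain-of-thought accuracy over easy VQA samples."""
        direct_hits = 0
        cot_hits = 0
        for s in samples:
            direct = query_model(s["image"], DIRECT_PROMPT.format(question=s["question"]))
            cot = query_model(s["image"], COT_PROMPT.format(question=s["question"]))
            # Loose containment check; a real evaluation would normalize answers.
            direct_hits += s["answer"].lower() in direct.lower()
            cot_hits += s["answer"].lower() in cot.lower()
        n = len(samples)
        # cot_acc far above direct_acc on questions this simple is the
        # "laziness" signature the paper describes.
        return {"direct_acc": direct_hits / n, "cot_acc": cot_hits / n}

The only design point here is that the two prompts differ in whether the model is asked to reason before answering; a model that describes images well yet shows cot_acc well above direct_acc on trivially easy questions is displaying the "laziness" the summaries refer to.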

Keywords

» Artificial intelligence  » Question answering