Summary of Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs, by Sihang Zhao et al.
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
by Sihang Zhao, Youliang Yuan, Xiaoying Tang, Pinjia He
First submitted to arXiv on: 15 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper investigates why multimodal large language models (MLLMs) fail on simple visual question-answering (VQA) problems despite performing well on complex tasks. It proposes a benchmark, LazyBench, to systematically analyze this behavior and finds that today’s advanced MLLMs exhibit “model laziness”: they answer easy questions about an image incorrectly even when they can describe that same image accurately. The study also finds that stronger models are more prone to this issue and that chain-of-thought (CoT) prompting can partially mitigate it (a minimal code sketch of this probe follows the table). |
| Low | GrooveSquid.com (original content) | The paper looks at why big language models have trouble answering simple questions about pictures, even when they are good at explaining what is in the picture. It builds a special test called LazyBench to figure out what is going on and finds that these models often get easy questions wrong while getting harder tasks right. The study says that stronger models are more likely to make this mistake, and that one way to help fix it is something called chain of thought. |
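To make the laziness probe concrete, here is a minimal sketch of the kind of contrast the paper draws: the same image gets a hard task (a full description) and an easy one (a simple question), with and without a chain-of-thought nudge. This is an illustrative sketch, not the paper’s released benchmark code; the `query_mllm` helper is a hypothetical placeholder for whatever multimodal LLM API you use.

```python
# A minimal sketch of the laziness probe, not the paper's released code.
# `query_mllm` is a hypothetical helper standing in for any multimodal
# LLM API (GPT-4V, LLaVA, etc.).

def query_mllm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a multimodal LLM call.

    Swap the body for a real vision-language API call; the canned reply
    below only keeps the sketch self-contained and runnable.
    """
    return f"[model reply to {prompt!r} about {image_path}]"


def probe_laziness(image_path: str, simple_question: str) -> dict:
    """Contrast a hard task (full description) with an easy one (simple VQA),
    with and without a chain-of-thought nudge."""
    return {
        # Difficult task: advanced MLLMs usually describe the image well.
        "description": query_mllm(image_path, "Describe this image in detail."),
        # Simple task: the same models can still get this wrong ("laziness").
        "direct_answer": query_mllm(image_path, simple_question),
        # CoT nudge: the mitigation the paper reports can help.
        "cot_answer": query_mllm(
            image_path,
            f"{simple_question} Think step by step, then give a final answer.",
        ),
    }


if __name__ == "__main__":
    results = probe_laziness("scene.jpg", "Is there a dog in this image?")
    for name, reply in results.items():
        print(f"{name}: {reply}")
```

Comparing the three replies per image is the basic idea: a model that describes the scene correctly but fumbles the direct answer, while recovering under the CoT prompt, matches the laziness pattern the paper describes.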
Keywords
» Artificial intelligence » Question answering