
Skipping Computations in Multimodal LLMs

by Mustafa Shukor, Matthieu Cord

First submitted to arXiv on: 12 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original GrooveSquid.com content)
The paper investigates computation redundancy in Multimodal Large Language Models (MLLMs) during inference. It studies several ways to skip computations, such as skipping entire blocks, skipping individual FFN or self-attention layers, and parallelizing certain layers. The study finds that significant computations can be avoided at inference time, especially for tasks like Visual Question Answering (VQA). Skipping computations during training can recover 97% of the original performance, even when half of the blocks are skipped or 70% of the weights are removed. Alternatively, properly training smaller LLMs can yield performance comparable to larger ones. The analysis is extended to recent MLLMs, such as LLaVA-1.5, where similar observations hold.
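
To make the skipping strategies concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: it assumes a generic pre-norm decoder block with residual connections, and the names (`Block`, `forward_with_skipping`, `skip_every`) are hypothetical. It illustrates the three ideas from the summary: bypassing an entire block, bypassing just the self-attention or FFN sub-layer, and evaluating attention and FFN in parallel from the same input.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """A simplified pre-norm transformer block (illustrative only)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, skip_attn=False, skip_ffn=False, parallel=False):
        if parallel:
            # Parallel variant: attention and FFN both read the block input,
            # instead of the FFN consuming the attention output.
            h = self.norm1(x)
            a, _ = self.attn(h, h, h)
            f = self.ffn(self.norm2(x))
            return x + a + f
        if not skip_attn:
            h = self.norm1(x)
            a, _ = self.attn(h, h, h)
            x = x + a  # residual connection around self-attention
        if not skip_ffn:
            x = x + self.ffn(self.norm2(x))  # residual connection around FFN
        return x


def forward_with_skipping(blocks, x, skip_every=2):
    """Skip blocks entirely at inference (hypothetical policy): the
    residual stream carries the representation past a skipped block."""
    for i, block in enumerate(blocks):
        if i % skip_every == 1:
            continue  # whole block skipped
        x = block(x)
    return x


if __name__ == "__main__":
    blocks = nn.ModuleList([Block(64) for _ in range(8)])
    x = torch.randn(2, 16, 64)            # (batch, sequence, d_model)
    y = forward_with_skipping(blocks, x)  # drops half of the 8 blocks
    print(y.shape)                        # torch.Size([2, 16, 64])
```

The sketch only shows the inference-time view; in the paper's setting, skipping is studied both post hoc at inference and during training, and the 97% recovery figure refers to the latter.
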
Low Difficulty Summary (original GrooveSquid.com content)
This paper looks at how to make Large Language Models more efficient without sacrificing task performance. It finds that some parts of the model are not necessary for certain tasks and can be skipped or run in parallel. This means we can use smaller models that are faster and cheaper but still get good results. The study also shows that such models can be trained to work just as well as larger ones, which matters because bigger models require more compute and storage.

Keywords

  • Artificial intelligence
  • Inference
  • Question answering
  • Self attention