Summary of MBQ: Modality-Balanced Quantization for Large Vision-Language Models, by Shiyao Li et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
by Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang
First submitted to arXiv on: 27 Dec 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to post-training quantization (PTQ) for vision-language models (VLMs), specifically addressing the difference in quantization sensitivity between language and vision tokens. Existing PTQ methods focus mainly on large language models and neglect the distinct characteristics of other modalities. The proposed Modality-Balanced Quantization (MBQ) method incorporates these per-modality sensitivities during calibration, minimizing a weighted reconstruction loss to obtain better quantization parameters. Experiments show that MBQ improves task accuracy by up to 4.4% under W3 quantization and 11.6% under W4A8 quantization for VLMs ranging from 7B to 70B parameters, outperforming state-of-the-art (SOTA) baselines. Additionally, the authors implement a W3 GPU kernel that fuses the dequantization and GEMV operators, achieving a 1.4x speedup on LLaVA-onevision-7B on an RTX 4090. |
| Low | GrooveSquid.com (original content) | This paper is about making big computer models smaller so they can work faster and use less memory. These models are used for things like recognizing pictures and understanding what people say. The problem is that they need a lot of processing power, which makes them slow and hard to run. To solve this, the researchers developed a new way to shrink these models while keeping them just as good at their jobs. They tested it on some big models and found that it worked well, making the models faster and more efficient. |
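To make the calibration idea in the medium summary concrete, here is a minimal sketch of a sensitivity-weighted calibration search. This is not the paper's implementation: the function names, the simple grid search over clipping scales, and the scalar per-modality sensitivity weights `s_vision` and `s_language` are all illustrative assumptions; it only shows the general shape of choosing quantization parameters by minimizing a reconstruction loss weighted separately over vision and language tokens.

```python
import numpy as np

def quantize_weight(w, scale, bits=3):
    """Symmetric uniform quantization of w to `bits` bits with clipping scale `scale`."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 3 for 3-bit symmetric
    step = scale / qmax                   # quantization step size
    q = np.clip(np.round(w / step), -qmax - 1, qmax)
    return q * step                       # dequantized (fake-quantized) weight

def modality_balanced_scale(w, x_vision, x_language,
                            s_vision, s_language, bits=3, grid=20):
    """Grid-search a clipping scale minimizing a sensitivity-weighted
    reconstruction loss over vision and language calibration tokens.
    s_vision / s_language are hypothetical per-modality sensitivity weights."""
    best_scale, best_loss = None, np.inf
    w_absmax = np.abs(w).max()
    for i in range(1, grid + 1):
        scale = w_absmax * i / grid
        w_q = quantize_weight(w, scale, bits)
        # Output reconstruction error, computed per modality
        err_v = np.mean((x_vision @ w.T - x_vision @ w_q.T) ** 2)
        err_l = np.mean((x_language @ w.T - x_language @ w_q.T) ** 2)
        loss = s_vision * err_v + s_language * err_l
        if loss < best_loss:
            best_loss, best_scale = loss, scale
    return best_scale
```

A plain LLM-style calibration would weight both token groups equally; the point of the modality-balanced variant is that when language tokens are more sensitive to quantization error, a larger `s_language` steers the chosen scale toward preserving their outputs.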
Keywords
- Artificial intelligence
- Quantization