What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

by Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

First submitted to arXiv on: 11 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the relationship between quantization and the performance of large language models (LLMs). The authors propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs, which they call “the lens of perturbation”. They conduct experiments with various artificial perturbations to explore their impact on LLM performance. Their findings reveal connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization. The authors implement a simple non-uniform quantization approach based on their insights, achieving minimal performance degradation on both 4-bit weight quantization and 8-bit quantization for weights and activations.
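To make the “lens of perturbation” concrete, here is a minimal NumPy sketch (ours, not the authors’ code) showing that round-to-nearest uniform quantization of a weight tensor is the same as adding a perturbation Q(w) − w to it; the toy tensor, the 4-bit width, and the single per-tensor scale are illustrative assumptions:

```python
import numpy as np

def uniform_quantize(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Round-to-nearest symmetric uniform quantization (returns dequantized values)."""
    levels = 2 ** (n_bits - 1) - 1        # 4-bit -> integer grid [-7, 7]
    scale = np.abs(w).max() / levels      # one scale for the whole tensor
    return np.round(w / scale) * scale

# Toy heavy-tailed "weights", standing in for an LLM weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_t(df=3, size=10_000) * 0.02

w_q = uniform_quantize(w, n_bits=4)
perturbation = w_q - w                    # quantization viewed as an added perturbation

print(f"mean |perturbation|: {np.abs(perturbation).mean():.6f}")
print(f"max  |perturbation|: {np.abs(perturbation).max():.6f}")
```

Under this view, the quantization error can be swapped for artificial perturbations of matched magnitude (e.g., random noise) to probe how LLM performance responds, which is the style of experiment the paper describes.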
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how we can make large language models run with less memory and computing power. The authors have a new way of thinking about this, called “the lens of perturbation”, which helps us understand what happens when we make small changes to the model’s weights and activations. They ran experiments to see how different kinds of changes affect the model’s performance. Their results show that how much a change hurts the model depends on the properties of that change, which helps explain when the usual, uniform way of shrinking models fails. Based on these lessons, they came up with a new way of shrinking the model that works well without sacrificing much performance.
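For intuition about why a non-uniform scheme can help, here is a hedged sketch (again, an illustration rather than the paper’s actual method) that builds a 4-bit codebook from the empirical quantiles of the weights, so regions where weights are dense get finer quantization levels than a uniform grid would give them:

```python
import numpy as np

def quantile_quantize(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Illustrative non-uniform quantizer: codebook at empirical quantiles of w."""
    n_levels = 2 ** n_bits
    probs = (np.arange(n_levels) + 0.5) / n_levels   # equal-probability bins
    codebook = np.quantile(w, probs)                 # denser codewords where weights cluster
    idx = np.abs(w[:, None] - codebook[None, :]).argmin(axis=1)  # nearest codeword
    return codebook[idx]

rng = np.random.default_rng(0)
w = rng.standard_t(df=3, size=10_000) * 0.02         # heavy-tailed toy weights

w_q = quantile_quantize(w, n_bits=4)
print(f"mean |error|, non-uniform 4-bit: {np.abs(w_q - w).mean():.6f}")
```

Comparing this quantizer’s mean absolute error against the uniform sketch above on the same heavy-tailed tensor illustrates the kind of effect that motivates non-uniform quantization.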

Keywords

* Artificial intelligence
* Quantization