Summary of CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs, by Haoyu Wang et al.
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
by Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian
First submitted to arXiv on: 27 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper presents Column-Level Adaptive weight Quantization (CLAQ), a novel framework for reducing the memory cost and improving the computational efficiency of large language models (LLMs). CLAQ introduces three adaptive strategies to overcome the limitations of existing methods in low-bit scenarios: a K-Means clustering-based algorithm that dynamically generates quantization centroids for each column of a parameter matrix, an outlier-guided adaptive precision search that assigns varying bit-widths to different columns, and a dynamic outlier reservation scheme that retains selected parameters in their original floating-point precision. Evaluated on mainstream open-source LLMs including LLaMA-1, LLaMA-2, and Yi, the framework achieves state-of-the-art results across different bit settings, particularly in extremely low-bit scenarios. (An illustrative code sketch of these ideas follows the table.) |
| Low | GrooveSquid.com (original content) | This paper helps make computer models that understand human language more efficient. It presents a new way to reduce the memory these models need while keeping their performance strong. The method uses three strategies to adapt to different situations and choose the best way to represent each part of the model. This helps the model work better in low-bit scenarios, which matters because it lets the model run on devices with limited storage. The new method was tested on several well-known language models and outperformed previous methods. |
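To make the three strategies concrete, here is a minimal, hedged sketch of column-level K-Means quantization with per-column bit allocation and outlier reservation. It is written from the summary above, not from the authors' code: all function names (`kmeans_1d`, `quantize_column`, `quantize_matrix`), the max-|weight| column score, and the parameter defaults are illustrative assumptions, not CLAQ's actual algorithm.

```python
# Illustrative sketch of column-level K-Means weight quantization with
# outlier reservation and per-column bit-widths. Names and heuristics are
# hypothetical stand-ins for the strategies described in the CLAQ summary.

import numpy as np

def kmeans_1d(values, n_centroids, n_iters=10):
    """Simple 1-D K-Means: returns centroids and each value's centroid index."""
    # Initialize centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0.0, 1.0, n_centroids))
    for _ in range(n_iters):
        # Assign each value to its nearest centroid.
        assignments = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned values.
        for k in range(n_centroids):
            mask = assignments == k
            if mask.any():
                centroids[k] = values[mask].mean()
    return centroids, assignments

def quantize_column(col, bits, outlier_frac=0.01):
    """Quantize one column: reserve the largest-magnitude entries in float,
    K-Means-quantize the rest to 2**bits centroids."""
    n_outliers = max(1, int(outlier_frac * col.size))
    # Indices of the largest-magnitude weights, kept at original precision.
    outlier_idx = np.argsort(np.abs(col))[-n_outliers:]
    keep = np.ones(col.size, dtype=bool)
    keep[outlier_idx] = False
    centroids, assignments = kmeans_1d(col[keep], n_centroids=2 ** bits)
    # Reconstruct: quantized values for most entries, originals for outliers.
    out = col.copy()
    out[keep] = centroids[assignments]
    return out

def quantize_matrix(W, base_bits=3, high_bits=4, high_frac=0.1):
    """Column-adaptive quantization: columns with the largest max |weight|
    (a crude stand-in for an outlier-guided score) get more bits."""
    scores = np.abs(W).max(axis=0)                 # per-column outlier score
    n_high = int(high_frac * W.shape[1])
    high_cols = set(np.argsort(scores)[-n_high:])  # columns promoted to more bits
    Wq = np.empty_like(W)
    for j in range(W.shape[1]):
        bits = high_bits if j in high_cols else base_bits
        Wq[:, j] = quantize_column(W[:, j], bits)
    return Wq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(128, 64)).astype(np.float32)
    Wq = quantize_matrix(W)
    print("mean abs error:", np.abs(W - Wq).mean())
```

The real method's precision search and outlier criteria are more sophisticated; this sketch only shows the overall shape: centroids generated per column, bit-widths assigned per column, and a small set of weights left in floating point.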
Keywords
» Artificial intelligence » Clustering » K means » Llama » Precision » Quantization