Quamba: A Post-Training Quantization Recipe for Selective State Space Models
by Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Diana Marculescu
First submitted to arXiv on 17 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | The paper’s original abstract (read it on arXiv)
Medium | GrooveSquid.com (original content) | The paper proposes a static 8-bit per-tensor quantization method for Selective State Space Models (SSMs) to improve their efficiency and ease deployment on resource-limited edge devices. SSMs, an alternative to Transformers, achieve state-of-the-art accuracy with constant memory complexity, but existing quantization techniques suit them poorly because of sensitive feature maps and massive outliers in the output activations. The proposed method suppresses the maximum values of the input activations and quantizes the output activations in an outlier-free space using the Hadamard transform. On an Nvidia Orin Nano 8G, this yields 1.72x lower generation latency with only a 0.9% drop in average accuracy on zero-shot tasks.
Low | GrooveSquid.com (original content) | The paper describes a new way to make State Space Models (SSMs) work well on devices that don’t have much power or memory. SSMs are like super-smart calculators that can understand language, but right now they use up too many resources. The authors found a way to shrink these models and make them faster, so we can use them on things like smart home devices or cars. This is important because it means we can have more powerful AI helpers everywhere.
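The key idea in the medium summary can be illustrated with a small sketch: rotating activations with an (orthonormal) Hadamard matrix spreads a massive outlier across all dimensions, which shrinks the per-tensor scale and reduces 8-bit quantization error. This is a minimal toy example in NumPy, not the authors' implementation; the Sylvester construction and the synthetic outlier are illustrative assumptions.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction (n must be a power of two), scaled to be orthonormal
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_int8(x):
    # Static per-tensor symmetric quantization: one scale for the whole tensor
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 64)
x[3] = 100.0  # a single massive outlier blows up the per-tensor scale

H = hadamard(64)
x_rot = H @ x  # rotate into an "outlier-free" space before quantizing

q_plain, s_plain = quantize_int8(x)
q_rot, s_rot = quantize_int8(x_rot)

# Dequantize; for the Hadamard path, rotate back with the transpose (H is orthonormal)
err_plain = np.abs(q_plain * s_plain - x).mean()
err_rot = np.abs(H.T @ (q_rot * s_rot) - x).mean()
print(f"plain int8 error: {err_plain:.4f}, Hadamard int8 error: {err_rot:.4f}")
```

Because the rotated tensor has a much smaller maximum value, its quantization scale (and hence the rounding error carried back through the inverse rotation) is far smaller than in the plain per-tensor case.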
Keywords
* Artificial intelligence
* Quantization
* Zero-shot