Summary of Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation, by Yeonhong Park et al.
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation, by Yeonhong Park, Jake Hyun, Hojoon…
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales, by Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram,…
Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing, by Inpyo Hong, Youngwan…
Semantic Residual for Multimodal Unified Discrete Representation, by Hai Huang, Shulei Wang, Yan Xia. First submitted to…
Recommending Pre-Trained Models for IoT Devices, by Parth V. Patil, Wenxin Jiang, Huiyun Peng, Daniel Lugo,…
Unified Stochastic Framework for Neural Network Quantization and Pruning, by Haoyu Zhang, Rayan Saab. First submitted to…
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference, by Chao Zeng, Songwei Liu,…
Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart, by Chengting Yu, Shu…
Preventing Local Pitfalls in Vector Quantization via Optimal Transport, by Borui Zhang, Wenzhao Zheng, Jie Zhou,…
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design, by Zhen Zheng, Xiaonan…