Summary of An Exploration of the Effect of Quantisation on Energy Consumption and Inference Time of StarCoder2, by Pepijn De Reus et al.
An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2 by…
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference by Janghwan Lee, Jiwoong…
The Super Weight in Large Language Models by Mengxia Yu, De Wang, Qi Shan, Colorado Reed,…
Qwen2.5-32B: Leveraging Self-Consistent Tool-Integrated Reasoning for Bengali Mathematical Olympiad Problem Solving by Saad Tahmid, Sourav Sarker. First…
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models by Xiao Liu, Lijun Zhang, Deepak Ganesan, Hui…
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation by Shih-Yang Liu, Maksim Khadkevich, Nai…
A Counterexample in Cross-Correlation Template Matching by Serap A. Savari. First submitted to arxiv on: 24 Oct…
Catastrophic Failure of LLM Unlearning via Quantization by Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu,…
Lossless KV Cache Compression to 2% by Zhen Yang, J.N.Han, Kan Wu, Ruobing Xie, An Wang,…
Channel-Wise Mixed-Precision Quantization for Large Language Models by Zihan Chen, Bike Xie, Jundong Li, Cong Shen. First…