Summary of EXAQ: Exponent Aware Quantization For LLMs Acceleration, by Moran Shkolnik et al.
EXAQ: Exponent Aware Quantization For LLMs Acceleration by Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov,…
Overcoming Representation Bias in Fairness-Aware Data Repair using Optimal Transport by Abigail Langbridge, Anthony Quinn, Robert…
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration by Jintao Zhang, Jia Wei, Haofeng Huang, Pengle…
SEAL: SEmantic-Augmented Imitation Learning via Language Model by Chengyang Gu, Yuxin Pan, Haotian Bai, Hui Xiong,…
Quantized and Asynchronous Federated Learning by Tomas Ortega, Hamid Jafarkhani. First submitted to arXiv on: 30 Sep…
Rotated Runtime Smooth: Training-Free Activation Smoother for Accurate INT4 Inference by Ke Yi, Zengke Liu, Jianwei…
Constraint Guided Model Quantization of Neural Networks by Quinten Van Baelen, Peter Karsmakers. First submitted to arXiv…
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores by Shaobo Ma, Chao…
Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models by Hui-Po Wang,…
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization by Shimao Chen, Zirui Liu, Zhiying Wu, Ce Zheng,…