Summary of MixPE: Quantization and Hardware Co-design for Efficient LLM Inference, by Yu Zhang et al.
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference, by Yu Zhang, Mingzi Wang, Lancheng Zou,…
FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration, by…
AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning, by Changhai Zhou, Shiyang Zhang, Yuhua Zhou, Zekai…
EfQAT: An Efficient Framework for Quantization-Aware Training, by Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou,…
Communication Compression for Tensor Parallel LLM Inference, by Jan Hansen-Palmus, Michael Truong Le, Oliver Hausdörfer, Alok…
Towards Low-bit Communication for Tensor Parallel LLM Inference, by Harry Dong, Tyler Johnson, Minsik Cho, Emad…
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization, by Weibo Zhao, Yubin Shi,…
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis, by Zhijie Chen, Qiaobo Li, Arindam Banerjee…
Expansion Quantization Network: An Efficient Micro-emotion Annotation and Detection Framework, by Jingyi Zhou, Senlin Luo, Haofan…
Intelligent Fault Diagnosis of Type and Severity in Low-Frequency, Low Bit-Depth Signals, by Tito Spadini, Kenji…