Summary of The Uniqueness of LLaMA3-70B Series with Per-Channel Quantization, by Minghai Qin
The Uniqueness of LLaMA3-70B Series with Per-Channel Quantization, by Minghai Qin. First submitted to arXiv on: 27…
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs, by Maxim Zhelnin, Viktor Moskvoretskii, Egor…
Variational autoencoder-based neural network model compression, by Liang Cheng, Peiyuan Guan, Amir Taherkordi, Lei Liu, Dapeng…
Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things, by Ziheng Wang, Pedro Reviriego,…
1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit, by Chang Gao, Jianfei Chen,…
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, by Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom…
Smartphone-based Eye Tracking System using Edge Intelligence and Model Optimisation, by Nishan Gunawardena, Gough Yumu Lui,…
Matmul or No Matmul in the Era of 1-bit LLMs, by Jinendra Malekar, Mohammed E. Elbtity,…
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models, by Elias Frantar, Roberto L. Castro, Jiale…