Summary of KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization, by Tianyi Zhang et al.