Summary of MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection, by Bokai Lin et al.
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection, by Bokai Lin, Zihao Zeng, Zipeng Xiao, Siqi…
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level…
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens, by Zhepeng Cen, Yao Liu, Siliang…
Decomposing The Dark Matter of Sparse Autoencoders, by Joshua Engels, Logan Riggs, Max Tegmark. First submitted to…
Large Language Models Are Overparameterized Text Encoders, by Thennal D K, Tim Fischer, Chris Biemann. First submitted…
Dual-Label Learning With Irregularly Present Labels, by Mingqian Li, Qiao Han, Yiteng Zhai, Ruifeng Li, Yao…
Unscrambling disease progression at scale: fast inference of event permutations with optimal transport, by Peter A.…
Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media, by Owen…
Revisiting SLO and Goodput Metrics in LLM Serving, by Zhibin Wang, Shipeng Li, Yuhang Zhou, Xue…
Latent Weight Diffusion: Generating Policies from Trajectories, by Shashank Hegde, Gautam Salhotra, Gaurav S. Sukhatme. First submitted…