Summary of Progressive Mixed-Precision Decoding for Efficient LLM Inference, by Hao Mark Chen et al.
Progressive Mixed-Precision Decoding for Efficient LLM Inference by Hao Mark Chen, Fuwen Tan, Alexandros Kouris, Royson…
A theoretical perspective on mode collapse in variational inference by Roman Soletskyi, Marylou Gabrié, Bruno Loureiro. First…
LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models by David Hoffmann, Kailash Budhathoki, Matthaeus…
BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models by Isack Lee, Haebin Seong. First submitted…
GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation by Ziwei Yang, Zheng Chen, Xin Liu,…
AERO: Softmax-Only LLMs for Efficient Private Inference by Nandan Kumar Jha, Brandon Reagen. First submitted to arXiv…
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models by Jie Ren, Kangrui Chen, Chen Chen,…
In-context KV-Cache Eviction for LLMs via Attention-Gate by Zihao Zeng, Bokai Lin, Tianqi Hou, Hao Zhang,…
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond by Costin-Andrei Oncescu,…
RecurFormer: Not All Transformer Heads Need Self-Attention by Ruiqing Yan, Linghan Zheng, Xingbo Du, Han Zou,…