Summary of Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding, by Zachary Ankner et al.
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding by Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard,…