Summary of LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference, by Qichen Fu et al.
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference, by Qichen Fu, Minsik Cho, Thomas…
Forecasting GPU Performance for Deep Learning Training and Inference, by Seonho Lee, Amar Phanishayee, Divya Mahajan. First…
MeshFeat: Multi-Resolution Features for Neural Fields on Meshes, by Mihir Mahajan, Florian Hofherr, Daniel Cremers. First submitted…
Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning, by Ans Munir, Faisal Z. Qureshi,…
Mixture of Experts based Multi-task Supervise Learning from Crowds, by Tao Han, Huaixuan Shi, Xinyi Ding,…
Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations, by Yue Yao, Shengchao…
A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR, by Jian…
Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression, by Aryan Gulati, Xingjian Dong, Carlos Hurtado,…
A Resolution Independent Neural Operator, by Bahador Bahmani, Somdatta Goswami, Ioannis G. Kevrekidis, Michael D. Shields. First…
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore, by Rulin Shao, Jacqueline He, Akari Asai, Weijia…