Summary of Compute or Load KV Cache? Why Not Both?, by Shuowei Jin et al.
Compute Or Load KV Cache? Why Not Both? by Shuowei Jin, Xueshen Liu, Qingzhao Zhang, Z.…
UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference by Jing Xiong, Jianghan Shen, Fanghua…
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy by Rongzhi Zhang, Kuang…
ProcBench: Benchmark for Multi-Step Reasoning and Following Procedure by Ippei Fujisawa, Sensho Nobe, Hiroki Seto, Rina…
DecTrain: Deciding When to Train a Monocular Depth DNN Online by Zih-Sing Fu, Soumya Sudhakar, Sertac…
DANA: Domain-Aware Neurosymbolic Agents for Consistency and Accuracy by Vinh Luong, Sang Dinh, Shruti Raghavan, William…
LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences by Zhenxiao Fu, Fan Chen, Shan Zhou,…
Selective Attention Improves Transformer by Yaniv Leviathan, Matan Kalman, Yossi Matias. First submitted to arxiv on: 3…
Large Language Models as Markov Chains by Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas…
Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold by Hoang Phuc Hau Luu, Hanlin Yu,…