Summary of Block-Attention for Efficient RAG, by East Sun et al.
Block-Attention for Efficient RAG by East Sun, Yan Wang, Lan Tian. First submitted to arXiv on: 14…
A Diagonal Structured State Space Model on Loihi 2 for Efficient Streaming Sequence Processing by Svea…
Patch Ranking: Efficient CLIP by Learning to Rank Local Patches by Cheng-En Wu, Jinhong Lin, Yu…
Backtracking Improves Generation Safety by Yiming Zhang, Jianfeng Chi, Hailey Nguyen, Kartikeya Upasani, Daniel M. Bikel,…
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists by Dongyang Fan, Bettina Messmer,…
Test Time Learning for Time Series Forecasting by Panayiotis Christou, Shichu Chen, Xupeng Chen, Parijat Dube. First…
Context-Aware Membership Inference Attacks against Pre-trained Large Language Models by Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis…
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning by Shivam Shandilya, Menglin Xia, Supriyo Ghosh,…
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner by Yuzhang Shang, Bingxin Xu, Weitai Kang,…
GRIN: GRadient-INformed MoE by Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao…