Summary of ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition, by Lu Ye et al.
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition by Lu Ye, Ze Tao, Yong…
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers by M. Emrullah Ildiz, Yixiao…
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling by Guoqi…
An end-to-end attention-based approach for learning on graphs by David Buterez, Jon Paul Janet, Dino Oglic,…
Transformers, parallel computation, and logarithmic depth by Clayton Sanford, Daniel Hsu, Matus Telgarsky. First submitted to arXiv…
Investigating Out-of-Distribution Generalization of GNNs: An Architecture Perspective by Kai Guo, Hongzhi Wen, Wei Jin, Yaming…
The I/O Complexity of Attention, or How Optimal is Flash Attention? by Barna Saha, Christopher Ye. First…
Mesoscale Traffic Forecasting for Real-Time Bottleneck and Shockwave Prediction by Raphael Chekroun, Han Wang, Jonathan Lee,…
Implicit Bias and Fast Convergence Rates for Self-attention by Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis. First submitted…
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection by Pramit…