Summary of State Space Model for New-Generation Network Alternative to Transformers: A Survey, by Xiao Wang et al.
State Space Model for New-Generation Network Alternative to Transformers: A Survey by Xiao Wang, Shiao Wang, …
BERT-LSH: Reducing Absolute Compute For Attention by Zezheng Li, Kingston Yip. First submitted to arXiv on: 12…
Inheritune: Training Smaller Yet More Attentive Language Models by Sunny Sanyal, Ravid Shwartz-Ziv, Alexandros G. Dimakis, …
LLoCO: Learning Long Contexts Offline by Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, …
Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder by Halil Ismail…
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget by Zihao Wang, Bin…
Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology by Gaith Rjoub, Saidul Islam, Jamal Bentahar, Mohammed…
Graph Neural Networks for Electric and Hydraulic Data Fusion to Enhance Short-term Forecasting of Pumped-storage…
On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers by Cai Zhou, …
Optimizing the Deployment of Tiny Transformers on Low-Power MCUs by Victor J.B. Jung, Alessio Burrello, Moritz…