Summary of The Fine-Grained Complexity of Gradient Computation for Training Large Language Models, by Josh Alman et al.
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models, by Josh Alman, Zhao Song. First submitted to arXiv on: …
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicryby Michael Zhang, Kush Bhatia,…
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chainsby Ashok Vardhan…
CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers, by Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía …
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks, by Jongho Park, …
Provably learning a multi-head attention layer, by Sitan Chen, Yuanzhi Li. First submitted to arXiv on: 6…
A phase transition between positional and semantic learning in a solvable model of dot-product attention, by …
Return-Aligned Decision Transformer, by Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra. First submitted to …
Reinforcement Learning from Bagged Reward, by Yuting Tang, Xin-Qiang Cai, Yao-Xiang Ding, Qiyu Wu, Guoqing Liu, …
Face Detection: Present State and Research Directions, by Purnendu Prabhat, Himanshu Gupta, Ajeet Kumar Vishwakarma. First submitted …