Summary of Efficient World Models with Context-Aware Tokenization, by Vincent Micheli et al.
Efficient World Models with Context-Aware Tokenization, by Vincent Micheli, Eloi Alonso, François Fleuret. First submitted to arXiv…