Summary of Mixture of Experts Meets Prompt-Based Continual Learning, by Minh Le et al.
Mixture of Experts Meets Prompt-Based Continual Learning, by Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen,…
Attention as an RNN, by Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio,…
Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning, by Prashant Bhat, Bharath Renjith, Elahe…
DCT-Based Decorrelated Attention for Vision Transformers, by Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet…
A Transformer variant for multi-step forecasting of water level and hydrometeorological sensitivity analysis based on…
Generalized Laplace Approximation, by Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim. First submitted to…
FAITH: Frequency-domain Attention In Two Horizons for Time Series Forecasting, by Ruiqi Li, Maowei Jiang, Kai…
Dynamic Context Adaptation and Information Flow Control in Transformers: Introducing the Evaluator Adjuster Unit and…
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum, by Hadi Pouransari, Chun-Liang Li, Jen-Hao…
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention, by William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar…