Summary of Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors, by Tam Thuc Do et al.
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors by Tam Thuc Do, Parham Eftekhar,…
Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness by Lars Hillebrand, Prabhupad Pradhan, Christian…
Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review by Sonia Bbouzidi,…
Block Transformer: Global-to-Local Language Modeling for Fast Inference by Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik…
Long Range Propagation on Continuous-Time Dynamic Graphs by Alessio Gravina, Giulio Lovisotto, Claudio Gallicchio, Davide Bacciu,…
Multi-layer Learnable Attention Mask for Multimodal Tasks by Wayner Barrios, SouYoung Jin. First submitted to arXiv on:…
A Temporal Kolmogorov-Arnold Transformer for Time Series Forecasting by Remi Genet, Hugo Inzirillo. First submitted to arXiv…
Loki: Low-rank Keys for Efficient Sparse Attention by Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi,…
What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional…
FFNet: MetaMixer-based Efficient Convolutional Mixer Design by Seokju Yun, Dongheon Lee, Youngmin Ro. First submitted to arXiv…