Transformer – Page 138 – GrooveSquid.com

July 13, 2025

Summary of HDT: Hierarchical Document Transformer, by Haoyu He et al.

HDT: Hierarchical Document Transformer by Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger. First submitted…

July 13, 2025

Summary of Transformer Block Coupling and Its Correlation with Generalization in LLMs, by Murdock Aubry et al.

Transformer Block Coupling and its Correlation with Generalization in LLMs by Murdock Aubry, Haoming Meng, Anton…

July 13, 2025

Summary of Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers, by Cody Wild et al.

Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers by Cody Wild, Jesper Anderson. First submitted to arXiv…

July 13, 2025

Summary of Toto: Time Series Optimized Transformer for Observability, by Ben Cohen et al.

Toto: Time Series Optimized Transformer for Observability by Ben Cohen, Emaad Khwaja, Kan Wang, Charles Masson,…

July 13, 2025

Summary of Teaching Transformers Causal Reasoning through Axiomatic Training, by Aniket Vashishtha et al.

Teaching Transformers Causal Reasoning through Axiomatic Training by Aniket Vashishtha, Abhinav Kumar, Abbavaram Gowtham Reddy, Vineeth…

July 13, 2025

Summary of Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic, by Ruochen Jin et al.

Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic by Ruochen Jin, Bojian Hou, Jiancong…

July 13, 2025

Summary of FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation, by Liqun Ma, Mingjie Sun, and Zhiqiang Shen

FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation by Liqun Ma, Mingjie Sun,…

July 13, 2025

Summary of A Predictive Model Based on Transformer with Statistical Feature Embedding in Manufacturing Sensor Dataset, by Gyeong Taek Lee and Oh-ran Kwon

A Predictive Model Based on Transformer with Statistical Feature Embedding in Manufacturing Sensor Dataset by Gyeong…

July 13, 2025

Summary of SOLO: A Single Transformer for Scalable Vision-Language Modeling, by Yangyi Chen et al.

SOLO: A Single Transformer for Scalable Vision-Language Modeling by Yangyi Chen, Xingyao Wang, Hao Peng, Heng…

July 13, 2025

Summary of Self-supervised Pretraining for Partial Differential Equations, by Varun Madhavan, Amal S Sebastian, Bharath Ramsundar, and Venkatasubramanian Viswanathan

Self-supervised Pretraining for Partial Differential Equations by Varun Madhavan, Amal S Sebastian, Bharath Ramsundar, Venkatasubramanian Viswanathan. First…