Summary of Hierarchical Attention Models for Multi-Relational Graphs, by Roshni G. Iyer et al.
Hierarchical Attention Models for Multi-Relational Graphs, by Roshni G. Iyer, Wei Wang, Yizhou Sun. First submitted to…
Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers, by Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, …
RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion, by Guoxuan Chi, Zheng Yang, Chenshu Wu, Jingao Xu, …
TransformerFAM: Feedback Attention is Working Memory, by Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, …
Foundational GPT Model for MEG, by Richard Csaky, Mats W.J. van Es, Oiwi Parker Jones, Mark …
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length, by Xuezhe Ma, Xiaomeng Yang, Wenhan …
The Illusion of State in State-Space Models, by William Merrill, Jackson Petty, Ashish Sabharwal. First submitted to…
Inheritune: Training Smaller Yet More Attentive Language Models, by Sunny Sanyal, Ravid Shwartz-Ziv, Alexandros G. Dimakis, …
Transformer-based Joint Modelling for Automatic Essay Scoring and Off-Topic Detection, by Sourya Dipta Das, Yash Vadi, …
Revealing Trends in Datasets from the 2022 ACL and EMNLP Conferences, by Jesse Atuhurra, Hidetaka Kamigaito. First…