Summary of Mechanics of Next Token Prediction with Self-Attention, by Yingcong Li et al.
Mechanics of Next Token Prediction with Self-Attention by Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit…
Conditional computation in neural networks: principles and research trends by Simone Scardapane, Alessandro Baiocchi, Alessio Devoto,…
Transformers Learn Low Sensitivity Functions: Investigations and Implications by Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott…
The pitfalls of next-token prediction by Gregor Bachmann, Vaishnavh Nagarajan. First submitted to arXiv on: 11 Mar…
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification by Ekaterina Fadeeva, Aleksandr Rubashevskii,…
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy by Yu Zhu, Chuxiong Sun,…
Learning to Decode Collaboratively with Multiple Language Models by Shannon Zejiang Shen, Hunter Lang, Bailin Wang,…
On the Origins of Linear Representations in Large Language Models by Yibo Jiang, Goutham Rajendran, Pradeep…
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax by Tobias…
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve by Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree…