Summary of Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective, by Kaiyue Wen et al.
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective by Kaiyue Wen, Zhiyuan Li, Jason…
DEPT: Decoupled Embeddings for Pre-training Language Models by Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F.…
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation by Mehul Damani, Idan Shenfeld, Andi…
Language Model-Driven Data Pruning Enables Efficient Active Learning by Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha…
Mixture of Attentions For Speculative Decoding by Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar,…
Can Mamba Always Enjoy the “Free Lunch”? by Ruifeng Ren, Zhicong Li, Yong Liu. First submitted to…
AIME: AI System Optimization via Multiple LLM Evaluators by Bhrij Patel, Souradip Chakraborty, Wesley A. Suttle,…
Permissive Information-Flow Analysis for Large Language Models by Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David…
MetaOOD: Automatic Selection of OOD Detection Models by Yuehan Qin, Yichi Zhang, Yi Nian, Xueying Ding,…
Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation by Xianzhi Li, Ran Zmigrod, Zhiqiang…