Summary of Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective, by Kaiyue Wen et al.
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective by Kaiyue Wen, Zhiyuan Li, Jason…
DEPT: Decoupled Embeddings for Pre-training Language Models by Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F.…
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation by Mehul Damani, Idan Shenfeld, Andi…
Language Model-Driven Data Pruning Enables Efficient Active Learning by Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha…
Mixture of Attentions For Speculative Decoding by Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar,…
Can Mamba Always Enjoy the “Free Lunch”? by Ruifeng Ren, Zhicong Li, Yong Liu. First submitted to…
AIME: AI System Optimization via Multiple LLM Evaluators by Bhrij Patel, Souradip Chakraborty, Wesley A. Suttle,…
Permissive Information-Flow Analysis for Large Language Models by Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David…
MetaOOD: Automatic Selection of OOD Detection Models by Yuehan Qin, Yichi Zhang, Yi Nian, Xueying Ding,…
Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation by Xianzhi Li, Ran Zmigrod, Zhiqiang…