Summary of Stacking as Accelerated Gradient Descent, by Naman Agarwal, Pranjal Awasthi, Satyen Kale, and Eric Zhao
Stacking as Accelerated Gradient Descent by Naman Agarwal, Pranjal Awasthi, Satyen Kale, Eric Zhao. First submitted to…
Directional Smoothness and Gradient Methods: Convergence and Adaptivity by Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron…
On the Origins of Linear Representations in Large Language Models by Yibo Jiang, Goutham Rajendran, Pradeep…
Inverse-Free Fast Natural Gradient Descent Method for Deep Learning by Xinwei Ou, Ce Zhu, Xiaolin Huang,…
Level Set Teleportation: An Optimization Perspective by Aaron Mishkin, Alberto Bietti, Robert M. Gower. First submitted to…
How Well Can Transformers Emulate In-context Newton’s Method? by Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris…
SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix by Mrinmay Sen, A. K. Qin, Gayathri C,…
Noise misleads rotation invariant algorithms on sparse targets by Manfred K. Warmuth, Wojciech Kotłowski, Matt Jones,…
Enhancing LLM Safety via Constrained Direct Preference Optimization by Zixuan Liu, Xiaolin Sun, Zizhan Zheng. First submitted…
From Zero to Hero: How local curvature at artless initial conditions leads away from bad…