Gradient descent – Page 12

July 13, 2025

Benign Overfitting in Single-Head Attention by Roey Magen, Shuning Shang, Zhiwei Xu, Spencer Frei, Wei Hu,…

July 13, 2025

Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers by Sanchayan Dutta, Suvrit…

July 13, 2025

Utilizing Lyapunov Exponents in designing deep neural networks by Tirthankar Mittra. First submitted to arXiv on: 8…

July 13, 2025

Extended convexity and smoothness and their applications in deep learning by Binchuan Qi, Wei Gong, Li…

July 13, 2025

On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory by Guhan Chen,…

July 13, 2025

Score-Based Variational Inference for Inverse Problems by Zhipeng Xue, Penghao Cai, Xiaojun Yuan, Xiqi Gao. First submitted…

July 13, 2025

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent by Bingrui Li, Wei…

July 13, 2025

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse by Arthur Jacot, Peter Súkeník,…

July 13, 2025

SGD with memory: fundamental properties and stochastic acceleration by Dmitry Yarotsky, Maksim Velikanov. First submitted to arXiv…

July 13, 2025

Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map by Yoonsoo Nam,…