Stochastic gradient descent – Page 3

July 13, 2025

Summary of Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits, by Daniel Morales-Brotons et al.

Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits by Daniel Morales-Brotons, Thijs Vogels,…

July 13, 2025

Summary of Distributed Sign Momentum with Local Steps for Training Transformers, by Shuhua Yu et al.

Distributed Sign Momentum with Local Steps for Training Transformers by Shuhua Yu, Ding Zhou, Cong Xie,…

July 13, 2025

Summary of Fast Training of Large Kernel Models with Delayed Projections, by Amirhesam Abedsoltan et al.

Fast training of large kernel models with delayed projections by Amirhesam Abedsoltan, Siyuan Ma, Parthe Pandit,…

July 13, 2025

Summary of Differentially Private Learning Beyond the Classical Dimensionality Regime, by Cynthia Dwork et al.

Differentially Private Learning Beyond the Classical Dimensionality Regime by Cynthia Dwork, Pranay Tankala, Linjun Zhang. First submitted…

July 13, 2025

Summary of A Unified Analysis for Finite Weight Averaging, by Peng Wang et al.

A Unified Analysis for Finite Weight Averaging by Peng Wang, Li Shen, Zerui Tao, Yan Sun,…

July 13, 2025

Summary of General Framework for Online-to-Nonconvex Conversion: Schedule-Free SGD Is Also Effective for Nonconvex Optimization, by Kwangjun Ahn et al.

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization by Kwangjun Ahn,…

July 13, 2025

Summary of An Energy-Based Self-Adaptive Learning Rate for Stochastic Gradient Descent: Enhancing Unconstrained Optimization with VAV Method, by Jiahao Zhang et al.

An Energy-Based Self-Adaptive Learning Rate for Stochastic Gradient Descent: Enhancing Unconstrained Optimization with VAV method by…

July 13, 2025

Summary of Impact of Label Noise on Learning Complex Features, by Rahul Vashisht, P. Krishna Kumar, Harsha Vardhan Govind, and Harish G. Ramaswamy

Impact of Label Noise on Learning Complex Features by Rahul Vashisht, P. Krishna Kumar, Harsha Vardhan…

July 13, 2025

Summary of Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators, by Yan Shuo Tan et al.

Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators by Yan Shuo Tan, Jason M. Klusowski, Krishnakumar Balasubramanian. First…

July 13, 2025

Summary of Scalable DP-SGD: Shuffling vs. Poisson Subsampling, by Lynn Chua et al.

Scalable DP-SGD: Shuffling vs. Poisson Subsampling by Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi,…