Summary of Non-convergence of Adam and Other Adaptive Stochastic Gradient Descent Optimization Methods for Non-vanishing Learning Rates, by Steffen Dereich, Robin Graeber, and Arnulf Jentzen