

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

by Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa

First submitted to arXiv on: 7 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Optimization and Control (math.OC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper studies the behavior of gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. While GD converges to the minimizer for linearly separable data, the situation is more complex when the data are not separable. The authors demonstrate a sequence of period-doubling bifurcations beginning at the critical step size 2/λ, where λ is the largest eigenvalue of the Hessian at the solution. They also find that using a smaller-than-critical step size guarantees convergence when initialized near the solution, but does not ensure global convergence. In one dimension, a step size less than 1/λ suffices for global convergence, while in higher dimensions convergence to a cycle is possible even for step sizes less than 1/λ. In short, local convergence is guaranteed for all step sizes below the critical step size, but global convergence is not, and GD may instead converge to a cycle depending on the initialization. A small numerical sketch of this cycling behavior follows the summaries below.

Low Difficulty Summary (written by GrooveSquid.com, original content)
GD dynamics are studied on logistic regression problems with large step sizes. For linearly separable data, GD converges with any step size. But what happens when the data are not separable? The researchers find that using a small step size helps, but does not guarantee convergence from every starting point. They show that for large enough step sizes, the algorithm can even get stuck in a loop! This study looks at how this works in one dimension and in more complex, higher-dimensional cases.

Keywords

  * Artificial intelligence
  * Gradient descent
  * Logistic regression