Summary of Convergence Rate Analysis Of Lion, by Yiming Dong et al.
Convergence Rate Analysis of LION
by Yiming Dong, Huan Li, Zhouchen Lin
First submitted to arXiv on: 12 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Optimization and Control (math.OC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary LION (evoLved sIgn mOmeNtum), an optimizer that Google discovered through program search, has shown impressive performance in training large-scale deep neural networks despite its simple sign-based update. Previous studies have examined whether LION converges, but a comprehensive analysis of its convergence rate has been missing. This paper fills the gap by showing that LION converges to a Karush-Kuhn-Tucker (KKT) point at a rate of O(√d · K^(-1/4)), measured by the gradient L1 norm, where d is the problem dimension and K is the number of iterations. The authors then remove the constraint and show that LION converges to a critical point of the general unconstrained problem at the same rate. With respect to K, this rate matches the theoretical lower bound for nonconvex stochastic optimization algorithms, which is typically stated in terms of the gradient L2 norm. In the paper's experiments, LION achieves lower loss and higher performance than standard SGD. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary LION is a new optimizer that helps train big neural networks quickly and accurately. Google found it through an automated search over programs. People have studied whether it converges before, but nobody had worked out how fast. This paper shows that LION approaches a point that satisfies the conditions for a solution (called a KKT point), and that its speed matches the best possible for this type of problem as the number of training steps grows. The same speed guarantee holds when the problem has no constraints. The authors also tested LION and found that it does better than another popular optimizer called SGD. |
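
The "simple sign update" mentioned in the summaries can be made concrete with a short sketch. The code below is a minimal NumPy illustration of the Lion update rule as it is commonly stated (interpolate momentum and gradient, take the sign, then refresh the momentum). It is not code from the paper summarized here. The names `lion_step` and `run_demo`, the toy quadratic objective, the noise level, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def lion_step(theta, m, grad, lr=1e-2, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion-style update (illustrative sketch). Returns (new params, new momentum)."""
    # Blend the momentum with the current gradient, then keep only the sign.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    # Each coordinate with a nonzero sign moves by lr (plus decoupled weight decay).
    theta_new = theta - lr * (update + weight_decay * theta)
    # The stored momentum is refreshed with a second, slower coefficient.
    m_new = beta2 * m + (1.0 - beta2) * grad
    return theta_new, m_new

def run_demo(steps=1000, dim=10, noise=0.1, seed=0):
    """Toy problem (an assumption): minimise f(x) = 0.5 * ||x||^2 with noisy gradients."""
    rng = np.random.default_rng(seed)
    theta = np.ones(dim)
    m = np.zeros(dim)
    for _ in range(steps):
        grad = theta + noise * rng.standard_normal(dim)  # stochastic gradient
        theta, m = lion_step(theta, m, grad)
    return theta

theta_final = run_demo()
g = theta_final  # exact gradient of the toy objective at theta_final
l1 = np.abs(g).sum()          # gradient L1 norm, the measure used in the paper's rate
l2 = np.sqrt((g ** 2).sum())  # gradient L2 norm, the measure used in typical lower bounds
print(f"gradient L1 norm: {l1:.4f}, L2 norm: {l2:.4f}")
```

For any vector in d dimensions the two norms satisfy L2 ≤ L1 ≤ √d · L2, which gives some intuition for why a √d factor appears when a rate stated in the L1 norm is compared with lower bounds stated in the L2 norm.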
Keywords
* Artificial intelligence
* Optimization