Summary of Cautious Optimizers: Improving Training with One Line of Code, by Kaizhao Liang et al.
Cautious Optimizers: Improving Training with One Line of Code
by Kaizhao Liang, Lizhang Chen, Bo Liu, Qiang Liu
First submitted to arXiv on: 25 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes a single-line modification to momentum-based optimizers such as AdamW and Lion, renaming them cautious optimizers (C-AdamW and C-Lion). The modification preserves Adam's Hamiltonian function and does not break its convergence guarantee, and a theoretical analysis shows the modified optimizers remain stable. Empirically, the change yields speed-ups of up to 1.47x on pretraining tasks such as Llama and MAE, and improves results on LLM post-training tasks. A minimal sketch of the masking idea appears after the table. |
Low | GrooveSquid.com (original content) | The paper makes a simple change to some of the computer programs that help train artificial intelligence models. The change builds on a widely used training method called AdamW, keeps its good properties, and does not make anything worse. In fact, it makes some AI models train better and faster: in the researchers' tests, the new method sped up some tasks by as much as 1.47 times. |
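To make the "one line of code" idea concrete, below is a minimal PyTorch-style sketch of a single AdamW-like step with a cautious mask: coordinates where the proposed update and the current gradient point in opposite directions are zeroed out, and the surviving entries are rescaled. This is an illustrative reading of the summary above, not the authors' released code; the function name, hyperparameter defaults, and rescaling constant are assumptions.

```python
import torch

def cautious_adamw_step(p, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999),
                        eps=1e-8, weight_decay=0.01):
    """One AdamW-style update with a 'cautious' mask (illustrative sketch)."""
    beta1, beta2 = betas
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first-moment estimate
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second-moment estimate
    m_hat = m / (1 - beta1 ** step)                      # bias correction
    v_hat = v / (1 - beta2 ** step)
    u = m_hat / (v_hat.sqrt() + eps)                     # standard AdamW update direction

    # The "one line" change: keep only coordinates where the update direction
    # and the current gradient agree in sign, then rescale the rest.
    mask = (u * grad > 0).to(u.dtype)
    u = u * mask / (mask.mean() + eps)

    p.mul_(1 - lr * weight_decay).add_(u, alpha=-lr)     # decoupled weight decay + step
    return p, m, v

# Tiny usage example on a random parameter tensor (hypothetical values).
p = torch.randn(4)
g = torch.randn(4)
m, v = torch.zeros_like(p), torch.zeros_like(p)
p, m, v = cautious_adamw_step(p, g, m, v, step=1)
```

The same masking line can, in principle, be dropped into other momentum-based optimizers (e.g. Lion), which is how the summary describes C-AdamW and C-Lion being obtained.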
Keywords
» Artificial intelligence » Llama » MAE » Pretraining