


Cautious Optimizers: Improving Training with One Line of Code

by Kaizhao Liang, Lizhang Chen, Bo Liu, Qiang Liu

First submitted to arXiv on: 25 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (GrooveSquid.com, original content)
The paper proposes a single-line modification to momentum-based optimizers such as AdamW and Lion, yielding what the authors call cautious optimizers (C-AdamW and C-Lion). The modification preserves Adam's Hamiltonian function and does not break its convergence guarantee, and the theoretical analysis shows that it keeps the optimizer stable. Empirically, it delivers speed-ups of up to 1.47x on pretraining tasks such as Llama and MAE, as well as improved results in LLM post-training. A minimal code sketch of the idea follows the summaries below.

Low Difficulty Summary (GrooveSquid.com, original content)
The paper makes a simple change to some of the computer programs that help train artificial intelligence models. The change builds on a widely used training method called AdamW. The new idea keeps the good properties of AdamW and doesn't make it worse; in fact, it makes some AI models train better and faster. The researchers tested the new method and found that it can speed up some tasks by as much as 1.47 times.

Keywords

» Artificial intelligence  » Llama  » MAE  » Pretraining