
Summary of Improving Line Search Methods For Large Scale Neural Network Training, by Philip Kenneweg et al.


Improving Line Search Methods for Large Scale Neural Network Training

by Philip Kenneweg, Tristan Kenneweg, Barbara Hammer

First submitted to arXiv on: 27 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary
Written by: Paper authors
This version is the paper's original abstract.

Medium Difficulty Summary
Written by: GrooveSquid.com (original content)
The paper identifies issues with state-of-the-art line search methods, proposes enhancements, and rigorously evaluates their effectiveness on larger datasets and more complex data domains than before. Specifically, it improves the Armijo line search by integrating the momentum term from Adam into its search direction, enabling efficient large-scale training. The resulting optimizer outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. The evaluation covers Transformers and CNNs in the NLP and image data domains. The work is publicly available as a Python package that provides a hyperparameter-free PyTorch optimizer.

Low Difficulty Summary
Written by: GrooveSquid.com (original content)
The paper improves some of the algorithms used to train big models. It fixes problems with older methods and tries new ways to make them more efficient. By combining two techniques, it gets better results than before. Tests show that the new method works well on different kinds of data and tasks, including text and images. The work is available online as a Python package, which makes it easy to use.
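
To make the medium summary's central idea concrete, here is a minimal sketch of an Armijo backtracking line search that steps along an Adam-style direction instead of the raw gradient. It is not the authors' implementation or their package's API. The function names (`adam_direction`, `armijo_step`, `minimize`), the constants (`c`, `shrink`, `alpha0`), the fallback to the plain gradient, and the toy quadratic loss are all assumptions chosen for illustration, and the sketch uses plain Python lists rather than PyTorch tensors.

```python
import math


def adam_direction(grad, m, v, step, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return Adam's bias-corrected direction plus the updated moment estimates."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** step) for mi in m]
    v_hat = [vi / (1 - beta2 ** step) for vi in v]
    direction = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return direction, m, v


def armijo_step(loss_fn, x, grad, direction, alpha0=1.0, c=0.1, shrink=0.5,
                max_backtracks=30):
    """Shrink the step size until the Armijo sufficient-decrease condition holds.

    Accepts x - alpha * direction once
        loss(x - alpha * direction) <= loss(x) - c * alpha * <grad, direction>.
    """
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope <= 0:
        # Assumption of this sketch: if momentum points uphill, fall back to
        # the plain gradient so the condition still demands a real decrease.
        direction = grad
        slope = sum(g * g for g in grad)
    f0 = loss_fn(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        candidate = [xi - alpha * di for xi, di in zip(x, direction)]
        if loss_fn(candidate) <= f0 - c * alpha * slope:
            return candidate, alpha
        alpha *= shrink
    return list(x), 0.0  # no acceptable step size found; keep x unchanged


def minimize(loss_fn, grad_fn, x0, steps=50):
    """Run the combined update and record the loss after every step."""
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    losses = [loss_fn(x)]
    for step in range(1, steps + 1):
        grad = grad_fn(x)
        direction, m, v = adam_direction(grad, m, v, step)
        x, _ = armijo_step(loss_fn, x, grad, direction)
        losses.append(loss_fn(x))
    return x, losses


# Hypothetical toy problem: an ill-conditioned quadratic bowl.
def quadratic_loss(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def quadratic_grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


final_x, losses = minimize(quadratic_loss, quadratic_grad, [3.0, -2.0])
```

No learning rate is passed in: the line search picks a step size at every iteration, which is the sense in which such an optimizer can be described as hyperparameter-free. Because a step is only accepted when it satisfies the sufficient-decrease condition, the recorded loss never increases in this deterministic toy setting. With mini-batch losses, as in real neural network training, that guarantee holds only for the batch on which the search was run.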

Keywords

  • Artificial intelligence
  • Hyperparameter
  • NLP