
Scalable Nested Optimization for Deep Learning

by Jonathan Lorraine

First submitted to arXiv on: 1 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to read whichever version suits you best!

High Difficulty Summary (written by the paper's author)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This thesis presents novel tools for scaling gradient-based optimization in machine learning, focusing on bilevel or nested optimization of subsets of a model's parameters. The author motivates this setting with examples from hyperparameter optimization and generative adversarial networks. Despite the widespread success of classical gradient-based methods, the thesis demonstrates that they often fail when applied naively to large-scale nested problems, and it develops scalable nested-optimization tools suitable for deep learning applications.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper helps us solve a big problem in machine learning. Usually, we train a model by updating all of its parameters toward a single goal, but there are many situations where we need to update different groups of parameters based on different goals. The author shows that the usual methods don't work well when we try to do this at a large scale, so the thesis develops new tools to handle these more complex problems.

Keywords

» Artificial intelligence  » Deep learning  » Hyperparameter  » Machine learning  » Optimization