Summary of NGD Converges to Less Degenerate Solutions Than SGD, by Moosa Saghir et al.
NGD converges to less degenerate solutions than SGD
by Moosa Saghir, N. R. Raghavendra, Zihe Liu, Evan Ryan Gunter
First submitted to arXiv on: 7 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | A novel approach to measuring model complexity is proposed in this paper. The traditional method of counting free parameters is shown to be inaccurate: models capable of memorizing their training data often generalize well despite their large parameter count. Instead, the authors use effective dimension, which counts only the parameters required to represent a model’s functionality. The learning coefficient λ, drawn from singular learning theory, is proposed as a measure of effective dimension; it captures information from higher-order terms that describe the rate at which the volume of low-loss parameter space grows around a local minimum. The authors compare models trained with natural gradient descent (NGD) and stochastic gradient descent (SGD), finding that NGD-trained models have consistently higher effective dimension under two measures: the Hessian trace Tr(H) and the local learning coefficient (LLC). (A rough sketch of the Hessian-trace estimate is given below the table.) |
| Low | GrooveSquid.com (original content) | This paper is about a new way to measure how complex a machine learning model is. Right now, people usually count the number of free parameters in a model, but this doesn’t always work well: some models with enough parameters to simply memorize their training data still do a good job on new data. The authors want a better measure of complexity, so they use something called effective dimension. This is like counting only the parameters that actually shape what the model computes, and they measure it with the learning coefficient λ. They tested this on two kinds of models: those trained using natural gradient descent (NGD) and those trained using stochastic gradient descent (SGD). They found that NGD-trained models tend to be more complex in this sense. |
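
One of the two complexity measures mentioned above, the Hessian trace Tr(H) of the loss at a trained minimum, can be estimated without forming the full Hessian by using Hutchinson’s trace estimator. The snippet below is a minimal sketch of that idea, not code from the paper; it assumes a PyTorch setup, and the names `model`, `loss_fn`, `inputs`, and `targets` are placeholders for whatever network and data one is studying.

```python
import torch

def hutchinson_hessian_trace(model, loss_fn, inputs, targets, n_samples=50):
    """Estimate Tr(H) of the loss at the model's current parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    # First-order gradients with a graph attached, so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    estimates = []
    for _ in range(n_samples):
        # Rademacher probe vectors: entries are +1 or -1.
        vs = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        # Hessian-vector product, obtained by differentiating (grad . v).
        grad_dot_v = sum((g * v).sum() for g, v in zip(grads, vs))
        hvps = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        # v^T H v is an unbiased estimate of Tr(H).
        estimates.append(sum((h * v).sum() for h, v in zip(hvps, vs)).item())
    return sum(estimates) / len(estimates)

# Hypothetical usage: compare the trace for an NGD-trained and an SGD-trained
# copy of the same architecture on the same batch.
# tr_ngd = hutchinson_hessian_trace(ngd_model, torch.nn.functional.cross_entropy, x, y)
# tr_sgd = hutchinson_hessian_trace(sgd_model, torch.nn.functional.cross_entropy, x, y)
```

Comparing such a trace estimate (or an LLC estimate) between NGD- and SGD-trained copies of the same architecture is, in outline, the kind of comparison the paper reports.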
Keywords
» Artificial intelligence » Gradient descent » Machine learning » Stochastic gradient descent