Summary of The Optimization Landscape of SGD Across the Feature Learning Strength, by Alexander Atanasov et al.
The Optimization Landscape of SGD Across the Feature Learning Strength
by Alexander Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan
First submitted to arXiv on: 6 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | High Difficulty Summary: Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: This paper studies neural networks whose final layer is scaled by a fixed hyperparameter γ, which controls how strongly the network learns features. Recent studies have shown that increasing γ leads to richer feature-learning dynamics and improved task performance. The authors empirically examine the effect of scaling γ across various models and datasets in the online training setting, identifying several distinct regimes in the γ-η plane, where η is the learning rate. They find that the optimal learning rate η* scales non-trivially with γ, and that loss curves take on characteristic shapes at large γ. Their findings suggest that networks in these regimes can achieve improved performance when optimized appropriately. A minimal code sketch of this setup appears after the table. |
| Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper looks at special kinds of neural networks where the last layer is multiplied by a certain number called γ. Research has shown that when γ gets bigger, the network learns more features and does better on tasks. The authors studied how changing γ affects the network’s behavior during training and found some surprising patterns. They also discovered that if you don’t adjust the training settings to match γ, your network might not be as good as it could be. |
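To make the setup concrete, here is a minimal sketch in PyTorch of the kind of experiment described above: a small network whose final-layer output is multiplied by a fixed γ, trained with plain SGD at several learning rates to probe the γ-η plane. The model class `GammaScaledMLP`, the architecture, and the grid values are illustrative assumptions, not the authors' code, and the paper's exact parameterization may differ.

```python
# A minimal sketch (not the authors' code) of the setup the paper studies:
# a network whose output is scaled by a fixed hyperparameter gamma,
# trained at several learning rates to probe the gamma-eta plane.
import torch
import torch.nn as nn

class GammaScaledMLP(nn.Module):
    """Small MLP whose output is multiplied by a fixed scalar gamma."""
    def __init__(self, in_dim: int, hidden: int, out_dim: int, gamma: float):
        super().__init__()
        self.gamma = gamma  # feature-learning strength; per the summary, larger gamma -> richer dynamics
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * self.body(x)

# Hypothetical sweep over the gamma-eta plane; grid values are illustrative only.
for gamma in [0.1, 1.0, 10.0]:
    for eta in [1e-3, 1e-2, 1e-1]:
        model = GammaScaledMLP(in_dim=32, hidden=128, out_dim=1, gamma=gamma)
        opt = torch.optim.SGD(model.parameters(), lr=eta)
        # ... run online SGD here, tracking the loss curve to locate
        # the optimal learning rate eta*(gamma) for each gamma.
```

The sweep mirrors the paper's finding at a high level: because the optimal η* shifts non-trivially with γ, each γ value has to be paired with its own learning-rate search rather than a single shared η.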
Keywords
» Artificial intelligence » Hyperparameter