Summary of Information-Theoretic Foundations for Neural Scaling Laws, by Hong Jun Jeon et al.
Information-Theoretic Foundations for Neural Scaling Laws
by Hong Jun Jeon, Benjamin Van Roy
First submitted to arXiv on: 28 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Neural scaling laws aim to characterize how out-of-sample error changes as model size and training dataset size grow. Existing theories lack rigor and entangle the roles of information and optimization. This work develops rigorous information-theoretic foundations for neural scaling laws, which allow the authors to characterize scaling laws for two-layer infinite-width neural networks. The optimal relation between data and model size turns out to be linear, up to logarithmic factors, consistent with large-scale empirical investigations. These concise yet general results may help clarify the topic and inform future research (a toy numerical sketch of this relation follows the table). |
Low | GrooveSquid.com (original content) | Scientists are trying to understand how well a computer program works when it is made bigger or given more training data. Right now, there is no clear way to predict what will happen if you grow the program or add more data. This paper tries to fix that by creating a new, careful mathematical way to think about the problem. The results show that the best program size and the amount of data grow together in a simple, roughly proportional way, which is good news for people who want to use these programs. |
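To make the headline result concrete, here is a minimal, hypothetical sketch of what "linear up to logarithmic factors" means for the relation between dataset size and model size. The rule `p(n) = c * n / log(n)`, the constant `c`, and the function name are illustrative assumptions for this sketch, not formulas taken from the paper.

```python
# Toy sketch only: illustrates a data-model relation that is linear up to
# a logarithmic factor. The specific rule p(n) = c * n / log(n) is an
# assumption for illustration, not the paper's derived expression.
import math

def optimal_model_size(n: int, c: float = 1.0) -> float:
    """Hypothetical rule: model size grows linearly in dataset size n,
    divided by a slowly growing logarithmic factor."""
    return c * n / math.log(n)

for n in (10**4, 10**6, 10**8):
    p = optimal_model_size(n)
    # p/n shrinks only logarithmically, so model size and dataset size
    # scale together almost proportionally as n grows.
    print(f"n = {n:>11,d}  ->  p ~ {p:>13,.0f}  (p/n = {p/n:.3f})")
```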
Keywords
* Artificial intelligence
* Optimization
* Scaling laws