Information-Theoretic Foundations for Neural Scaling Laws

by Hong Jun Jeon, Benjamin Van Roy

First submitted to arXiv on: 28 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)

Neural scaling laws aim to describe how out-of-sample error changes as model size and training dataset size increase. Existing theories lack mathematical rigor and entangle the roles of information and optimization. This work develops rigorous information-theoretic foundations for neural scaling laws, which the authors use to characterize scaling laws for two-layer, infinite-width neural networks. They find that the optimal relation between data and model size is linear, up to logarithmic factors, consistent with large-scale empirical investigations. These concise yet general results may bring clarity to the topic and inform future research.
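
To make the headline result concrete, here is a minimal numerical sketch. It assumes a hypothetical additive error model err(p, d) = A/p + B/d, where p is the parameter count and d the dataset size, together with a compute budget proportional to p * d; none of these constants or functional forms come from the paper itself. Under those assumptions, the compute-optimal dataset size grows linearly with model size, mirroring the linear relation (up to logarithmic factors) described above.

    # Illustrative sketch only -- the error model below is an assumption,
    # not the paper's actual bound. Minimizing err(p, d) = A/p + B/d
    # subject to a compute budget C = p * d gives p* = sqrt(A*C/B) and
    # d* = sqrt(B*C/A), so the ratio d*/p* = B/A is constant: data and
    # model size scale linearly together.
    import math

    A, B = 2.0, 5.0  # hypothetical error-model constants


    def err(p: float, d: float) -> float:
        """Hypothetical additive error model."""
        return A / p + B / d


    def optimal_allocation(compute: float) -> tuple[float, float]:
        """Compute-optimal (p, d) under the assumed model, in closed form."""
        p = math.sqrt(A * compute / B)
        d = math.sqrt(B * compute / A)
        return p, d


    for budget in (1e6, 1e8, 1e10):
        p, d = optimal_allocation(budget)
        print(f"C={budget:.0e}: p*={p:.3e}, d*={d:.3e}, d*/p*={d / p:.2f}")

In this toy model the ratio d*/p* comes out to the constant B/A for every budget, a miniature version of the linear data-to-model relation; the paper derives the analogous relation rigorously, with logarithmic corrections.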

Low Difficulty Summary (original content by GrooveSquid.com)

Scientists are trying to understand how well a computer program will work when it is made bigger or given more training data. Right now, there is no reliable way to predict what will happen if you grow the program or add more data. This paper tries to fix that by building a careful mathematical framework for the problem. The results show that the best approach is to grow the program and its data together at roughly the same rate, which is good news for people who want to build and train these programs.

Keywords

  • Artificial intelligence
  • Optimization
  • Scaling laws