Summary of Information-Theoretic Foundations for Neural Scaling Laws, by Hong Jun Jeon et al.
Information-Theoretic Foundations for Neural Scaling Laws
by Hong Jun Jeon, Benjamin Van Roy
First submitted to arXiv on: 28 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Neural scaling laws aim to characterize how out-of-sample error changes as model size and training dataset size grow. Existing theories lack rigor and entangle the roles of information and optimization. This work develops rigorous information-theoretic foundations for neural scaling laws, which allow the authors to characterize scaling laws for two-layer infinite-width neural networks. The optimal relation between data and model size turns out to be linear, up to logarithmic factors, consistent with large-scale empirical investigations. These concise yet general results may help clarify the topic and inform future research (a toy numerical sketch of this relation follows the table). |
Low | GrooveSquid.com (original content) | Scientists are trying to understand how well a computer program works when it is made bigger or given more training data. Right now, there is no clear way to predict what will happen if you grow the program or add more data. This paper tries to fix that by creating a new, careful mathematical way to think about the problem. The results show that the best program size and the amount of data grow together in a simple, roughly proportional way, which is good news for people who want to use these programs. |
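To make the headline result concrete, here is a minimal, hypothetical sketch of what "linear up to logarithmic factors" means for the relation between dataset size and model size. The rule `p(n) = c * n / log(n)`, the constant `c`, and the function name are illustrative assumptions for this sketch, not formulas taken from the paper.

```python
# Toy sketch only: illustrates a data-model relation that is linear up to
# a logarithmic factor. The specific rule p(n) = c * n / log(n) is an
# assumption for illustration, not the paper's derived expression.
import math

def optimal_model_size(n: int, c: float = 1.0) -> float:
    """Hypothetical rule: model size grows linearly in dataset size n,
    divided by a slowly growing logarithmic factor."""
    return c * n / math.log(n)

for n in (10**4, 10**6, 10**8):
    p = optimal_model_size(n)
    # p/n shrinks only logarithmically, so model size and dataset size
    # scale together almost proportionally as n grows.
    print(f"n = {n:>11,d}  ->  p ~ {p:>13,.0f}  (p/n = {p/n:.3f})")
```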
Keywords
* Artificial intelligence
* Optimization
* Scaling laws