Summary of An Exactly Solvable Model For Emergence and Scaling Laws in the Multitask Sparse Parity Problem, by Yoonsoo Nam et al.
An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem
by Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Ard A. Louis
First submitted to arXiv on: 26 Apr 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Deep learning models can suddenly acquire the ability to solve new problems as training time, data, or model size increases, a phenomenon known as emergence. The paper's framework represents each new ability (a skill) as a basis function, then solves a simple multi-linear model to find analytic expressions for the emergence of new skills and for scaling laws of the loss with training time, data size, model size, and optimal compute. These calculations are compared to direct simulations of a two-layer neural network trained on multitask sparse parity, and they capture the sigmoidal emergence of multiple new skills. |
| Low | GrooveSquid.com (original content) | Deep learning models can suddenly become good at solving problems when they are trained for longer, given more data, or made bigger. The authors show that each new skill (an ability) can be represented as a basis function, and that solving a simple mathematical model built from these functions reveals how skills emerge and grow. When this math was compared to simulations of a neural network learning multiple tasks at once, the model matched the results using just one parameter. |
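For readers who want a concrete picture of the task the paper studies, here is a minimal NumPy sketch of generating multitask sparse parity data. The function and parameter names (`make_multitask_sparse_parity`, `n_tasks`, `n_bits`, `k`, `alpha`) are illustrative, not the authors' code, and the power-law task frequencies are an assumption about the setup; the paper's exact configuration may differ.

```python
import numpy as np

def make_multitask_sparse_parity(n_tasks=5, n_bits=20, k=3,
                                 n_samples=1000, alpha=1.5, seed=0):
    """Sketch of a multitask sparse parity dataset.

    Each input concatenates a one-hot task indicator (control bits)
    with n_bits random bits; the label is the parity (sum mod 2) of a
    fixed, task-specific subset of k of those bits.
    """
    rng = np.random.default_rng(seed)
    # One fixed random subset of k bit positions per task.
    subsets = [rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)]
    # Tasks drawn with power-law frequencies (an assumption here;
    # the paper's exact task distribution may differ).
    probs = np.arange(1, n_tasks + 1) ** -alpha
    probs /= probs.sum()
    tasks = rng.choice(n_tasks, size=n_samples, p=probs)
    bits = rng.integers(0, 2, size=(n_samples, n_bits))
    control = np.eye(n_tasks, dtype=int)[tasks]  # one-hot task indicator
    x = np.concatenate([control, bits], axis=1)
    y = np.array([bits[i, subsets[t]].sum() % 2 for i, t in enumerate(tasks)])
    return x, y

x, y = make_multitask_sparse_parity()
print(x.shape, y.mean())  # (1000, 25); labels roughly balanced
```

In the framework described above, each task plays the role of one skill, so it is the per-task accuracy curves of a network trained on this data that would exhibit the sigmoidal emergence the paper models.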
Keywords
» Artificial intelligence » Deep learning » Neural network » Scaling laws