Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
by Nikhil Sardana, Jacob Portes, Sasha Doubov, Jonathan Frankle
First submitted to arXiv on: 31 Dec 2023
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper modifies the popular DeepMind Chinchilla scaling laws for large language models (LLMs) to account for the cost of inference. The modified formula estimates the optimal LLM parameter count and pre-training data size for a given model quality and inference demand. The analysis considers both raw compute budgets and real-world costs, finding that researchers who expect reasonably large inference demand (~1B requests) should train models that are smaller, and trained on more data, than Chinchilla-optimal. The paper validates its formula by training 47 models of varying parameter counts and pre-training data sizes, showing that model quality continues to improve at extreme data-to-model ratios (up to 10,000 tokens per parameter). It also ablates the procedure used to fit the Chinchilla scaling law coefficients, finding that fitting scaling laws only on typical token/parameter ratios overestimates the impact of additional training tokens.
Low | GrooveSquid.com (original content) | This paper makes big language models better by including the cost of using them, not just the cost of training them. It changes a popular formula called Chinchilla to make it more realistic. The new formula helps decide how many parameters and how much training data a model needs when it must handle a certain number of requests. The authors tested the formula by training 47 different models, and it worked well. They also found that fitting the formula only on typical examples makes it too optimistic about extra training data.
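The intuition behind the summaries above can be sketched with the standard FLOP approximations (roughly 6ND FLOPs to train N parameters on D tokens, and roughly 2N FLOPs per token generated at inference). The model configuration and per-request token count below are illustrative assumptions, not numbers from the paper; the sketch only shows why lifetime inference cost becomes significant at large request volumes:

```python
def training_flops(n_params, n_tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

def inference_flops(n_params, n_tokens):
    # Forward pass only: ~2 FLOPs per parameter per generated token.
    return 2 * n_params * n_tokens

# Hypothetical Chinchilla-style configuration: 70B params, 1.4T training tokens.
N, D = 70e9, 1.4e12
train = training_flops(N, D)

# Assumed ~500 tokens per request; sweep the total request volume.
for requests in (1e8, 1e9, 1e10):
    infer = inference_flops(N, requests * 500)
    share = infer / (train + infer)
    print(f"{requests:.0e} requests -> inference is {share:.0%} of lifetime FLOPs")
```

Under these assumptions, inference grows from a small fraction of lifetime compute at ~100M requests to a dominant share at ~10B requests, which is why the paper's modified objective shifts the optimum toward smaller models trained on more data.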
Keywords
- Artificial intelligence
- Inference
- Scaling laws
- Token