Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
by Nikhil Sardana, Jacob Portes, Sasha Doubov, Jonathan Frankle
First submitted to arXiv on: 31 Dec 2023
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper modifies the popular DeepMind Chinchilla scaling laws for large language models (LLMs) to account for the cost of inference. The modified formula estimates the optimal LLM parameter count and pre-training data size for a given model quality and inference demand. The analysis considers both raw compute budgets and real-world costs, finding that researchers who expect reasonably large inference demand (~1B requests) should train models that are smaller, and trained on more data, than Chinchilla-optimal. The paper validates its formula by training 47 models of varying parameter counts and pre-training data sizes, showing that model quality continues to improve at extreme data-to-model ratios (up to 10,000 tokens per parameter). It also ablates the procedure used to fit the Chinchilla scaling law coefficients, finding that fitting scaling laws only on typical token/parameter ratios overestimates the impact of additional training tokens.
Low | GrooveSquid.com (original content) | This paper makes big language models better by including the cost of using them, not just the cost of training them. It changes a popular formula called Chinchilla to make it more realistic. The new formula helps decide how many parameters and how much training data a model needs when it must handle a certain number of requests. The authors tested the formula by training 47 different models, and it worked well. They also found that fitting the formula only on typical examples makes it too optimistic about extra training data.
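The intuition behind the summaries above can be sketched with the standard FLOP approximations (roughly 6ND FLOPs to train N parameters on D tokens, and roughly 2N FLOPs per token generated at inference). The model configuration and per-request token count below are illustrative assumptions, not numbers from the paper; the sketch only shows why lifetime inference cost becomes significant at large request volumes:

```python
def training_flops(n_params, n_tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

def inference_flops(n_params, n_tokens):
    # Forward pass only: ~2 FLOPs per parameter per generated token.
    return 2 * n_params * n_tokens

# Hypothetical Chinchilla-style configuration: 70B params, 1.4T training tokens.
N, D = 70e9, 1.4e12
train = training_flops(N, D)

# Assumed ~500 tokens per request; sweep the total request volume.
for requests in (1e8, 1e9, 1e10):
    infer = inference_flops(N, requests * 500)
    share = infer / (train + infer)
    print(f"{requests:.0e} requests -> inference is {share:.0%} of lifetime FLOPs")
```

Under these assumptions, inference grows from a small fraction of lifetime compute at ~100M requests to a dominant share at ~10B requests, which is why the paper's modified objective shifts the optimum toward smaller models trained on more data.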
Keywords
- Artificial intelligence
- Inference
- Scaling laws
- Token