Summary of Scaling Laws for Predicting Downstream Performance in LLMs, by Yangyi Chen et al.
Scaling Laws for Predicting Downstream Performance in LLMs
by Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji
First submitted to arXiv on: 11 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract on arXiv. |
| Medium | GrooveSquid.com (original content) | The paper proposes a way to predict the downstream performance of large language models (LLMs) before they are trained, using the pre-training loss as a computation-efficient proxy metric. The core approach, FLP, works in two stages: first, estimate a function that maps computational resources (e.g., FLOPs) to pre-training loss using a series of small sampling models; then, map pre-training loss to downstream task performance after the critical “emergent phase”. Preliminary experiments show that this approach accurately predicts the performance of LLMs with 7B and 13B parameters, achieving error margins of 5% and 10%, respectively. An extension, FLP-M, addresses the practical need to integrate datasets from multiple sources during pre-training, specifically blending general corpora with code data; a rough code sketch of the two-stage pipeline appears after the table. |
| Low | GrooveSquid.com (original content) | Researchers found a new way to guess how well big language models will work before they’re even trained! The idea uses a special kind of score, called the pre-training loss, that’s much cheaper to compute than training the model and seeing what happens. They train smaller versions of these models, called sampling models, and use them to figure out how the bigger model will do. This method is really good at predicting how well big language models will work, even when they’re trained on mixed data from different sources. |
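To make the two-stage idea concrete, here is a minimal sketch of how such a FLOPs → loss → performance pipeline could be fitted. Everything in it is an assumption for illustration: the saturating power law for stage one, the logistic link for stage two, and all data points and starting parameters are invented, not the paper’s actual functional forms or fits.

```python
# Illustrative sketch of a two-stage FLOPs -> loss -> performance fit.
# All numbers and functional forms are assumptions, not the paper's fits.
import numpy as np
from scipy.optimize import curve_fit

# Stage 1: fit compute (FLOPs) -> pre-training loss with a saturating
# power law, using observations from small "sampling" models.
def loss_from_flops(c, a, b, irreducible):
    return a * np.power(c, -b) + irreducible

flops = np.array([1e19, 3e19, 1e20, 3e20, 1e21])   # sampling-model budgets (made up)
losses = np.array([2.90, 2.70, 2.50, 2.35, 2.20])  # observed pre-training losses (made up)
(a, b, irr), _ = curve_fit(loss_from_flops, flops, losses,
                           p0=(1000.0, 0.15, 1.5), maxfev=20000)

# Stage 2: fit pre-training loss -> downstream accuracy, using only points
# past the "emergent phase", where the relationship is assumed stable.
def acc_from_loss(loss, w, bias):
    # A logistic link keeps predicted accuracy in (0, 1).
    return 1.0 / (1.0 + np.exp(w * loss + bias))

accs = np.array([0.31, 0.36, 0.44, 0.50, 0.57])    # downstream accuracies (made up)
(w, bias), _ = curve_fit(acc_from_loss, losses, accs, p0=(1.5, -3.7))

# Compose the two stages: predict a large target model's downstream
# accuracy directly from its planned FLOPs budget, before training it.
target_flops = 1e23
pred_loss = loss_from_flops(target_flops, a, b, irr)
pred_acc = acc_from_loss(pred_loss, w, bias)
print(f"predicted loss: {pred_loss:.3f}, predicted accuracy: {pred_acc:.3f}")
```

The composition of the two fitted stages is what lets a handful of cheap sampling models stand in for the expensive target run: the large model’s performance is read off the fitted curves rather than measured.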