Compute-Constrained Data Selection

by Junjie Oscar Yin, Alexander M. Rush

First submitted to arXiv on: 21 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary
Written by the paper authors; this version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary
Written by GrooveSquid.com (original content).
The paper addresses the challenge of finetuning large language models (LLMs) under budget constraints by introducing a cost-aware utility function for data selection. It formulates the problem as a trade-off between the initial cost of selecting data and the resulting training gain. The authors run experiments across a range of tasks, scaling finetuning tokens, model sizes, and data selection compute to compare methods. Surprisingly, many powerful data selection methods are not compute-optimal: cheaper alternatives dominate both theoretically and empirically. For compute-optimal training, the paper finds that perplexity-based and gradient-based data selection each require a specific ratio of training-model to selection-model size.
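To make the trade-off concrete, here is a minimal Python sketch of cost-aware budgeting. It is our illustration, not code from the paper: every name and number in it (SelectionMethod, quality_gain, the FLOP costs) is a hypothetical assumption. Each method pays its own selection compute out of a fixed FLOP budget, and whatever remains buys training tokens.

```python
# Hypothetical sketch of cost-aware data selection budgeting.
# Illustrative only; the paper's actual utility function and cost model may differ.

from dataclasses import dataclass

@dataclass
class SelectionMethod:
    name: str
    select_flops_per_token: float  # compute spent scoring one candidate token
    quality_gain: float            # assumed utility multiplier per trained token

def trainable_tokens(method: SelectionMethod,
                     budget_flops: float,
                     candidate_tokens: float,
                     train_flops_per_token: float) -> float:
    """Tokens we can still afford to train on after paying the selection cost."""
    selection_cost = method.select_flops_per_token * candidate_tokens
    remaining = max(budget_flops - selection_cost, 0.0)
    return remaining / train_flops_per_token

# Toy numbers (all assumptions): a fixed budget, a pool of candidate tokens,
# and a per-token training cost of roughly 6 * params for a ~1B model.
budget = 1e18
pool = 1e10
train_cost = 6e9

methods = [
    SelectionMethod("random", 0.0, 1.00),      # free selection, baseline quality
    SelectionMethod("perplexity", 2e7, 1.30),  # one forward pass of a small scorer
    SelectionMethod("gradient", 6e7, 1.50),    # forward + backward per candidate
]

for m in methods:
    t = trainable_tokens(m, budget, pool, train_cost)
    # Crude utility: quality-weighted count of trainable tokens.
    print(f"{m.name:>10}: {t:.3g} trainable tokens, utility ~ {m.quality_gain * t:.3g}")
```

With these made-up numbers, the gradient scorer's selection cost eats so much of the budget that its higher per-token gain cannot compensate, while the cheap perplexity scorer edges out random selection. This mirrors, only qualitatively, the paper's finding that expensive selection methods are often not compute-optimal.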
Low Difficulty Summary
Written by GrooveSquid.com (original content).
This research explores ways to improve large language models without using too much computer power or money. The team created a special formula to choose the best data for fine-tuning these models. They tested different methods, scaling up or down to see what works best. What they found was interesting: many advanced techniques aren’t the most efficient when it comes to using computer resources. Instead, simpler approaches work just as well and are more cost-effective. This is important because it helps us make better use of our computing power and budget.

Keywords

» Artificial intelligence  » Fine tuning  » Perplexity