
Summary of Aioli: A Unified Optimization Framework for Language Model Data Mixing, by Mayee F. Chen et al.


Aioli: A Unified Optimization Framework for Language Model Data Mixing

by Mayee F. Chen, Michael Y. Hu, Nicholas Lourie, Kyunghyun Cho, Christopher Ré

First submitted to arXiv on: 8 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on the paper's arXiv page.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper studies how best to mix different groups of data when training language models. Prior work has proposed various methods to learn mixture proportions efficiently, yet a simple stratified sampling baseline surprisingly and consistently outperforms them in average test perplexity per group. The authors unify existing methods under a single optimization framework, showing that each one aims to minimize total loss subject to a method-specific mixing law, an assumed relationship between mixture proportions and loss. They find that existing mixing law parameterizations can express the true loss-proportion relationship empirically, but the methods often set the mixing law parameters inaccurately, which explains their poor performance. The paper therefore introduces Aioli, an online method that estimates mixing law parameters throughout training and uses them to adjust the proportions dynamically; a rough code sketch of this general recipe appears after the summaries below. Empirically, Aioli outperforms stratified sampling on 6 out of 6 datasets by an average of 0.28 test perplexity points.
Low Difficulty Summary (original content by GrooveSquid.com)
The paper looks at how to mix different types of data to train language models. People have tried many clever ways to pick the mix, but surprisingly, a simple approach that samples evenly from every group still works best. The authors look at why and find that most methods share the same goal: make the total loss as small as possible by adjusting how the data is mixed. The problem is that these methods often estimate their mixing parameters inaccurately, which hurts their results. To fix this, the paper introduces Aioli, a new way to estimate the mixing parameters during training and keep adjusting the mix as the model learns. This helps improve performance across many datasets.
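
To make the recipe described above more concrete, here is a minimal, hedged sketch of the general idea: fit a simple "mixing law" that predicts each group's loss from the mixture proportions, then adjust the proportions online to reduce the predicted total loss. This is an illustration only, not the authors' released Aioli implementation; the linear mixing law, the synthetic loss model, and all function and variable names are assumptions made for this example.

```python
# Hedged sketch (not the paper's code): estimate a simple mixing law online and
# use it to adjust data-group proportions during training.
import numpy as np

rng = np.random.default_rng(0)
k = 3  # number of data groups (hypothetical)

def observe_group_losses(proportions):
    """Stand-in for one training sweep: returns per-group losses for a mixture.
    In reality these would come from evaluating the model after training on the mix."""
    # Synthetic ground truth: training more on a group lowers its loss.
    base = np.array([3.0, 2.5, 3.5])
    interaction = np.array([[0.8, 0.1, 0.0],
                            [0.2, 0.9, 0.1],
                            [0.0, 0.1, 0.7]])
    return base - interaction @ proportions + 0.01 * rng.standard_normal(k)

def fit_linear_mixing_law(history_p, history_L):
    """Fit per-group linear models L_j ~ c_j + sum_i slope_ij * p_i from observed pairs."""
    P = np.array(history_p)                       # (n, k) proportions tried so far
    X = np.hstack([np.ones((len(P), 1)), P])      # add intercept column
    L = np.array(history_L)                       # (n, k) per-group losses observed
    coef, *_ = np.linalg.lstsq(X, L, rcond=None)  # (k+1, k): intercepts + slopes
    return coef

def update_proportions(p, coef, lr=0.5):
    """Exponentiated-gradient style step toward lower predicted total loss."""
    slopes = coef[1:]                             # slopes[i, j] = d(loss_j)/d(p_i)
    grad_total = slopes.sum(axis=1)               # gradient of the summed group losses
    p_new = p * np.exp(-lr * grad_total)          # multiplicative update keeps p > 0
    return p_new / p_new.sum()                    # renormalize onto the simplex

# Online loop: alternate between observing losses and re-estimating the mixing law.
p = np.full(k, 1.0 / k)                           # start from stratified (uniform) sampling
history_p, history_L = [], []
for step in range(20):
    losses = observe_group_losses(p)
    history_p.append(p.copy())
    history_L.append(losses)
    if len(history_p) >= k + 1:                   # need enough points to fit the law
        coef = fit_linear_mixing_law(history_p, history_L)
        p = update_proportions(p, coef)
    print(f"step {step:2d}  proportions={np.round(p, 3)}  avg loss={losses.mean():.3f}")
```

Starting from a uniform mixture mirrors the stratified sampling baseline discussed above, and the multiplicative update keeps the proportions positive and summing to one; how the real method parameterizes and estimates its mixing law is described in the paper itself.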

Keywords

» Artificial intelligence  » Optimization  » Perplexity