
Scaling Laws for Downstream Task Performance in Machine Translation

by Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

First submitted to arxiv on: 6 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG); Machine Learning (stat.ML)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (GrooveSquid.com, original content)
The study explores scaling laws in transfer learning settings for large language models (LLMs) finetuned on machine translation tasks. The researchers investigate how the choice of pretraining data and its size affect downstream performance, measured by cross-entropy, BLEU, and COMET scores. They find that with sufficient distribution alignment between the pretraining and downstream data, translation quality improves as pretraining data grows, and this improvement can be predicted with a log-law. With moderate misalignment, however, translation scores can fluctuate or even worsen as pretraining data grows, while downstream cross-entropy still improves monotonically.
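The summary above notes that translation quality can be predicted from pretraining data size via a log-law. As a rough illustration of that idea (not the paper's exact functional form, and using made-up numbers rather than the paper's data), one can fit a simple logarithmic trend to observed BLEU scores and extrapolate to a larger pretraining budget:

```python
import numpy as np

# Hypothetical pretraining set sizes (tokens) and observed BLEU scores.
# These values are illustrative only, not taken from the paper.
pretrain_tokens = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
bleu = np.array([18.0, 21.5, 25.0, 28.5, 32.0])

# Fit a simple log-law BLEU ≈ a * log(D) + b. This is a stand-in for
# the paper's log-law; the paper's actual parameterization may differ.
a, b = np.polyfit(np.log(pretrain_tokens), bleu, deg=1)

# Extrapolate the fitted trend to a larger pretraining budget.
predicted = a * np.log(3e10) + b
print(f"slope={a:.3f}, predicted BLEU at 3e10 tokens: {predicted:.1f}")
```

Note that, per the paper's findings, such an extrapolation is only meaningful when the pretraining and downstream distributions are well aligned; under misalignment, BLEU may not follow any monotone trend even though cross-entropy keeps improving.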
Low Difficulty Summary (GrooveSquid.com, original content)
Large language models (LLMs) are getting better at understanding human languages. Researchers looked at how these models perform when they're "finetuned" for specific tasks like translating text from one language to another. They wanted to know what happens when you give these models more data to train on, and whether there's a way to predict how well they'll do. The results show that if the model is pretrained on texts similar to what it will translate, it gets better at translating as it receives more training data. But if the pretraining data is very different from what the model is actually supposed to translate, more data can make its translations fluctuate or even get worse.

Keywords

* Artificial intelligence  * Alignment  * BLEU  * Cross-entropy  * Pretraining  * Scaling laws  * Transfer learning  * Translation