
Summary of "Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens" by Zhepeng Cen, Yao Liu, Siliang Zeng, Pratik Chaudhari, Huzefa Rangwala, George Karypis, and Rasool Fakoor


Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

by Zhepeng Cen, Yao Liu, Siliang Zeng, Pratik Chaudhari, Huzefa Rangwala, George Karypis, Rasool Fakoor

First submitted to arXiv on: 18 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary — written by the paper authors
Read the original abstract here.

Medium Difficulty Summary — written by GrooveSquid.com (original content)
This paper addresses the gap between language model training and inference. During training, models are optimized to predict the next token given past ground-truth tokens. At inference time, however, they generate text sequentially, conditioning on their own previously generated tokens. This mismatch can lead to unpredictable behavior, because small discrepancies early in a sequence compound over successive generation steps. To bridge this gap, the authors propose two approaches: Batch-Scheduled Sampling and Reference-Answer-based Correction. The former stochastically mixes ground-truth tokens and model-generated tokens in the training context window. The latter explicitly builds self-correction capability into the model during training. Experimental results on summarization, general question-answering, and math question-answering tasks demonstrate improved performance over baseline methods.

Low Difficulty Summary — written by GrooveSquid.com (original content)
This paper is about how language models work differently when they're trained versus when they're used to generate text. When models are trained, they try to predict the next word based on what came before. But when they're actually generating text, they use their own predictions to make new ones. This can cause problems because small mistakes early on can add up and result in unexpected behavior. The authors propose two ways to fix this issue: one involves mixing real words from the training data with generated words, while the other teaches the model to correct its own mistakes during training. They tested these approaches on different tasks and found that they improved performance.

Keywords

» Artificial intelligence  » Context window  » Inference  » Language model  » Question answering  » Summarization  » Token