Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
by Zhepeng Cen, Yao Liu, Siliang Zeng, Pratik Chaudhari, Huzefa Rangwala, George Karypis, Rasool Fakoor
First submitted to arXiv on: 18 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper addresses the gap between how language models are trained and how they are used at inference time. During training, models are optimized to predict the next token given the ground-truth past tokens; at inference time, however, they generate text sequentially, conditioning on their own previously generated tokens. This mismatch can lead to unpredictable behavior, as small deviations cascade over successive generation steps. To bridge the gap, the authors propose two approaches: Batch-Scheduled Sampling, which stochastically mixes ground-truth tokens and model-generated tokens into the training context, and Reference-Answer-based Correction, which explicitly trains the model to correct its own outputs. Experiments on summarization, general question-answering, and math question-answering tasks show improved performance over baseline methods. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper is about how language models behave differently when they are trained versus when they are used to generate text. During training, a model tries to predict the next word from the real words that came before it. But when it is actually generating text, it builds on its own earlier predictions, so small mistakes early on can add up and cause unexpected behavior. The authors propose two ways to fix this: one mixes the model's own generated words in with the real words from the training data, while the other teaches the model to correct its own mistakes during training. They tested both approaches on different tasks and found that performance improved. |
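To make the idea behind the first approach concrete, here is a minimal toy sketch of scheduled-sampling-style context mixing: with probability `p`, a ground-truth context token is replaced by one "generated" by the model itself, so training contexts look more like inference-time contexts. The function names, the stand-in model, and the mixing rule are illustrative assumptions, not the paper's exact Batch-Scheduled Sampling algorithm.

```python
import random

def toy_model_sample(context):
    # Stand-in for a language model's next-token sample: simply
    # repeats the last context token (purely illustrative).
    return context[-1] if context else 0

def build_training_context(ground_truth, p, rng=random):
    """Build a training context that mixes ground-truth tokens with
    self-generated tokens, each position using the model's own token
    with probability p (a toy version of scheduled sampling)."""
    context = []
    for tok in ground_truth:
        if context and rng.random() < p:
            context.append(toy_model_sample(context))  # model-generated token
        else:
            context.append(tok)  # ground-truth token
    return context

random.seed(0)
print(build_training_context([1, 2, 3, 4, 5], p=0.5))
```

With `p = 0` this reduces to ordinary teacher forcing (pure ground truth); with `p = 1` every position after the first comes from the model itself, mimicking free-running generation.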
Keywords
» Artificial intelligence » Context window » Inference » Language model » Question answering » Summarization » Token