Summary of LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models, by Liang Zhao et al.
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
by Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou
First submitted to arXiv on: 2 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract. |
| Medium | GrooveSquid.com (original content) | LongSkywork is a Large Language Model that can process up to 200,000 tokens, a significant increase over previous models. The key innovation is a long-context Supervised Fine-Tuning (SFT) stage added after the standard SFT stage; only 200 iterations are needed to convert the standard SFT model into a long-context model. To reduce the effort of collecting and annotating long-context training data, the authors develop two novel methods for creating synthetic data, applied during both the continual pretraining phase and the SFT phase. The findings suggest that synthetic long-context SFT data can, to some extent, surpass the performance of human-curated data. LongSkywork achieves outstanding results on a variety of long-context benchmarks, including the Needle test for information retrieval, where it reaches perfect accuracy across multiple context spans (an illustrative sketch of such a retrieval test follows the table below). Furthermore, in realistic application scenarios, LongSkywork-13B performs comparably to Claude2.1, the leading long-context model. |
| Low | GrooveSquid.com (original content) | LongSkywork is a new way to process very long pieces of text using AI. It’s like having a superpower for understanding what’s going on when you read something really long! The team behind it figured out how to make it work by adding a special step to the training process that helps the model learn from really long texts. They also came up with ways to create artificial training data, which makes things easier and faster. The results show that LongSkywork is very good at understanding long texts, and that training on this artificial data can even beat training on data made by people. |
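The Needle test mentioned in the medium summary is a standard long-context retrieval probe: a short "needle" fact is buried at varying depths inside long filler text, and the model is asked to retrieve it. The sketch below shows one common way to build such prompts. It is a minimal illustration of the general technique, not the authors' evaluation code; the filler sentence, needle fact, context sizes, and depth grid are all placeholder assumptions, and the model call is left as a hypothetical.

```python
# Illustrative needle-in-a-haystack prompt builder (not the paper's code).
# A short "needle" fact is inserted at a chosen relative depth inside long
# filler text; the model is then asked to retrieve it.

NEEDLE = "The secret passcode for the vault is 7-4-1-9."   # placeholder fact
QUESTION = "What is the secret passcode for the vault?"
FILLER = "The quick brown fox jumps over the lazy dog. "    # placeholder filler


def build_needle_prompt(context_chars: int, depth: float) -> str:
    """Build a prompt of roughly `context_chars` characters with the needle
    placed at the given relative depth (0.0 = start, 1.0 = end)."""
    n_sentences = max(1, context_chars // len(FILLER))
    sentences = [FILLER] * n_sentences
    insert_at = int(depth * len(sentences))
    sentences.insert(insert_at, NEEDLE + " ")
    haystack = "".join(sentences)
    return f"{haystack}\n\nBased only on the text above, answer: {QUESTION}"


def needle_retrieved(response: str) -> bool:
    """Simple exact-match check for the needle's key content."""
    return "7-4-1-9" in response


if __name__ == "__main__":
    # Sweep a grid of context lengths and insertion depths, as is typical
    # for needle-test heatmaps; the model call itself is left abstract.
    for context_chars in (4_000, 32_000, 128_000):
        for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
            prompt = build_needle_prompt(context_chars, depth)
            print(f"context≈{context_chars} chars, depth={depth}: "
                  f"prompt length {len(prompt)} chars")
            # response = my_long_context_model.generate(prompt)  # hypothetical call
            # print("retrieved:", needle_retrieved(response))
```

A real evaluation would tokenize rather than count characters, use varied filler documents, and aggregate retrieval accuracy over the length/depth grid into the familiar heatmap; the summary above reports that LongSkywork reaches perfect accuracy across the context spans tested.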
Keywords
» Artificial intelligence » Fine tuning » Large language model » Pretraining » Supervised » Synthetic data