Summary of Training LLMs over Neurally Compressed Text, by Brian Lester et al.
Training LLMs over Neurally Compressed Text
by Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant
First submitted to arxiv on: 4 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper explores the idea of training large language models (LLMs) over highly compressed text. Neural text compressors can achieve much higher compression rates than standard subword tokenizers, which could lead to advantages in training and serving efficiency, as well as easier handling of long text spans. However, strong compression tends to produce opaque outputs that are not well-suited for learning by LLMs. The authors propose a novel compression technique called Equal-Info Windows to overcome this issue. They demonstrate effective learning over neurally compressed text using this method, which improves with scale and outperforms byte-level baselines on perplexity and inference speed benchmarks. While the method delivers worse perplexity than subword tokenizers for models trained with the same parameter count, it has the benefit of shorter sequence lengths, reducing latency.
Low | GrooveSquid.com (original content) | This paper is about making computers understand language better. Right now, computers need a lot of information to learn new things, and that takes up a lot of space. But what if we could teach computers to work with really short pieces of text? That would make it faster and more efficient! The problem is that when we try to shrink text down too much, the computer can't understand it anymore. So the authors came up with a new way to compress text called Equal-Info Windows. They used this method to train computers to learn from really short pieces of text, and they found that it worked better than other methods! This could be important for making computers more helpful in our daily lives.
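To give a rough feel for the Equal-Info Windows idea described above, here is a toy sketch of segmenting text into windows that each fit a fixed compressed-bit budget, resetting the compressor at every window boundary. This is an illustrative assumption-laden stand-in: the paper uses a language-model-driven arithmetic coder and pads each window to exactly the target bit count, whereas this sketch uses `zlib` as a placeholder compressor and a greedy character-level split.

```python
import zlib


def equal_info_windows(text: str, bits_per_window: int = 128) -> list[str]:
    """Greedily split `text` into windows that each compress (independently,
    with the compressor reset at every boundary) to at most `bits_per_window`
    bits. Toy stand-in: zlib replaces the paper's LM-based arithmetic coder,
    and windows are capped at the budget rather than padded to exactly meet it.
    """
    windows, start = [], 0
    while start < len(text):
        end = start + 1  # always take at least one character to guarantee progress
        # Grow the window until adding one more character would exceed the budget.
        while end < len(text):
            compressed_bits = len(zlib.compress(text[start:end + 1].encode())) * 8
            if compressed_bits > bits_per_window:
                break
            end += 1
        windows.append(text[start:end])
        start = end
    return windows


sample = "the quick brown fox jumps over the lazy dog " * 20
chunks = equal_info_windows(sample, bits_per_window=128)
# The windows partition the text, and each stays within the bit budget.
assert "".join(chunks) == sample
```

Resetting the compressor at each boundary is what makes every window independently decodable, which is the property the paper argues keeps the compressed stream learnable for an LLM.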
Keywords
- Artificial intelligence
- Inference
- Perplexity