Summary of Training LLMs over Neurally Compressed Text, by Brian Lester et al.
Training LLMs over Neurally Compressed Text
by Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant
First submitted to arxiv on: 4 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper explores the idea of training large language models (LLMs) over highly compressed text. Neural text compressors can achieve much higher compression rates than standard subword tokenizers, which could lead to advantages in training and serving efficiency, as well as easier handling of long text spans. However, strong compression tends to produce opaque outputs that are not well-suited for learning by LLMs. The authors propose a novel compression technique called Equal-Info Windows to overcome this issue. They demonstrate effective learning over neurally compressed text using this method, which improves with scale and outperforms byte-level baselines on perplexity and inference speed benchmarks. While the method delivers worse perplexity than subword tokenizers for models trained with the same parameter count, it has the benefit of shorter sequence lengths, reducing latency.
Low | GrooveSquid.com (original content) | This paper is about making computers understand language better. Right now, computers need a lot of information to learn new things, and that takes up a lot of space. But what if we could teach computers to work with really short pieces of text? That would make it faster and more efficient! The problem is that when we try to shrink text down too much, the computer can't understand it anymore. So the authors came up with a new way to compress text called Equal-Info Windows. They used this method to train computers to learn from really short pieces of text, and they found that it worked better than other methods! This could be important for making computers more helpful in our daily lives.
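To give a rough feel for the Equal-Info Windows idea described above, here is a toy sketch of segmenting text into windows that each fit a fixed compressed-bit budget, resetting the compressor at every window boundary. This is an illustrative assumption-laden stand-in: the paper uses a language-model-driven arithmetic coder and pads each window to exactly the target bit count, whereas this sketch uses `zlib` as a placeholder compressor and a greedy character-level split.

```python
import zlib


def equal_info_windows(text: str, bits_per_window: int = 128) -> list[str]:
    """Greedily split `text` into windows that each compress (independently,
    with the compressor reset at every boundary) to at most `bits_per_window`
    bits. Toy stand-in: zlib replaces the paper's LM-based arithmetic coder,
    and windows are capped at the budget rather than padded to exactly meet it.
    """
    windows, start = [], 0
    while start < len(text):
        end = start + 1  # always take at least one character to guarantee progress
        # Grow the window until adding one more character would exceed the budget.
        while end < len(text):
            compressed_bits = len(zlib.compress(text[start:end + 1].encode())) * 8
            if compressed_bits > bits_per_window:
                break
            end += 1
        windows.append(text[start:end])
        start = end
    return windows


sample = "the quick brown fox jumps over the lazy dog " * 20
chunks = equal_info_windows(sample, bits_per_window=128)
# The windows partition the text, and each stays within the bit budget.
assert "".join(chunks) == sample
```

Resetting the compressor at each boundary is what makes every window independently decodable, which is the property the paper argues keeps the compressed stream learnable for an LLM.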
Keywords
- Artificial intelligence
- Inference
- Perplexity