Summary of Adapting Language Models via Token Translation, by Zhili Feng et al.
Adapting Language Models via Token Translation
by Zhili Feng, Tanya Marwah, Nicolo Fusi, David Alvarez-Melis, Lester Mackey
First submitted to arXiv on: 1 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper introduces Sparse Sinkhorn Token Translation (S2T2), a novel approach for improving the performance of large language models when fine-tuning them on new target domains. Current methods rely on the fixed source-domain tokenizer, which can lead to inferior compression and reduced semantic alignment in the target domain. S2T2 instead trains a tailored tokenizer for the target domain and learns to translate between target and source tokens, enabling more effective reuse of the pre-trained next-source-token predictor (a rough code sketch of this translation step follows the table). The authors demonstrate the effectiveness of S2T2 by improving both the perplexity and the compression of out-of-domain protein sequences with fine-tuned English language models. |
Low | GrooveSquid.com (original content) | Large language models have a hard time understanding new types of text, like proteins. This is because they were trained on one type of text and struggle to adapt to others. The authors came up with an idea called S2T2 that helps the model learn a new way to understand this new type of text. It’s like teaching the model a new language! They tested it and found that it works really well, even when using smaller models to help bigger ones learn faster. |
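To make the translation idea more concrete, below is a minimal NumPy sketch of Sinkhorn-based token translation: it computes an entropically regularized transport plan between a new target-domain vocabulary and the pretrained source vocabulary from their embedding similarities, sparsifies the plan by keeping the top-k entries per row, and uses it to express each target token as a mixture of source-token embeddings. The random embeddings, cosine cost, regularization value, and top-k sparsification here are illustrative assumptions, not the authors' actual training procedure, which the summary above describes only at a high level.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized optimal transport via Sinkhorn-Knopp iterations.

    cost: (n_tgt, n_src) cost matrix; uniform marginals are assumed here.
    Returns a transport plan of the same shape whose rows and columns
    approximately sum to the uniform marginals.
    """
    n_tgt, n_src = cost.shape
    K = np.exp(-cost / reg)                  # Gibbs kernel
    a = np.full(n_tgt, 1.0 / n_tgt)          # target marginal
    b = np.full(n_src, 1.0 / n_src)          # source marginal
    u = np.ones(n_tgt)
    v = np.ones(n_src)
    for _ in range(n_iters):
        u = a / (K @ v + 1e-12)
        v = b / (K.T @ u + 1e-12)
    return u[:, None] * K * v[None, :]


def normalize_rows(X):
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)


# Hypothetical embeddings: a small pretrained "source" (English) vocabulary
# and a new target-domain (e.g. protein) vocabulary trained separately.
rng = np.random.default_rng(0)
E_src = rng.normal(size=(100, 16))           # (V_src, d) pretrained embeddings
E_tgt = rng.normal(size=(40, 16))            # (V_tgt, d) target-domain embeddings

# Cost = 1 - cosine similarity for every target/source token pair.
cost = 1.0 - normalize_rows(E_tgt) @ normalize_rows(E_src).T

P = sinkhorn_plan(cost, reg=0.05)            # dense translation plan

# Crude sparsification: keep the top-k source tokens per target token,
# then renormalize each row so it remains a distribution over source tokens.
k = 8
rows = np.arange(P.shape[0])[:, None]
topk = np.argsort(-P, axis=1)[:, :k]
P_sparse = np.zeros_like(P)
P_sparse[rows, topk] = P[rows, topk]
P_sparse /= P_sparse.sum(axis=1, keepdims=True)

# "Translate" each target token into a mixture of source-token embeddings so
# a frozen next-source-token predictor could be reused on the new domain.
E_tgt_translated = P_sparse @ E_src
print(E_tgt_translated.shape)                # (40, 16)
```

In this sketch the sparsity comes from a simple top-k cutoff after standard Sinkhorn; the paper's method builds sparsity into the learned translation itself, but the overall shape of the computation (a translation matrix mapping target tokens onto source tokens) is the same.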
Keywords
- Artificial intelligence
- Alignment
- Fine-tuning
- Perplexity
- Token
- Tokenizer
- Translation