
Summary of Learning to Decode Collaboratively with Multiple Language Models, by Shannon Zejiang Shen et al.


Learning to Decode Collaboratively with Multiple Language Models

by Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

First submitted to arXiv on: 6 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a method that teaches multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. The decision of which LLM generates the next token is treated as a latent variable, and the marginal likelihood of a training set is optimized under this latent-variable model. Without direct supervision, each base LLM learns when to generate a token itself and when to call on an “assistant” language model. Token-level collaboration during decoding fuses each model’s expertise in a way tailored to the specific task at hand. The authors demonstrate their collaborative decoding method on instruction-following, domain-specific QA, and reasoning tasks, showing that the joint system outperforms the individual models.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps language models work together by mixing their ideas while writing text. It’s like having a team of experts solving a problem together! The model learns when to contribute its own ideas and when to ask other “assistant” models for help. This collaboration is especially useful for problems that require specific knowledge, like understanding medical jargon or explaining complex science concepts. By working together, the language models perform better than any of them does alone.

Keywords

  • Artificial intelligence
  • Language model
  • Likelihood
  • Token