Summary of Superposed Decoding: Multiple Generations From a Single Autoregressive Inference Pass, by Ethan Shen et al.


Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

by Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

First submitted to arXiv on: 28 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
A new decoding algorithm called Superposed Decoding is proposed to generate k auto-complete drafts at the computation cost of a single autoregressive inference pass. This is achieved by feeding a superposition of the most recent token embeddings from the k drafts as input to the next decoding step. At each step, the algorithm combines the k drafts with the top-k tokens to produce k^2 candidate drafts and caches the k most likely, using n-gram interpolation to filter out incoherent generations. Experiments show that Superposed Decoding generates text at least as coherent and factual as Nucleus Sampling and Greedy Decoding while being at least 2.44 times faster for k >= 3, and user evaluations favor its output over Nucleus Sampling in a compute-normalized setting. The algorithm can also be combined with other decoding strategies for universal coverage gains when scaling inference-time compute. A rough code sketch of this procedure follows the summaries below.

Low Difficulty Summary (original content by GrooveSquid.com)
A new method called Superposed Decoding helps computers suggest words and sentences. Instead of offering just one suggestion, it lets the computer generate several at once. It works by blending the computer's current guesses together and feeding that blend back in to produce the next round of suggestions. Tests show the method is just as good at producing useful, accurate sentences as other approaches while taking less time, and when every method is given the same amount of computing power, people prefer its suggestions.
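
To make the decoding procedure described in the medium-difficulty summary concrete, the sketch below is a minimal, self-contained PyTorch illustration, not the authors' released code. It stands in a tiny random embedding table and output head for a real language model, skips the KV cache and the paper's n-gram interpolation filter, and every name in it (next_token_logits, weights, and so on) is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for a language model's embedding table and output head.
# A real implementation would reuse the transformer's own weights; these
# names and sizes are illustrative assumptions only.
vocab_size, d_model, k, steps = 100, 32, 3, 5
torch.manual_seed(0)
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

def next_token_logits(x):
    # One decoding step of the toy "model" (no attention or KV cache shown).
    return lm_head(torch.tanh(x))

drafts = [[0] for _ in range(k)]      # k drafts, all starting from token id 0
scores = torch.zeros(k)               # cumulative log-probability of each draft
weights = torch.full((k,), 1.0 / k)   # superposition weights over the drafts

for _ in range(steps):
    # Superpose the most recent token embeddings of the k drafts into a single
    # input vector, so each step needs only one forward pass.
    last = torch.tensor([d[-1] for d in drafts])
    superposed = (weights.unsqueeze(1) * embed(last)).sum(dim=0)

    log_probs = F.log_softmax(next_token_logits(superposed), dim=-1)
    top_lp, top_ids = log_probs.topk(k)          # top-k continuation tokens

    # Combine k drafts with the top-k tokens -> k^2 candidates, keep the best k.
    # (The paper additionally interpolates with an n-gram model to score the
    # candidates and filter incoherent drafts; that step is omitted here.)
    cand = scores.unsqueeze(1) + top_lp.unsqueeze(0)   # shape (k, k)
    flat = cand.flatten()
    keep = flat.topk(k).indices
    new_drafts, new_scores = [], []
    for idx in keep:
        di, ti = divmod(idx.item(), k)                 # draft index, token index
        new_drafts.append(drafts[di] + [top_ids[ti].item()])
        new_scores.append(flat[idx])
    drafts, scores = new_drafts, torch.stack(new_scores)
    weights = F.softmax(scores, dim=0)   # re-weight drafts for the next superposition

print(drafts)   # k draft continuations from one forward pass per step
```

Because only the single superposed embedding goes through the model, the per-step compute matches single-draft decoding, which is the source of the speedup reported in the summaries above.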

Keywords

  » Artificial intelligence  » Autoregressive  » Inference  » N-gram  » Token