


SAM Decoding: Speculative Decoding via Suffix Automaton

by Yuxuan Hu, Ke Wang, Xiaokang Zhang, Fanjin Zhang, Cuiping Li, Hong Chen, Jing Zhang

First submitted to arXiv on: 16 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (by the paper authors): read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com): This paper presents SAM-Decoding, a novel speculative decoding method for efficient, lossless acceleration of large language model (LLM) inference. Unlike existing retrieval-based methods, SAM-Decoding uses a suffix automaton (SAM) to find the exact longest suffix match between the generated text and the retrieval source, with an average time complexity of O(1) per generation step. It can also be combined with existing methods, adapting to broader domains by selecting a draft generation strategy based on the match length. Experimental results on Spec-Bench show that SAM-Decoding is more than 18% faster than other retrieval-based speculative decoding methods, and that combining it with EAGLE-2 yields an additional speedup of 3.28%–11.13% across LLM backbones of various sizes.
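The core idea in the summary above (tracking the exact longest suffix match in amortized O(1) per token, then proposing the tokens that follow the match as a draft) can be sketched with a classic suffix automaton. This is an illustrative simplification, not the paper's implementation: the `SuffixMatcher` class, its method names, and the word-level drafting heuristic are all invented here for demonstration.

```python
# Minimal sketch of retrieval-based drafting with a suffix automaton (SAM),
# assuming a single reference token sequence to retrieve drafts from.

class State:
    def __init__(self):
        self.next = {}      # transitions: token -> state index
        self.link = -1      # suffix link
        self.len = 0        # length of the longest string in this state
        self.endpos = -1    # one end position (index into the reference)

def build_sam(tokens):
    """Standard online suffix automaton construction over a token list."""
    states = [State()]
    last = 0
    for i, t in enumerate(tokens):
        cur = len(states)
        states.append(State())
        states[cur].len = states[last].len + 1
        states[cur].endpos = i
        p = last
        while p != -1 and t not in states[p].next:
            states[p].next[t] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[t]
            if states[p].len + 1 == states[q].len:
                states[cur].link = q
            else:
                clone = len(states)
                states.append(State())
                states[clone].len = states[p].len + 1
                states[clone].next = dict(states[q].next)
                states[clone].link = states[q].link
                states[clone].endpos = states[q].endpos
                while p != -1 and states[p].next.get(t) == q:
                    states[p].next[t] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states

class SuffixMatcher:
    """Tracks, in amortized O(1) per token, the longest suffix of the
    generated stream that occurs somewhere in the reference tokens."""

    def __init__(self, ref_tokens):
        self.ref = ref_tokens
        self.sam = build_sam(ref_tokens)
        self.state = 0
        self.length = 0

    def feed(self, token):
        """Extend the match with one newly generated token."""
        # On a failed transition, fall back along suffix links,
        # shrinking the match to the longest suffix that can continue.
        while self.state != 0 and token not in self.sam[self.state].next:
            self.state = self.sam[self.state].link
            self.length = self.sam[self.state].len
        if token in self.sam[self.state].next:
            self.state = self.sam[self.state].next[token]
            self.length += 1
        else:
            self.state, self.length = 0, 0
        return self.length

    def draft(self, k):
        """Propose the k reference tokens that follow the current match."""
        if self.length == 0:
            return []
        end = self.sam[self.state].endpos
        return self.ref[end + 1 : end + 1 + k]
```

For example, with the reference `["the", "cat", "sat", "on", "the", "mat"]`, feeding the generated tokens `"the"` then `"cat"` yields a match of length 2, and `draft(2)` proposes `["sat", "on"]` as the speculative continuation. A match-length threshold like this is what would let a combined system decide between retrieval drafting and a model-based drafter such as EAGLE-2.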
Low Difficulty Summary (original content by GrooveSquid.com): This paper introduces a new way to make language models run faster without losing accuracy. The method, called SAM-Decoding, uses a data structure called a suffix automaton to find the longest piece of earlier text that matches what the model is currently writing. Unlike other methods, SAM-Decoding is very fast and can be combined with different types of language models and acceleration techniques. The results show that it is more than 18% faster than comparable methods, and even faster when combined with another technique. This research can help improve the performance of language models in many applications.

Keywords

» Artificial intelligence  » Inference  » Large language model  » SAM