EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

by Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang

First submitted to arXiv on: 24 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The abstract presents a novel approach to accelerating inference with Large Language Models (LLMs), building on the existing method EAGLE. The authors observe that speculative sampling methods such as EAGLE use a static draft tree, implicitly assuming that the acceptance rate of draft tokens does not depend on context. To address this limitation, they introduce EAGLE-2, which grows a dynamic, context-aware draft tree guided by the draft model’s confidence scores, which closely approximate acceptance rates. This improvement enables faster inference while leaving the distribution of the generated text unchanged. The authors demonstrate the effectiveness of EAGLE-2 through extensive evaluations on three LLM series and six tasks, achieving speedup ratios of 3.05x-4.26x, which is 20%-40% faster than EAGLE-1.
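The dynamic draft tree described above can be pictured as a confidence-guided beam expansion: each node carries the product of confidences along its path, and only the most promising nodes are grown deeper. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation; `DraftNode`, `expand_draft_tree`, and `toy_draft_model` are made-up names, and the toy candidate function stands in for the lightweight draft model the real system uses.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class DraftNode:
    token: str
    confidence: float      # draft model's probability for this token
    value: float = 1.0     # product of confidences from the root; a proxy
                           # for the chance the whole path gets accepted
    children: list = field(default_factory=list)

def expand_draft_tree(root, candidate_fn, depth, top_k):
    """Grow the draft tree layer by layer, expanding only the top_k
    highest-value nodes at each depth. This is the context-dependent part:
    confident branches grow deep, unlikely branches are pruned early."""
    frontier = [root]
    for _ in range(depth):
        next_layer = []
        for node in heapq.nlargest(top_k, frontier, key=lambda n: n.value):
            for token, conf in candidate_fn(node):
                child = DraftNode(token, conf, value=node.value * conf)
                node.children.append(child)
                next_layer.append(child)
        frontier = next_layer
    return root

# Toy stand-in for the draft model: always proposes the same two candidates.
def toy_draft_model(node):
    return [("the", 0.6), ("a", 0.3)]

root = expand_draft_tree(DraftNode("<s>", 1.0), toy_draft_model, depth=2, top_k=1)
```

With `top_k=1`, only the most confident branch (cumulative value 0.6) is expanded to depth 2 while the weaker branch stays a leaf; a static draft tree would expand every position the same way regardless of confidence.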
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes a new method to make language models generate text faster without losing quality. It starts by pointing out that current methods assume a word is equally likely to be accepted no matter the context. The authors then introduce a better approach, called EAGLE-2, which uses the draft model’s confidence scores to adjust the shape of its draft tree on the fly. This makes it possible to generate text 20%-40% faster than before while producing exactly the same output as the original model would. The paper shows that EAGLE-2 works well on different language models and tasks.

Keywords

  • Artificial intelligence
  • Inference