
Summary of Scaling LLM Inference with Optimized Sample Compute Allocation, by Kexun Zhang et al.


Scaling LLM Inference with Optimized Sample Compute Allocation

by Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li

First submitted to arXiv on: 29 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper’s original abstract, available on arXiv.

Medium Difficulty Summary (GrooveSquid.com original content)
A novel algorithm, OSCA (Optimized Sample Compute Allocation), is proposed to efficiently scale up inference-time algorithms for large language models. By formulating the choice of sampling configurations as a learning problem, OSCA finds an optimal mix of inference configurations, achieving better accuracy with 128x less compute on code generation and 25x less compute on four reasoning tasks. Its effectiveness extends beyond single-turn tasks to agentic workflows, where it achieves better accuracy on SWE-Bench with 3x less compute than the default configuration.
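To make the idea concrete, here is a minimal sketch of the allocation problem OSCA addresses. It assumes we have per-problem success-rate estimates for each sampling configuration (the configuration names and numbers below are hypothetical, not taken from the paper) and brute-forces the sample split that maximizes expected accuracy. OSCA learns this allocation from data rather than enumerating it, but the objective being optimized is analogous.

```python
import itertools

# Hypothetical per-problem success rates for each sampling configuration,
# e.g. estimated from a small training set. Configuration names and
# numbers are illustrative assumptions, not values from the paper.
# Each list entry is one problem's per-sample pass probability.
probs = {
    "temp=0.2": [0.90, 0.05, 0.10],
    "temp=1.0": [0.20, 0.40, 0.50],
}

def expected_accuracy(allocation):
    """Average over problems of P(at least one sample passes),
    assuming samples are drawn independently."""
    n_problems = len(next(iter(probs.values())))
    total = 0.0
    for i in range(n_problems):
        p_all_fail = 1.0
        for cfg, n in allocation.items():
            p_all_fail *= (1.0 - probs[cfg][i]) ** n
        total += 1.0 - p_all_fail
    return total / n_problems

def best_allocation(budget):
    """Brute-force every integer split of `budget` samples across
    configurations; OSCA learns the allocation instead of enumerating
    splits, but it maximizes a comparable objective."""
    names = list(probs)
    best, best_acc = None, -1.0
    for split in itertools.product(range(budget + 1), repeat=len(names)):
        if sum(split) != budget:
            continue
        alloc = dict(zip(names, split))
        acc = expected_accuracy(alloc)
        if acc > best_acc:
            best, best_acc = alloc, acc
    return best, best_acc

alloc, acc = best_allocation(budget=8)
print(alloc, round(acc, 3))  # a mixed split beats any single configuration here
```

With these illustrative numbers, splitting the 8-sample budget across both configurations outperforms spending it all on either one, because each configuration solves problems the other misses. That complementarity is what makes optimizing the sample allocation, rather than picking a single best configuration, worthwhile.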
Low Difficulty Summary (GrooveSquid.com original content)
OSCA is a new way to make large language models work faster without losing accuracy. It’s like a game where you try different combinations of settings to find the one that uses the least computer power. The results are striking: it works just as well with much less computing power, which is great when we need to do lots of tasks quickly.

Keywords

  • Artificial intelligence
  • Inference