Summary of Scaling LLM Inference with Optimized Sample Compute Allocation, by Kexun Zhang et al.
Scaling LLM Inference with Optimized Sample Compute Allocation
by Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li
First submitted to arXiv on: 29 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | A novel algorithm, OSCA (Optimized Sample Compute Allocation), is proposed to efficiently scale up inference-time algorithms for large language models. By formulating the choice of sampling configurations as a learning problem, OSCA finds an optimal mix of inference configurations, achieving better accuracy with 128x less compute on code generation and 25x less compute on four reasoning tasks. Its effectiveness extends beyond single-turn tasks to agentic workflows, where it achieves better accuracy on SWE-Bench with 3x less compute than the default configuration (see the sketch after this table). |
Low | GrooveSquid.com (original content) | OSCA is a new way to make large language models work just as well while using much less computing power. It's like a game where you try different combinations of settings to find the mix that solves the most problems for the least compute. The results are striking: accuracy stays the same or improves with far less computing power, which is great when lots of tasks need to be done quickly. |
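
To make the medium summary's idea concrete, here is a minimal, hypothetical sketch of the kind of allocation problem OSCA optimizes: split a fixed sample budget across inference configurations so that, averaged over a set of problems, the chance that at least one sample solves each problem is maximized. The per-problem success rates, the configuration names, and the greedy rule below are all illustrative assumptions (the paper learns the allocation rather than using this greedy heuristic, and it also accounts for configurations with different per-sample costs, which are treated as uniform here).

```python
# Hypothetical sketch of sample compute allocation across inference
# configurations. Success rates would be estimated on held-out data;
# the numbers here are made up for illustration.

def coverage(alloc: list[int], probs: list[list[float]]) -> float:
    """Expected fraction of problems solved by at least one sample.

    probs[q][i] = estimated probability that one sample from config i
    solves problem q; alloc[i] = number of samples drawn from config i.
    Samples are assumed independent.
    """
    total = 0.0
    for row in probs:  # per-config success rates for one problem
        fail = 1.0
        for p, n in zip(row, alloc):
            fail *= (1.0 - p) ** n
        total += 1.0 - fail
    return total / len(probs)

def allocate(probs: list[list[float]], budget: int) -> list[int]:
    """Greedily add one sample at a time to whichever configuration
    raises expected coverage the most, until the budget is spent.
    (A stand-in for the paper's learned allocation.)"""
    n_configs = len(probs[0])
    alloc = [0] * n_configs
    for _ in range(budget):
        gains = []
        for i in range(n_configs):
            alloc[i] += 1
            gains.append(coverage(alloc, probs))
            alloc[i] -= 1
        alloc[gains.index(max(gains))] += 1
    return alloc

if __name__ == "__main__":
    # Three hypothetical configs (e.g., two temperatures and an
    # alternative prompt) over four problems. Each config is good at
    # different problems, so the best budget split is a mix rather
    # than spending everything on a single configuration.
    probs = [
        [0.60, 0.05, 0.10],
        [0.05, 0.50, 0.10],
        [0.10, 0.10, 0.40],
        [0.20, 0.20, 0.20],
    ]
    alloc = allocate(probs, budget=12)
    print(alloc, f"coverage={coverage(alloc, probs):.3f}")
```

With these toy numbers, the greedy loop spreads the budget across all three configurations, since extra samples of an already well-covered problem yield diminishing returns; this mirrors the summary's point that an optimized mix of configurations beats a single default configuration at the same compute.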
Keywords
» Artificial intelligence » Inference