Summary of Scaling LLM Inference with Optimized Sample Compute Allocation, by Kexun Zhang et al.
Scaling LLM Inference with Optimized Sample Compute Allocation
by Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li
First submitted to arXiv on: 29 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | A novel algorithm, OSCA (Optimized Sample Compute Allocation), is proposed to efficiently scale up inference-time algorithms for large language models. By formulating the choice of sampling configurations as a learning problem, OSCA finds an optimal mix of inference configurations, achieving better accuracy with 128x less compute on code generation and 25x less compute on four reasoning tasks. Its effectiveness extends beyond single-turn tasks to agentic workflows, where it achieves better accuracy on SWE-Bench with 3x less compute than the default configuration (see the sketch after this table). |
Low | GrooveSquid.com (original content) | OSCA is a new way to make large language models work just as well while using much less computing power. It's like a game where you try different combinations of settings to find the mix that solves the most problems for the least compute. The results are striking: accuracy stays the same or improves with far less computing power, which is great when lots of tasks need to be done quickly. |
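
To make the medium summary's idea concrete, here is a minimal, hypothetical sketch of the kind of allocation problem OSCA optimizes: split a fixed sample budget across inference configurations so that, averaged over a set of problems, the chance that at least one sample solves each problem is maximized. The per-problem success rates, the configuration names, and the greedy rule below are all illustrative assumptions (the paper learns the allocation rather than using this greedy heuristic, and it also accounts for configurations with different per-sample costs, which are treated as uniform here).

```python
# Hypothetical sketch of sample compute allocation across inference
# configurations. Success rates would be estimated on held-out data;
# the numbers here are made up for illustration.

def coverage(alloc: list[int], probs: list[list[float]]) -> float:
    """Expected fraction of problems solved by at least one sample.

    probs[q][i] = estimated probability that one sample from config i
    solves problem q; alloc[i] = number of samples drawn from config i.
    Samples are assumed independent.
    """
    total = 0.0
    for row in probs:  # per-config success rates for one problem
        fail = 1.0
        for p, n in zip(row, alloc):
            fail *= (1.0 - p) ** n
        total += 1.0 - fail
    return total / len(probs)

def allocate(probs: list[list[float]], budget: int) -> list[int]:
    """Greedily add one sample at a time to whichever configuration
    raises expected coverage the most, until the budget is spent.
    (A stand-in for the paper's learned allocation.)"""
    n_configs = len(probs[0])
    alloc = [0] * n_configs
    for _ in range(budget):
        gains = []
        for i in range(n_configs):
            alloc[i] += 1
            gains.append(coverage(alloc, probs))
            alloc[i] -= 1
        alloc[gains.index(max(gains))] += 1
    return alloc

if __name__ == "__main__":
    # Three hypothetical configs (e.g., two temperatures and an
    # alternative prompt) over four problems. Each config is good at
    # different problems, so the best budget split is a mix rather
    # than spending everything on a single configuration.
    probs = [
        [0.60, 0.05, 0.10],
        [0.05, 0.50, 0.10],
        [0.10, 0.10, 0.40],
        [0.20, 0.20, 0.20],
    ]
    alloc = allocate(probs, budget=12)
    print(alloc, f"coverage={coverage(alloc, probs):.3f}")
```

With these toy numbers, the greedy loop spreads the budget across all three configurations, since extra samples of an already well-covered problem yield diminishing returns; this mirrors the summary's point that an optimized mix of configurations beats a single default configuration at the same compute.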
Keywords
» Artificial intelligence » Inference