
Summary of Planning in Natural Language Improves LLM Search for Code Generation, by Evan Wang et al.


Planning In Natural Language Improves LLM Search For Code Generation

by Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang

First submitted to arXiv on: 5 Sep 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper presents PlanSearch, a novel search algorithm that makes better use of large language model (LLM) inference-time compute by exploring diverse candidate plans for solving coding problems. Unlike repeated sampling, which tends to produce similar yet incorrect generations, PlanSearch first generates a range of natural-language observations about the problem and then combines these observations into candidate plans for solving it. By searching over plans rather than directly over code solutions, PlanSearch achieves state-of-the-art results on benchmarks such as LiveCodeBench, outperforming both baseline search methods and models used without search. The method is particularly effective with Claude 3.5 Sonnet, reaching a pass@200 of 77.0% on LiveCodeBench, compared with 41.4% without search and 60.6% with standard repeated sampling. The paper further shows that performance gains from search can be accurately predicted as a function of the diversity of the generated ideas. The authors demonstrate the efficacy of PlanSearch across multiple models, search algorithms, and benchmarks, including HumanEval+, MBPP+, and LiveCodeBench. Overall, this work contributes a new approach to scaling LLM inference-time compute for code generation by addressing the limitations of traditional search methods.
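As the summary above describes, PlanSearch first elicits observations about a problem, combines them into candidate plans, and only then generates code from each plan. The sketch below illustrates that search loop in outline; the `generate` function is a hypothetical stand-in for an LLM call (the names, prompts, and parameters here are illustrative, not the paper's actual implementation):

```python
from itertools import combinations

def generate(prompt):
    # Stand-in for an LLM call; it just echoes a tagged string
    # so the sketch is runnable without a model.
    return f"<output for: {prompt[:50]}>"

def plan_search(problem, num_observations=4, combo_size=2):
    """Sketch of the PlanSearch idea: search over natural-language
    plans, then translate each plan into a candidate code solution."""
    # 1. Elicit several diverse observations about the problem.
    observations = [
        generate(f"Observation {i} about: {problem}")
        for i in range(num_observations)
    ]
    # 2. Combine subsets of observations into candidate plans,
    #    which is where the diversity of the search comes from.
    plans = [
        generate(f"Plan combining: {combo[0]} and {combo[1]}")
        for combo in combinations(observations, combo_size)
    ]
    # 3. Turn each plan into a candidate code solution.
    return [generate(f"Code implementing: {plan}") for plan in plans]

candidates = plan_search("count distinct substrings of a string")
# One candidate per pair of observations: C(4, 2) = 6 candidates.
```

Each candidate would then be checked against test cases, with the best-of-k success rate reported as pass@k.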
Low Difficulty Summary (original content by GrooveSquid.com)
This research is about finding better ways for computers to solve problems using natural language. Right now, big language models can do a great job solving some problems, but they’re not always very good at it. The problem is that these models are limited because they only look at similar solutions and don’t explore other possibilities. This paper presents a new approach called PlanSearch that helps computers find better solutions by looking at different ideas first. This way, the computer can try out more potential solutions before settling on one. The authors tested PlanSearch with several language models and showed that it worked really well. For example, when used with Claude 3.5 Sonnet, PlanSearch achieved a high level of accuracy in solving problems. The paper also showed that the amount of improvement depends on how diverse the ideas are that the computer generates. Overall, this research has the potential to improve computers’ ability to solve complex problems using natural language.
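The pass@200 figures quoted above use the standard pass@k metric: the probability that at least one of k sampled solutions passes all tests. A common unbiased estimator, popularized by the HumanEval benchmark literature, computes it from n total samples of which c are correct; the helper below is a sketch of that formula, not code from this paper:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    i.e. the chance that a random size-k subset of the n samples
    contains at least one of the c correct ones."""
    if n - c < k:
        # Too few incorrect samples to fill a size-k subset,
        # so every subset must contain a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples of which 3 are correct, pass@1 is the raw success rate.
print(round(pass_at_k(10, 3, 1), 4))  # 0.3
```

Larger k rewards diverse sampling: even a low per-sample success rate can yield a high pass@200, which is why searching over varied plans pays off.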

Keywords

  • Artificial intelligence
  • Claude
  • Inference
  • Large language model