Summary of Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning, by Chaojie Wang et al.
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
by Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, Bo An
First submitted to arXiv on: 20 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | A novel framework, Q*, is introduced to guide Large Language Models (LLMs) in multi-step reasoning tasks. The auto-regressive generation process of LLMs is prone to errors and hallucinations, but Q* addresses this by casting multi-step reasoning as a heuristic search problem. A plug-and-play Q-value model is learned to estimate expected future rewards, allowing Q* to guide the LLM’s decoding process with deliberative planning. This approach avoids fine-tuning the LLM for each task, reducing computational overhead and the risk of performance degradation on other tasks. Experimental results on GSM8K, MATH, and MBPP demonstrate the superiority of Q* over existing open-source LLMs. (A code sketch of this search loop follows the table.) |
| Low | GrooveSquid.com (original content) | A new way to make Large Language Models (LLMs) better at complex thinking is presented. Right now, LLMs can be very good at some tasks, but they sometimes make mistakes or say things that aren’t true. This paper introduces a method called Q* that helps LLMs make better decisions when they need to think through a problem in multiple steps. Instead of retraining the LLM for each task, Q* uses a separate helper model that estimates how promising each possible next step is, so the LLM can pick better steps. This approach is more efficient and doesn’t hurt the model’s ability to perform well on other tasks. The method is tested on several datasets and shows significant improvements over current models. |
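To make the idea concrete, here is a minimal, hypothetical Python sketch of the kind of Q-value-guided best-first search the medium summary describes. The callables `propose_steps`, `q_value`, and `is_terminal` are assumptions standing in for the frozen LLM’s step generator, the learned plug-and-play Q-value model, and a final-answer detector; they are not the paper’s actual interfaces.

```python
import heapq
import itertools

def q_star_decode(question, propose_steps, q_value, is_terminal, max_expansions=200):
    """Best-first search over partial reasoning traces, guided by a Q-value model.

    Assumed callables (placeholders, not the paper's actual API):
      propose_steps(question, steps) -> list of candidate next reasoning steps
                                        sampled from the frozen LLM
      q_value(question, steps, candidate) -> estimated expected future reward
                                             from the learned, plug-and-play model
      is_terminal(steps) -> True once the trace contains a final answer
    """
    tie = itertools.count()                    # tie-breaker so heapq never compares lists
    frontier = [(0.0, next(tie), [])]          # (negative score, tie-break, steps so far)
    for _ in range(max_expansions):
        if not frontier:
            break                              # search space exhausted
        _, _, steps = heapq.heappop(frontier)  # expand the most promising partial trace
        if is_terminal(steps):
            return steps                       # highest-scoring complete reasoning trace
        for candidate in propose_steps(question, steps):
            score = q_value(question, steps, candidate)
            heapq.heappush(frontier, (-score, next(tie), steps + [candidate]))
    return None                                # no complete trace found within the budget
```

In this sketch only the small Q-value model would need training; the LLM itself stays frozen and is just queried for candidate steps, which is why the summary notes that the approach avoids per-task fine-tuning and the performance degradation that can come with it.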
Keywords
» Artificial intelligence » Fine-tuning