Summary of “Improve Mathematical Reasoning in Language Models by Automated Process Supervision” by Liangchen Luo et al.
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
by Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Meiqi Guo, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi
First submitted to arXiv on: 5 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Large language models (LLMs) often struggle with complex multi-step reasoning tasks such as mathematical problem solving and code generation. Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard technique for enhancing their performance, but it falls short on tasks requiring lengthy or multi-hop reasoning chains. Process supervision addresses this limitation by assigning intermediate rewards at each step of the reasoning process. To collect high-quality process supervision data efficiently, the authors propose OmegaPRM, a novel divide-and-conquer Monte Carlo Tree Search algorithm that identifies the first error in a Chain of Thought (CoT) and balances positive and negative examples. This enables the automated collection of over 1.5 million annotations to train Process Reward Models (PRMs). Combined with a weighted self-consistency algorithm, this improves the math reasoning performance of the Gemini Pro and Gemma 2 27B models on various datasets. |
| Low | GrooveSquid.com (original content) | Imagine trying to solve a difficult math problem or write code for a robot. Even very smart computer programs can struggle with these kinds of tasks. To help them do better, researchers have developed a technique called process supervision, which gives rewards or penalties during the thinking process to improve the program’s performance. Collecting enough data to train such a system has been a big challenge, though. In this paper, the authors propose a new way to collect and use this data automatically, without human help. They tested the approach on two different models and saw significant improvements in their ability to solve math problems. |
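The core idea behind OmegaPRM’s error localization can be sketched as a binary search over the steps of a solution: if Monte Carlo rollouts continued from a given prefix can still reach the correct final answer, every step in that prefix is treated as correct, and the first error must lie later. The sketch below is a simplified illustration of that divide-and-conquer principle, not the paper’s full MCTS procedure; `rollout_success_rate` stands in for sampling completions from an actual LLM and checking their final answers, which we assume is supplied by the caller.

```python
def find_first_error(steps, rollout_success_rate):
    """Binary-search the earliest step after which no Monte Carlo rollout
    reaches a correct final answer (a simplified view of OmegaPRM's
    divide-and-conquer error localization).

    steps: the list of Chain-of-Thought steps for one solution.
    rollout_success_rate(prefix) -> float in [0, 1]: hypothetical callable
        estimating, by sampling completions from the policy model, the
        fraction of rollouts from this prefix that end in a correct answer.
    Returns the index of the first erroneous step, or None if rollouts
    from the full chain can still succeed (no error detected).
    """
    if rollout_success_rate(steps) > 0:
        return None  # the complete chain can still reach a correct answer

    lo, hi = 0, len(steps) - 1  # the first error lies somewhere in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if rollout_success_rate(steps[:mid + 1]) > 0:
            lo = mid + 1  # prefix through step mid still succeeds: error is later
        else:
            hi = mid      # rollouts already fail: error is at or before step mid
    return lo


# Toy usage with a simulated checker: pretend step index 3 is the first
# mistake, so any prefix that includes it never reaches a correct answer.
steps = [f"step {i}" for i in range(6)]
rate = lambda prefix: 0.0 if len(prefix) >= 4 else 0.8
print(find_first_error(steps, rate))  # → 3
```

Binary search reduces the number of rollout batches from linear in the chain length to logarithmic, which is what makes fully automated annotation of millions of steps tractable.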
Keywords
» Artificial intelligence » Gemini