


Subtle Errors Matter: Preference Learning via Error-injected Self-editing

by Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

First submitted to arXiv on: 9 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
Large Language Models (LLMs) have shown impressive mathematical abilities, solving problems that range from basic arithmetic to competition-level math challenges. However, they often make subtle yet critical errors, such as miscalculations or incorrect substitutions, that limit their full potential. Existing approaches to improving LLMs’ mathematical skills apply preference learning to step-wise solution pairs but overlook these crucial subtle errors. This study proposes a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into critical tokens in reasoning or computation steps to construct hard preference pairs for error mitigation. RISE uses the LLM itself to edit a small number of tokens in a correct solution, injecting the designed subtle errors. These self-edited solutions and their correct counterparts, together with incorrect solutions obtained through sampling, are then used for subtle error-aware DPO training. Compared with other preference learning methods, RISE refines the training objective without requiring fine-grained sampling or preference annotation. Extensive experiments demonstrate the effectiveness of RISE, achieving notable improvements on the GSM8K and MATH benchmarks.
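
To make the training recipe concrete, here is a minimal sketch of how error-injected preference pairs could feed a standard DPO loss. The prompt wording, the generate callable, and all function names are illustrative assumptions, not the authors' actual implementation.

    import torch.nn.functional as F

    # Hypothetical prompt asking the model to self-edit a correct solution so
    # that a few critical tokens carry a predefined subtle error (e.g. a
    # miscalculation or an incorrect substitution).
    EDIT_PROMPT = (
        "Rewrite the solution below, changing only a few tokens in one step "
        "so that it contains a subtle {error_type} error.\n\nSolution:\n{solution}"
    )

    def inject_subtle_error(generate, solution, error_type="miscalculation"):
        # `generate` stands in for whatever text-generation API the LLM exposes.
        return generate(EDIT_PROMPT.format(error_type=error_type, solution=solution))

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Standard DPO objective: the "chosen" response is a correct solution,
        # the "rejected" response is its error-injected self-edit (or a sampled
        # incorrect solution). All arguments are tensors of log-probabilities.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin so the model prefers correct over subtly wrong.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

In practice the log-probabilities would come from scoring each solution with the current policy and a frozen reference model, as in standard DPO training.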
Low Difficulty Summary (written by GrooveSquid.com; original content)
Large Language Models are really good at math, but they sometimes make small mistakes that are easy to miss. To help them learn from these mistakes, researchers have developed a new way to train LLMs called eRror-Injected Self-Editing (RISE). RISE asks the model to take its own correct solutions and deliberately slip small errors into a few spots, then trains the model to prefer the correct solution over the slightly wrong one. This helps the LLM learn to avoid making these same subtle mistakes in the future.

Keywords

» Artificial intelligence