


Subtle Errors Matter: Preference Learning via Error-injected Self-editing

by Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

First submitted to arXiv on: 9 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
Large Language Models (LLMs) have shown impressive mathematical abilities, solving problems that range from basic arithmetic to competition-level math challenges. However, they often make subtle yet critical errors, such as miscalculations or incorrect substitutions, that limit their full potential. Existing approaches to improving LLMs’ mathematical skills apply preference learning to step-wise solution pairs but overlook these crucial subtle errors. This study proposes a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into critical tokens in reasoning or computation steps to construct hard preference pairs for error mitigation. RISE uses the LLM itself to edit a small number of tokens in a correct solution, injecting the designed subtle errors. These self-edited solutions and their correct counterparts, together with incorrect solutions obtained through sampling, are then used for subtle error-aware DPO training. Compared with other preference learning methods, RISE refines the training objective without requiring fine-grained sampling or preference annotation. Extensive experiments demonstrate the effectiveness of RISE, achieving notable improvements on the GSM8K and MATH benchmarks.
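
To make the training recipe concrete, here is a minimal sketch of how error-injected preference pairs could feed a standard DPO loss. The prompt wording, the generate callable, and all function names are illustrative assumptions, not the authors' actual implementation.

    import torch.nn.functional as F

    # Hypothetical prompt asking the model to self-edit a correct solution so
    # that a few critical tokens carry a predefined subtle error (e.g. a
    # miscalculation or an incorrect substitution).
    EDIT_PROMPT = (
        "Rewrite the solution below, changing only a few tokens in one step "
        "so that it contains a subtle {error_type} error.\n\nSolution:\n{solution}"
    )

    def inject_subtle_error(generate, solution, error_type="miscalculation"):
        # `generate` stands in for whatever text-generation API the LLM exposes.
        return generate(EDIT_PROMPT.format(error_type=error_type, solution=solution))

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Standard DPO objective: the "chosen" response is a correct solution,
        # the "rejected" response is its error-injected self-edit (or a sampled
        # incorrect solution). All arguments are tensors of log-probabilities.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin so the model prefers correct over subtly wrong.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

In practice the log-probabilities would come from scoring each solution with the current policy and a frozen reference model, as in standard DPO training.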
Low Difficulty Summary (written by GrooveSquid.com; original content)
Large Language Models are really good at math, but they sometimes make small mistakes that are easy to miss. To help them learn from these mistakes, researchers have developed a new way to train LLMs called eRror-Injected Self-Editing (RISE). RISE asks the model to take its own correct solutions and deliberately slip small errors into a few spots, then trains the model to prefer the correct solution over the slightly wrong one. This helps the LLM learn to avoid making these same subtle mistakes in the future.

Keywords

» Artificial intelligence