
Summary of Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models, by Changyu Chen et al.


Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

by Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

First submitted to arXiv on: 4 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a novel method for improving the reasoning performance of large language models by perturbing the training data: certain tokens within the chain of thought are masked during supervised fine-tuning (a minimal sketch of this idea appears after these summaries). The approach achieves a 5% improvement in GSM8K accuracy and a 10% improvement in GSM-IC accuracy over standard supervised fine-tuning. The method is complementary to existing techniques and can be combined with explicit data augmentation to improve performance across multiple datasets and base models. The paper also examines the mechanisms behind this improvement through case studies and quantitative analysis, suggesting that the masking may help capture long-distance dependencies in language processing.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper helps make big language models better at answering questions by adding a little bit of noise to the information they learn from. Instead of using more human helpers or bigger models, the researchers found that hiding some parts of the input data can actually improve results. This new technique works well alongside other ways of making language models smarter and can be used with different tasks, datasets, and base models. By studying how the method works in practice, the authors hope to learn more about how language models reason and make decisions.

Keywords

  • Artificial intelligence
  • Data augmentation
  • Fine tuning
  • Supervised