Summary of Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning Of Language Models, by Changyu Chen et al.
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
by Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li
First submitted to arXiv on: 4 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes a method to improve the reasoning performance of large language models by perturbing the training data: masking a fraction of the tokens within the chain of thought. The approach achieves a 5% improvement in GSM8K accuracy and a 10% improvement in GSM-IC accuracy over standard supervised fine-tuning. It is complementary to existing techniques and can be combined with explicit data augmentation methods to improve performance across multiple datasets and base models. The paper also examines the mechanism behind this improvement through case studies and quantitative analysis, suggesting that masking may help models capture long-distance dependencies in reasoning. |
| Low | GrooveSquid.com (original content) | This paper helps make big language models better at answering questions by adding a little noise to the examples they learn from. Instead of using more human helpers or bigger models, the researchers found that hiding some parts of the reasoning steps during training can actually improve results. The new technique works well alongside other ways of making language models smarter and can be used across different tasks and datasets. By studying how the method works in practice, the scientists hope to learn more about how language models reason and make decisions. |
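The summaries describe the core idea: randomly masking some tokens in the chain-of-thought portion of the training data before fine-tuning. A minimal sketch of that kind of token-level masking, assuming a toy whitespace tokenizer and an illustrative mask rate (the function name, `[MASK]` placeholder, and 40% rate are assumptions for illustration; the paper's exact masking scheme and hyperparameters may differ):

```python
import random

def mask_reasoning_tokens(tokens, mask_token="[MASK]", mask_prob=0.4, seed=None):
    """Randomly replace a fraction of chain-of-thought tokens with a mask token.

    This is an illustrative sketch of partial-step masking, not the paper's
    exact procedure: each token is masked independently with probability
    `mask_prob` before the sequence is used as a fine-tuning target.
    """
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else tok for tok in tokens]

# Toy chain-of-thought for a GSM8K-style problem, split on whitespace.
cot = "48 / 2 = 24 , 24 + 48 = 72".split()
masked = mask_reasoning_tokens(cot, mask_prob=0.4, seed=0)
print(" ".join(masked))
```

The masked sequence keeps its length and original token positions, so the model still sees the overall structure of the solution while being forced to predict past the hidden steps.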
Keywords
* Artificial intelligence * Data augmentation * Fine tuning * Supervised