Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
by Xuan He, Da Yin, Nanyun Peng
First submitted to arXiv on: 27 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract, available on its arXiv page. |
| Medium | GrooveSquid.com (original content) | This paper explores how “weak teacher models” (human annotators or existing AI systems) can provide effective supervision for large language models (LLMs) on complex reasoning tasks that require expertise or practice. The authors propose two strategies: lower-quality supervision on complete tasks that match the difficulty of the target task, and higher-quality supervision on easier subtasks. Surprisingly, they find that training on hard-task supervision, even with error rates as high as 90%, can outperform training on perfectly correct supervision for easier subtasks across multiple math benchmarks. They further identify step-wise error rates as a critical factor influencing training performance, one that can account for a 30% accuracy gap on the MATH benchmark. Finally, the results suggest that supplementing hard-task supervision with the corresponding subtask supervision yields notable additional improvements (a brief code sketch of this mixing idea follows the table). |
| Low | GrooveSquid.com (original content) | This paper looks at how people or AI systems can help train language models to do better on tricky math problems, even when those “weak teachers” are not perfect themselves. The researchers try two approaches: lower-quality help on whole problems that are just as hard as the target problem, and higher-quality help on easier pieces of the problem. They find that training on the hard problems can work better than training on perfectly correct answers to the easier pieces, even when up to 90% of the hard-problem answers contain mistakes. They also discover that how serious the mistakes are within the solution steps matters a lot, and can cause a 30% difference in accuracy. Finally, combining help on the hard problems with help on the easier pieces leads to even bigger improvements. |
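For readers who want a concrete picture of the supplementing strategy described above, here is a minimal, hypothetical Python sketch. It assumes supervision examples are stored as JSON Lines files with `problem` and `solution` fields; the file names and field names are illustrative assumptions, not the authors' actual setup.

```python
# Hypothetical sketch of the paper's data-mixing idea: pool noisy
# hard-task supervision with cleaner subtask supervision into one
# supervised fine-tuning (SFT) dataset. File and field names are
# illustrative assumptions, not the authors' actual code.
import json
import random

def load_examples(path):
    """Load (problem, solution) records from a JSON Lines file."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# Noisy supervision on full hard tasks (error rates may be high).
hard_task_data = load_examples("hard_task_supervision.jsonl")
# Higher-quality supervision on easier, decomposed subtasks.
subtask_data = load_examples("subtask_supervision.jsonl")

# Supplement hard-task supervision with subtask supervision,
# the combination the summary reports as yielding further gains.
mixed_data = hard_task_data + subtask_data
random.shuffle(mixed_data)

# Format each example as a prompt/completion pair for SFT.
sft_examples = [
    {"prompt": ex["problem"], "completion": ex["solution"]}
    for ex in mixed_data
]
```

The point of the sketch is simply that noisy hard-task examples and cleaner subtask examples are pooled into a single fine-tuning set, rather than training on either source alone.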
Keywords
» Artificial intelligence » Large language model