
Summary of MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula, by Shubhra Mishra et al.


MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula

by Shubhra Mishra, Gabriel Poesia, Belinda Mo, Noah D. Goodman

First submitted to arXiv on: 1 Jul 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Mathematical problem-solving is an essential capability for Large Language Models (LLMs), serving as a proxy for various reasoning abilities. Existing benchmarks assess a wide range of skills, but aggregate accuracy metrics obscure specific strengths and weaknesses. Moreover, they are challenging to extend with new problems, risking data contamination over time. To address these challenges, the authors propose MathCAMPS: a method to generate high-quality mathematical problems at scale, grounded on 44 fine-grained standards from the Mathematics Common Core (CC) Standards for K-8 grades. The team encodes each standard in a formal grammar, allowing them to sample diverse symbolic problems and their answers. They then use LLMs to realize the symbolic problems as word problems. A cycle-consistency method is proposed for validating problem faithfulness. Additionally, follow-up questions are derived from the symbolic structures and converted into follow-up word problems, a novel task of mathematical dialogue that probes for robustness in understanding. Experiments on 23 LLMs reveal surprising failures even in the strongest models when asked simple follow-up questions. Furthermore, training checkpoints of Pythia 12B are evaluated on MathCAMPS, enabling an analysis of when particular mathematical skills develop during training.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about testing how well Large Language Models (LLMs) can solve math problems. Right now, we test LLMs with lots of different math questions, but this makes it hard to see what they're really good or bad at. Also, it's difficult to add new math problems without risking that the models have already seen them. The authors came up with a way to create many high-quality math problems automatically. They started from a set of guidelines for math education (the Common Core standards) and turned each one into symbolic math problems with known answers. Then, they used LLMs to change these symbolic problems into word problems. This helps us see whether LLMs really understand the math or not. Surprisingly, even the best LLMs didn't do well when asked simple follow-up questions. The authors also looked at how the Pythia 12B model learned math skills over time during training.
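
The pipeline described in the summaries above, sampling symbolic problems from a grammar tied to a Common Core standard, realizing them as word problems with an LLM, and filtering with a cycle-consistency check, can be illustrated with a small sketch. The Python below is a minimal toy version and not the authors' implementation: the grammar is reduced to a single made-up standard (addition and subtraction within 100), and realize_word_problem and parse_back are placeholder stand-ins for the LLM realization and back-translation steps.

import random
import re
from dataclasses import dataclass

@dataclass
class SymbolicProblem:
    a: int
    b: int
    op: str  # "+" or "-"

    def answer(self) -> int:
        return self.a + self.b if self.op == "+" else self.a - self.b

def sample_problem(rng: random.Random) -> SymbolicProblem:
    # Toy grammar for one hypothetical standard:
    # addition or subtraction within 100, with non-negative answers.
    op = rng.choice(["+", "-"])
    a = rng.randint(10, 99)
    b = rng.randint(10, a) if op == "-" else rng.randint(10, 99)
    return SymbolicProblem(a, b, op)

def realize_word_problem(p: SymbolicProblem) -> str:
    # Placeholder for the LLM step that turns a symbolic problem
    # into a natural-language word problem.
    if p.op == "+":
        return (f"Maya has {p.a} stickers and buys {p.b} more. "
                "How many stickers does she have now?")
    return (f"Maya has {p.a} stickers and gives away {p.b}. "
            "How many stickers does she have left?")

def parse_back(word_problem: str) -> SymbolicProblem:
    # Placeholder for the cycle-consistency step: in the paper an LLM
    # translates the word problem back into symbolic form; here a regex
    # recovers the quantities so the sketch runs without a model.
    nums = [int(n) for n in re.findall(r"\d+", word_problem)]
    op = "+" if "buys" in word_problem else "-"
    return SymbolicProblem(nums[0], nums[1], op)

if __name__ == "__main__":
    rng = random.Random(0)
    prob = sample_problem(rng)
    text = realize_word_problem(prob)
    recovered = parse_back(text)
    # Keep the problem only if the recovered symbolic form agrees
    # with the original (cycle consistency).
    if recovered.answer() == prob.answer():
        print(text)
        print("Expected answer:", prob.answer())

In the actual benchmark, the back-translation step queries an LLM rather than a regex, and problems whose recovered symbolic form does not match the original are discarded; follow-up questions are then derived from the same symbolic structure.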

Keywords

» Artificial intelligence