Summary of NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models, by Ancheng Xu et al.
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
by Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu
First submitted to arXiv on: 5 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates how minor changes in numerical systems and units of measurement affect the performance of Large Language Models (LLMs). Existing evaluations of LLMs focus on mathematical reasoning but ignore the impact of different numerical representations. To examine how LLMs process numerals and units, the authors construct datasets with targeted perturbations (an illustrative sketch of such a perturbation follows the table). They first dissect math word problems into sub-procedures such as numeral conversion and unit-based measurement conversion, and then annotate math word problems from ancient Chinese arithmetic works that challenge LLMs on numerals and units of measurement. The results show that LLMs still struggle with numeral and measurement conversions. |
| Low | GrooveSquid.com (original content) | This paper looks at how computers understand numbers and measurements, like inches or kilograms. Right now, people evaluate computer models on math problems without thinking about how different ways of writing numbers can make things easier or harder for the computers. The authors created special datasets to test these language models with small changes to numbers and measurements. They broke math word problems down into smaller parts, like converting words into numbers, and then tested their ideas on ancient Chinese math problems that are tricky for computers. The results show that computers still get stuck when dealing with different ways of writing numbers and measurements. |
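To make the idea of a numeral perturbation concrete, here is a minimal sketch in Python. It is not code from the paper; the helper names `to_words` and `perturb_numerals` and the sample problem are illustrative assumptions. It simply rewrites the Arabic numerals in a math word problem as English number words, the kind of surface change the paper uses to test whether an LLM's answer survives a different numeral representation.

```python
import re

# Word forms for 0-19 and the tens; enough to spell any integer below 100.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def to_words(n: int) -> str:
    """Spell out an integer in [0, 99] as English words (hypothetical helper)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (f"-{ONES[ones]}" if ones else "")

def perturb_numerals(problem: str) -> str:
    """Replace each one- or two-digit Arabic numeral in the problem with its word form."""
    return re.sub(r"\b\d{1,2}\b", lambda m: to_words(int(m.group())), problem)

# Hypothetical example problem: the arithmetic is unchanged, only the numeral system differs.
original = "A rope is 24 feet long and is cut into 3 equal pieces. How long is each piece?"
print(perturb_numerals(original))
# A rope is twenty-four feet long and is cut into three equal pieces. How long is each piece?
```

A measurement perturbation would work the same way at the unit level, for example restating "24 feet" as "8 yards", so the correct answer stays the same while the surface form the model must reason over does not.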