
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models

by Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu

First submitted to arXiv on: 5 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper investigates how minor changes in numeral systems and units of measurement affect the performance of Large Language Models (LLMs). Existing evaluations of LLMs focus on mathematical reasoning but ignore the impact of different numerical representations. The authors construct perturbed datasets to examine how LLMs process numerals and units. They first dissect math word problems into sub-procedures, such as numeral conversion and unit-based measurement conversion. They then annotate math word problems from ancient Chinese arithmetic works that challenge LLMs on numerals and units of measurement. The results show that LLMs still struggle to handle numeral and measurement conversions.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper looks at how computers understand numbers and measurements, like inches or kilograms. Right now, people evaluate computer models on math problems without thinking about how different ways of writing numbers can make things easier or harder for the computers. The authors created special datasets to test these language models with small changes in numbers and measurements. They broke down math word problems into smaller parts, like converting words into numbers, and then tested their ideas on ancient Chinese math problems that are tricky for computers. The results show that computers still get stuck when dealing with different ways of writing numbers and measurements.

Keywords

» Artificial intelligence