Summary of NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models, by Ancheng Xu et al.
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
by Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu
First submitted to arXiv on: 5 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates how minor changes in numerical systems and units of measurement affect the performance of Large Language Models (LLMs). Existing evaluations of LLMs focus on mathematical reasoning but ignore the impact of different numerical representations. To examine how LLMs process numerals and units, the authors construct datasets with targeted perturbations (an illustrative sketch of such a perturbation follows the table). They first dissect math word problems into sub-procedures such as numeral conversion and unit-based measurement conversion, and then annotate math word problems from ancient Chinese arithmetic works that challenge LLMs on numerals and units of measurement. The results show that LLMs still struggle with numeral and measurement conversions. |
| Low | GrooveSquid.com (original content) | This paper looks at how computers understand numbers and measurements, like inches or kilograms. Right now, people evaluate computer models on math problems without thinking about how different ways of writing numbers can make things easier or harder for the computers. The authors created special datasets to test these language models with small changes to numbers and measurements. They broke math word problems down into smaller parts, like converting words into numbers, and then tested their ideas on ancient Chinese math problems that are tricky for computers. The results show that computers still get stuck when dealing with different ways of writing numbers and measurements. |
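To make the idea of a numeral perturbation concrete, here is a minimal sketch in Python. It is not code from the paper; the helper names `to_words` and `perturb_numerals` and the sample problem are illustrative assumptions. It simply rewrites the Arabic numerals in a math word problem as English number words, the kind of surface change the paper uses to test whether an LLM's answer survives a different numeral representation.

```python
import re

# Word forms for 0-19 and the tens; enough to spell any integer below 100.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def to_words(n: int) -> str:
    """Spell out an integer in [0, 99] as English words (hypothetical helper)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (f"-{ONES[ones]}" if ones else "")

def perturb_numerals(problem: str) -> str:
    """Replace each one- or two-digit Arabic numeral in the problem with its word form."""
    return re.sub(r"\b\d{1,2}\b", lambda m: to_words(int(m.group())), problem)

# Hypothetical example problem: the arithmetic is unchanged, only the numeral system differs.
original = "A rope is 24 feet long and is cut into 3 equal pieces. How long is each piece?"
print(perturb_numerals(original))
# A rope is twenty-four feet long and is cut into three equal pieces. How long is each piece?
```

A measurement perturbation would work the same way at the unit level, for example restating "24 feet" as "8 yards", so the correct answer stays the same while the surface form the model must reason over does not.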