
Investigating Symbolic Capabilities of Large Language Models

by Neisarg Dave, Daniel Kifer, C. Lee Giles, Ankur Mali

First submitted to arXiv on: 21 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original GrooveSquid.com content)
The abstract discusses the capabilities of Large Language Models (LLMs) on complex symbolic tasks such as addition, multiplication, modulus arithmetic, numerical precision, and symbolic counting. It rigorously evaluates eight LLMs, four enterprise-grade and four open-source, using a framework anchored in the Chomsky hierarchy. The evaluation uses minimally explained prompts and the zero-shot Chain-of-Thought technique, letting the models navigate the solution process autonomously. The findings show that even a fine-tuned GPT-3.5 exhibits only marginal improvement, mirroring the performance trends observed in the other models. All models demonstrated limited generalization on these symbol-intensive tasks.

Low Difficulty Summary (original GrooveSquid.com content)
LLMs are super smart computer programs that can do many things, including math problems! This study looks at how well they can solve really hard math problems that use symbols like numbers and letters. The researchers tested eight different LLMs, each with its own strengths and weaknesses, to see which does best. They asked the questions in special ways so the comparison was fair for all the models. What they found is that even the best LLMs struggle with super-hard math problems and can't always figure them out on their own.

Keywords

» Artificial intelligence  » Generalization  » Precision  » Zero shot