
Summary of "Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation", by Chenyang An et al.


Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation

by Chenyang An, Shima Imani, Feng Yao, Chengyu Dong, Ali Abbasi, Harsh Shrivastava, Samuel Buss, Jingbo Shang, Gayathri Mahalingam, Pramod Sharma, Maurice Diesendruck

First submitted to arXiv on: 30 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper explores the limitations of large language model (LLM)-based proof generation. Despite being trained on extensive datasets such as OpenWebMath and arXiv, LLMs achieve only modest performance on proving tasks of moderate difficulty. The authors argue that this is partly due to the suboptimal order in which proof data appears in training corpora. They introduce the notion of an intuitively sequential order, in which the intermediate supervision for a particular proof step is always positioned to the left of (i.e., before) that step, so that the data layout supports both learning and verifying proofs rather than only verification (see the sketch after these summaries). The authors validate their claim on two tasks: intuitionistic propositional logic theorem proving and digit multiplication. Their experiments show that training in this order yields an 11 percent improvement in proof success rate, underscoring how much data ordering matters for effective proof generation.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about how computers can generate proofs (step-by-step explanations) by learning from large amounts of text. Right now, these programs are not very good at it. The authors think one reason is that the order in which proof examples are written is unhelpful for figuring out how each step was found. They suggest training on proofs written in a different order, where the work that justifies a step comes before the step itself, making the reasoning easier to follow for both computers and people. To test the idea, they use two kinds of problems: logic puzzles and simple multiplication. The results show that this ordering helps computers generate noticeably better proofs.
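To make the ordering idea concrete, here is a minimal, hypothetical Python sketch. The ProofStep class, the function names, and the multiplication example are illustrative assumptions, not the authors' actual data format or pipeline; the sketch only contrasts a layout where a step precedes its justification with the intuitively sequential layout, where the intermediate supervision is emitted before the step it supports, so a left-to-right next-token predictor conditions each conclusion on its supporting work.

# Hypothetical sketch of the "intuitively sequential order" idea described above.
# Data format and names are assumptions for illustration only.

from dataclasses import dataclass, field
from typing import List


@dataclass
class ProofStep:
    conclusion: str                                        # the step itself
    supervision: List[str] = field(default_factory=list)   # intermediate work it relies on


def verification_order(steps: List[ProofStep]) -> str:
    """Conventional layout: state each step first, justify it afterwards.
    Easy to verify once written, but the model must emit the conclusion
    before generating the supporting work."""
    chunks = []
    for s in steps:
        chunks.append(s.conclusion)
        chunks.extend(s.supervision)
    return "\n".join(chunks)


def intuitively_sequential_order(steps: List[ProofStep]) -> str:
    """Proposed layout: all intermediate supervision for a step precedes it,
    so every conclusion is conditioned on the reasoning that justifies it."""
    chunks = []
    for s in steps:
        chunks.extend(s.supervision)   # supporting work first ...
        chunks.append(s.conclusion)    # ... then the step it supports
    return "\n".join(chunks)


if __name__ == "__main__":
    # Toy digit-multiplication example: partial products serve as the
    # intermediate supervision for the final product.
    proof = [
        ProofStep(
            conclusion="4730 * 46 = 217580",
            supervision=[
                "4730 * 6 = 28380",
                "4730 * 40 = 189200",
                "28380 + 189200 = 217580",
            ],
        )
    ]
    print("--- verification order ---")
    print(verification_order(proof))
    print("--- intuitively sequential order ---")
    print(intuitively_sequential_order(proof))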

Keywords

  • Artificial intelligence
  • Large language model