Can Language Models Solve Olympiad Programming?

by Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao

First submitted to arXiv on: 16 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high-difficulty version is the paper’s original abstract, which can be read on the paper’s arXiv page.

Medium Difficulty Summary (original content by GrooveSquid.com)

This paper introduces the USACO benchmark, a collection of 307 programming problems from the USA Computing Olympiad. The authors provide high-quality resources for each problem, including unit tests, reference code, and official analyses, which together enable rigorous evaluation of language models (LMs) on competitive programming. They find that GPT-4 achieves only 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting, and that their best inference method raises this to 20.2%, still far from solving the benchmark. To better understand the remaining challenges, the authors design a novel human-in-the-loop study and find that targeted hints enable GPT-4 to solve 13 of 15 previously unsolvable problems. The paper serves as an initial step toward developing LMs with grounded, creative, and algorithmic reasoning.
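
As a rough illustration of the pass@1 metric and the unit-test-based grading mentioned above, the sketch below shows one way such an evaluation loop could be wired up in Python. The function names (run_solution, passes_all_tests, pass_at_1), the problem dictionary layout, and the generate callback are all hypothetical assumptions for this sketch; this is not the authors’ actual evaluation harness.

    # Hypothetical sketch of pass@1 grading against per-problem unit tests.
    # Nothing here is taken from the paper's codebase.
    import os
    import subprocess
    import tempfile

    def run_solution(source: str, stdin_data: str, timeout: float = 2.0) -> str:
        """Run a candidate Python solution on one test input; return its stdout."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        try:
            result = subprocess.run(
                ["python3", path],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            return result.stdout.strip()
        finally:
            os.unlink(path)

    def passes_all_tests(source: str, tests: list[tuple[str, str]]) -> bool:
        """A problem counts as solved only if every test yields the expected output."""
        try:
            return all(run_solution(source, tin) == tout.strip() for tin, tout in tests)
        except (subprocess.TimeoutExpired, OSError):
            return False

    def pass_at_1(problems: list[dict], generate) -> float:
        """pass@1: fraction of problems solved by one sampled solution per problem."""
        solved = sum(
            passes_all_tests(generate(p["statement"]), p["tests"]) for p in problems
        )
        return solved / len(problems)

Grading on hidden unit tests rather than string-matching a reference answer is what makes the benchmark objective: any program that produces the correct output within the time limit counts, regardless of the algorithm used.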
Low Difficulty Summary (original content by GrooveSquid.com)

This research introduces a new way to test how well computer programs can solve hard competitive programming problems. The authors build a large collection of these problems, called the USACO benchmark, and provide tools for evaluating language models on this kind of problem-solving. They find that one AI model, GPT-4, is not very good at solving the problems on its own, but when it is given targeted hints, it can solve many more problems correctly. This research helps us understand how to make AI better at complex algorithmic problem-solving.
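
To illustrate the human-in-the-loop idea described in both summaries, here is a minimal hypothetical sketch of a hint-retry loop: after each failed attempt, a human supplies a targeted hint that is appended to the prompt before the model tries again. The check and ask_human_for_hint callbacks are assumptions for this sketch, not the paper’s actual study protocol.

    def solve_with_hints(problem: dict, generate, check, ask_human_for_hint,
                         max_rounds: int = 3):
        """Hypothetical hint-retry loop: regenerate after each failure, each time
        appending a human-supplied targeted hint to the prompt."""
        prompt = problem["statement"]
        for _ in range(max_rounds):
            solution = generate(prompt)
            if check(solution, problem["tests"]):  # e.g. passes_all_tests above
                return solution
            hint = ask_human_for_hint(problem, solution)  # human reads the failure
            prompt = problem["statement"] + "\n\nHint: " + hint
        return None  # unsolved within the hint budget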

Keywords

  • Artificial intelligence
  • GPT
  • Inference
  • Prompting
  • Zero-shot