Can Language Models Solve Olympiad Programming?

by Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao

First submitted to arXiv on: 16 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high-difficulty version is the paper’s original abstract, which can be read on the paper’s arXiv page.

Medium Difficulty Summary (original content by GrooveSquid.com)

This paper introduces the USACO benchmark, a collection of 307 programming problems from the USA Computing Olympiad. The authors provide high-quality resources for each problem, including unit tests, reference code, and official analyses, which together enable rigorous evaluation of language models (LMs) on competitive programming. They find that GPT-4 achieves only 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting, and that their best inference method raises this to 20.2%, still far from solving the benchmark. To better understand the remaining challenges, the authors design a novel human-in-the-loop study and find that targeted hints enable GPT-4 to solve 13 of 15 previously unsolvable problems. The paper serves as an initial step toward developing LMs with grounded, creative, and algorithmic reasoning.
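
As a rough illustration of the pass@1 metric and the unit-test-based grading mentioned above, the sketch below shows one way such an evaluation loop could be wired up in Python. The function names (run_solution, passes_all_tests, pass_at_1), the problem dictionary layout, and the generate callback are all hypothetical assumptions for this sketch; this is not the authors’ actual evaluation harness.

    # Hypothetical sketch of pass@1 grading against per-problem unit tests.
    # Nothing here is taken from the paper's codebase.
    import os
    import subprocess
    import tempfile

    def run_solution(source: str, stdin_data: str, timeout: float = 2.0) -> str:
        """Run a candidate Python solution on one test input; return its stdout."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        try:
            result = subprocess.run(
                ["python3", path],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            return result.stdout.strip()
        finally:
            os.unlink(path)

    def passes_all_tests(source: str, tests: list[tuple[str, str]]) -> bool:
        """A problem counts as solved only if every test yields the expected output."""
        try:
            return all(run_solution(source, tin) == tout.strip() for tin, tout in tests)
        except (subprocess.TimeoutExpired, OSError):
            return False

    def pass_at_1(problems: list[dict], generate) -> float:
        """pass@1: fraction of problems solved by one sampled solution per problem."""
        solved = sum(
            passes_all_tests(generate(p["statement"]), p["tests"]) for p in problems
        )
        return solved / len(problems)

Grading on hidden unit tests rather than string-matching a reference answer is what makes the benchmark objective: any program that produces the correct output within the time limit counts, regardless of the algorithm used.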
Low Difficulty Summary (original content by GrooveSquid.com)

This research introduces a new way to test how well computer programs can solve hard competitive programming problems. The authors build a large collection of these problems, called the USACO benchmark, and provide tools for evaluating language models on this kind of problem-solving. They find that one AI model, GPT-4, is not very good at solving the problems on its own, but when it is given targeted hints, it can solve many more problems correctly. This research helps us understand how to make AI better at complex algorithmic problem-solving.
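
To illustrate the human-in-the-loop idea described in both summaries, here is a minimal hypothetical sketch of a hint-retry loop: after each failed attempt, a human supplies a targeted hint that is appended to the prompt before the model tries again. The check and ask_human_for_hint callbacks are assumptions for this sketch, not the paper’s actual study protocol.

    def solve_with_hints(problem: dict, generate, check, ask_human_for_hint,
                         max_rounds: int = 3):
        """Hypothetical hint-retry loop: regenerate after each failure, each time
        appending a human-supplied targeted hint to the prompt."""
        prompt = problem["statement"]
        for _ in range(max_rounds):
            solution = generate(prompt)
            if check(solution, problem["tests"]):  # e.g. passes_all_tests above
                return solution
            hint = ask_human_for_hint(problem, solution)  # human reads the failure
            prompt = problem["statement"] + "\n\nHint: " + hint
        return None  # unsolved within the hint budget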

Keywords

  • Artificial intelligence
  • GPT
  • Inference
  • Prompting
  • Zero-shot