
Summary of From Code to Play: Benchmarking Program Search For Games Using Large Language Models, by Manuel Eberhardinger et al.


From Code to Play: Benchmarking Program Search for Games Using Large Language Models

by Manuel Eberhardinger, James Goodman, Alexander Dockhorn, Diego Perez-Liebana, Raluca D. Gaina, Duygu Çakmak, Setareh Maghsudi, Simon Lucas

First submitted to arXiv on: 5 Dec 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
The paper explores whether large language models (LLMs) can directly synthesize usable code for game applications. The researchers used an evolutionary hill-climbing algorithm in which the LLMs propose the initial programs and their mutations, generating code in Python and Java. They evaluated 12 models for Python and 8 for Java across 29 tasks, including miniature versions of Atari games, levels of Baba is You, and a maze generation task. The findings suggest that LLM performance depends more on the task than on model size. Larger models generated more executable programs, but those programs were not always higher-quality solutions.
Low GrooveSquid.com (original content) Low Difficulty Summary
Large language models can generate code for different types of games. Researchers used these models to write computer programs that could play mini-games and solve puzzles. They tested many different models to see which ones worked best. The results showed that a model's size matters less than the type of game it is trying to solve.
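
To make the idea of LLM-controlled evolutionary hill climbing from the medium-difficulty summary more concrete, here is a minimal sketch in Python. It is not the paper's framework. The names `hill_climb`, `toy_score`, and `make_stub_mutator` are hypothetical, the "LLM" is replaced by a stub that returns a fixed list of proposals, and the task (match the function 3 * x) is invented for illustration. The sketch only shows the general loop: ask for a changed program, discard it if it does not run, and keep it if it scores at least as well as the current best.

```python
from typing import Callable, Tuple


def hill_climb(seed: str,
               mutate: Callable[[str], str],
               score: Callable[[str], float],
               iterations: int = 20) -> Tuple[str, float]:
    """Keep the best program so far; accept a mutation if it scores no worse."""
    best, best_score = seed, score(seed)
    for _ in range(iterations):
        candidate = mutate(best)  # in a real setup, an LLM call (assumption)
        try:
            candidate_score = score(candidate)
        except Exception:
            continue  # candidate is not executable: discard it
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(program: str) -> float:
    """Hypothetical task: the program must define act(x) close to 3 * x."""
    namespace: dict = {}
    exec(program, namespace)  # raises if the source is not valid Python
    act = namespace["act"]
    return -sum(abs(act(x) - 3 * x) for x in range(5))


def make_stub_mutator() -> Callable[[str], str]:
    """Stand-in for an LLM: replays fixed proposals, then repeats its input."""
    proposals = iter([
        "def act(x): return x * 2",
        "def act(x) return x",        # deliberately broken syntax
        "def act(x): return x * 3",
        "def act(x): return x * 4",
    ])

    def mutate(program: str) -> str:
        return next(proposals, program)

    return mutate


seed = "def act(x): return x"
best, best_score = hill_climb(seed, make_stub_mutator(), toy_score, iterations=6)
print(best, best_score)
```

In this toy run the broken proposal is skipped and the proposal that overshoots (x * 4) is rejected, so the search settles on the program that multiplies by 3. The distinction between programs that merely execute and programs that score well mirrors the summary's point that producing more executable programs does not by itself mean better solutions.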

Keywords

» Artificial intelligence