Summary of Automated Rewards via LLM-Generated Progress Functions, by Vishnu Sarukkai et al.


Automated Rewards via LLM-Generated Progress Functions

by Vishnu Sarukkai, Brennan Shacklett, Zander Majercik, Kush Bhatia, Christopher Ré, Kayvon Fatahalian

First submitted to arXiv on: 11 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The proposed framework uses Large Language Models (LLMs) to automate reward engineering, generating effective reward functions with significantly fewer samples than prior state-of-the-art work. The approach leverages LLMs' broad domain knowledge and code-synthesis abilities to author progress functions that estimate task progress from a given state, reducing the problem of generating task-specific rewards to the simpler problem of coarsely estimating task progress. The second step of this two-step solution uses the progress estimates to discretize states and generate count-based intrinsic rewards over the resulting low-dimensional state space, which is essential for the performance gains.

Low Difficulty Summary (original content by GrooveSquid.com)
Large Language Models have the potential to automate reward engineering by applying their broad domain knowledge across many tasks. However, they often need many iterations of trial and error to generate effective reward functions. The proposed LLM-driven reward generation framework produces state-of-the-art policies on a challenging benchmark with 20 times fewer reward function samples than prior work. The approach reduces the problem of generating task-specific rewards to coarsely estimating task progress, then uses this notion of progress to discretize states and generate count-based intrinsic rewards.

Keywords

* Artificial intelligence