Summary of Automated Rewards via LLM-Generated Progress Functions, by Vishnu Sarukkai et al.
Automated Rewards via LLM-Generated Progress Functions
by Vishnu Sarukkai, Brennan Shacklett, Zander Majercik, Kush Bhatia, Christopher Ré, Kayvon Fatahalian
First submitted to arXiv on: 11 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed framework uses Large Language Models (LLMs) to automate reward engineering, generating effective reward functions with significantly fewer samples than prior state-of-the-art methods. The approach leverages LLMs’ broad domain knowledge and code-synthesis abilities to author progress functions that estimate task progress from a given state, reducing the problem of generating task-specific rewards to the simpler one of coarsely estimating progress. The second step converts these progress estimates into count-based intrinsic rewards over a discretized, low-dimensional state space, which is essential for the performance gains (a minimal sketch of this two-step recipe appears below the table). |
Low | GrooveSquid.com (original content) | Large Language Models have the potential to automate reward engineering by applying their broad domain knowledge across many tasks, but they often need many iterations of trial and error to produce effective reward functions. The proposed LLM-driven reward-generation framework yields state-of-the-art policies on a challenging benchmark while sampling 20 times fewer reward functions than prior work. It reduces the problem of generating task-specific rewards to coarsely estimating task progress, then uses this notion of progress to discretize states and generate count-based intrinsic rewards. |
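
To make the two-step recipe described in the summaries concrete, here is a minimal Python sketch. The toy reaching task, the body of `progress_fn`, the number of bins, and the 1/sqrt(visit count) bonus are all illustrative assumptions: in the paper’s framework the progress function is authored by an LLM from the task description, and the exact reward formulation may differ.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for an LLM-generated progress function. In the paper's
# framework an LLM writes this code from the task description; here it is
# hand-written for a toy 1-D reaching task.
def progress_fn(state):
    """Coarse estimate of task progress in [0, 1] from a low-dimensional state."""
    hand_pos, goal_pos, initial_dist = state
    dist = abs(goal_pos - hand_pos)
    return max(0.0, min(1.0, 1.0 - dist / initial_dist))

class CountBasedProgressReward:
    """Discretize states by their progress estimate and pay an intrinsic bonus
    that decays as a progress bin is revisited (a common 1/sqrt(N) count-based
    formulation, used here purely for illustration)."""

    def __init__(self, num_bins=20, bonus_scale=1.0):
        self.num_bins = num_bins
        self.bonus_scale = bonus_scale
        self.visit_counts = defaultdict(int)

    def __call__(self, state):
        progress = progress_fn(state)
        bin_id = min(int(progress * self.num_bins), self.num_bins - 1)
        self.visit_counts[bin_id] += 1
        return self.bonus_scale / math.sqrt(self.visit_counts[bin_id])

# Usage: the agent earns larger bonuses for reaching rarely visited progress bins.
reward_fn = CountBasedProgressReward()
for hand_pos in [0.0, 0.25, 0.25, 0.5, 0.9]:
    state = (hand_pos, 1.0, 1.0)  # (hand position, goal position, initial distance)
    print(f"hand={hand_pos:.2f} -> intrinsic reward {reward_fn(state):.3f}")
```

The design point this sketch reflects is that the generated code only has to describe coarse progress; the count-based bonus then rewards the policy for reaching states in rarely visited progress bins, rather than requiring a hand-tuned, task-specific reward shape.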