Summary of A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning, by Shengjie Sun et al.
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
by Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li
First submitted to arXiv on: 18 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes CARD, a Large Language Model (LLM)-driven Reward Design framework that iteratively generates and improves reward function code. The framework consists of a Coder, which generates and verifies the code, and an Evaluator, which provides dynamic feedback to guide the Coder, eliminating the need for human intervention or repeated RL training. CARD also introduces Trajectory Preference Evaluation (TPE), which assesses the current reward function based on trajectory preferences. Empirical results show that CARD strikes an effective balance between task performance and token efficiency, outperforming or matching baselines across all tasks. A sketch of the Coder-Evaluator loop appears after this table. |
Low | GrooveSquid.com (original content) | CARD helps design better reward functions for Reinforcement Learning (RL) tasks using Large Language Models (LLMs). It makes it easier to get good rewards without human help. The system uses a Coder that writes the reward code and an Evaluator that gives feedback to improve it. It also includes something called Trajectory Preference Evaluation, which checks whether the reward is working well. Tests show that the method works well and gets good results on many tasks. |
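To make the loop concrete, here is a minimal Python sketch of how a Coder-Evaluator cycle with Trajectory Preference Evaluation could be wired together. This is an illustration based only on the summary above, not the paper's actual interface: the class names, the `llm()` helper, and `trajectory_preference_evaluation()` are all assumptions.

```python
# Minimal sketch of a CARD-style Coder-Evaluator loop, inferred from the
# summary above. All names below (Coder, llm, trajectory_preference_evaluation)
# are illustrative assumptions, not the paper's actual code.

from dataclasses import dataclass
from typing import Callable, List

Trajectory = List[dict]            # e.g. [{"obs": ..., "action": ...}, ...]
RewardFn = Callable[[dict], float]  # maps one trajectory step to a reward

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API client)."""
    raise NotImplementedError

@dataclass
class Coder:
    task_description: str

    def generate(self, feedback: str = "") -> str:
        """Ask the LLM for reward function code, conditioned on feedback."""
        return llm(f"Task: {self.task_description}\nFeedback: {feedback}\n"
                   "Write a Python reward function `reward(step) -> float`.")

    def verify(self, code: str) -> bool:
        """Cheap static check: does the generated code at least compile?"""
        try:
            compile(code, "<reward>", "exec")
            return True
        except SyntaxError:
            return False

def trajectory_preference_evaluation(reward_fn: RewardFn,
                                     preferred: Trajectory,
                                     dispreferred: Trajectory) -> bool:
    """TPE sketch: a sound reward function should score a preferred
    trajectory above a dispreferred one; disagreement flags a flaw."""
    score = lambda traj: sum(reward_fn(step) for step in traj)
    return score(preferred) > score(dispreferred)

def card_loop(coder: Coder, evaluate: Callable[[str], str],
              max_iters: int = 5) -> str:
    """Iterate: generate reward code, then let the Evaluator's textual
    feedback guide the next attempt, with no human in the loop."""
    feedback, code = "", ""
    for _ in range(max_iters):
        code = coder.generate(feedback)
        if not coder.verify(code):
            feedback = "The code failed verification; fix syntax errors."
            continue
        feedback = evaluate(code)  # dynamic feedback, e.g. built on TPE checks
        if feedback == "OK":
            break
    return code
```

The key point the sketch tries to capture is that the Evaluator's feedback replaces both human inspection and repeated full RL training runs: only cheap checks such as verification and trajectory-preference comparisons happen inside the loop.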
Keywords
» Artificial intelligence » Reinforcement learning » Token