Summary of A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning, by Shengjie Sun et al.
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
by Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li
First submitted to arXiv on: 18 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes CARD, a Large Language Model (LLM)-driven Reward Design framework that iteratively generates and improves reward function code. The framework consists of a Coder, which generates and verifies the code, and an Evaluator, which provides dynamic feedback to guide the Coder, eliminating the need for human intervention or repeated RL training. CARD also introduces Trajectory Preference Evaluation (TPE), which assesses the current reward function based on trajectory preferences. Empirical results show that CARD strikes an effective balance between task performance and token efficiency, outperforming or matching baselines across all tasks. A sketch of the Coder-Evaluator loop appears after this table. |
Low | GrooveSquid.com (original content) | CARD helps design better reward functions for Reinforcement Learning (RL) tasks using Large Language Models (LLMs). It makes it easier to get good rewards without human help. The system uses a Coder that writes the reward code and an Evaluator that gives feedback to improve it. It also includes something called Trajectory Preference Evaluation, which checks whether the reward is working well. Tests show that the method works well and gets good results on many tasks. |
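To make the loop concrete, here is a minimal Python sketch of how a Coder-Evaluator cycle with Trajectory Preference Evaluation could be wired together. This is an illustration based only on the summary above, not the paper's actual interface: the class names, the `llm()` helper, and `trajectory_preference_evaluation()` are all assumptions.

```python
# Minimal sketch of a CARD-style Coder-Evaluator loop, inferred from the
# summary above. All names below (Coder, llm, trajectory_preference_evaluation)
# are illustrative assumptions, not the paper's actual code.

from dataclasses import dataclass
from typing import Callable, List

Trajectory = List[dict]            # e.g. [{"obs": ..., "action": ...}, ...]
RewardFn = Callable[[dict], float]  # maps one trajectory step to a reward

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API client)."""
    raise NotImplementedError

@dataclass
class Coder:
    task_description: str

    def generate(self, feedback: str = "") -> str:
        """Ask the LLM for reward function code, conditioned on feedback."""
        return llm(f"Task: {self.task_description}\nFeedback: {feedback}\n"
                   "Write a Python reward function `reward(step) -> float`.")

    def verify(self, code: str) -> bool:
        """Cheap static check: does the generated code at least compile?"""
        try:
            compile(code, "<reward>", "exec")
            return True
        except SyntaxError:
            return False

def trajectory_preference_evaluation(reward_fn: RewardFn,
                                     preferred: Trajectory,
                                     dispreferred: Trajectory) -> bool:
    """TPE sketch: a sound reward function should score a preferred
    trajectory above a dispreferred one; disagreement flags a flaw."""
    score = lambda traj: sum(reward_fn(step) for step in traj)
    return score(preferred) > score(dispreferred)

def card_loop(coder: Coder, evaluate: Callable[[str], str],
              max_iters: int = 5) -> str:
    """Iterate: generate reward code, then let the Evaluator's textual
    feedback guide the next attempt, with no human in the loop."""
    feedback, code = "", ""
    for _ in range(max_iters):
        code = coder.generate(feedback)
        if not coder.verify(code):
            feedback = "The code failed verification; fix syntax errors."
            continue
        feedback = evaluate(code)  # dynamic feedback, e.g. built on TPE checks
        if feedback == "OK":
            break
    return code
```

The key point the sketch tries to capture is that the Evaluator's feedback replaces both human inspection and repeated full RL training runs: only cheap checks such as verification and trajectory-preference comparisons happen inside the loop.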
Keywords
» Artificial intelligence » Reinforcement learning » Token