
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

by Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

First submitted to arXiv on: 18 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The paper proposes CARD, a Large Language Model (LLM)-driven reward design framework that iteratively generates and improves reward function code. The framework consists of a Coder, which generates and verifies the code, and an Evaluator, which provides dynamic feedback to guide the Coder, eliminating the need for human intervention or repeated RL training. CARD also introduces Trajectory Preference Evaluation (TPE), which assesses the current reward function based on trajectory preferences. Empirical results show that CARD strikes an effective balance between task performance and token efficiency, outperforming or matching baselines across all tasks. (An illustrative code sketch of this loop follows the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com; original content)
CARD helps design better reward functions for Reinforcement Learning (RL) tasks using Large Language Models (LLMs). This makes it easier to get good rewards without needing human help. The system uses a Coder that generates code and an Evaluator that gives feedback to make the code better. It also has something called Trajectory Preference Evaluation that checks if the reward is working well or not. Tests show that this method works well and gets good results on many tasks.
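
To make the described loop concrete, here is a minimal Python sketch of the kind of Coder/Evaluator cycle the medium summary outlines. This is not the authors' implementation: function names such as query_llm and rollout_trajectories, the dummy environment, and the preference criterion used in the TPE step are assumptions made purely for illustration.

```python
# Minimal sketch (assumptions, not the paper's code) of an iterative
# Coder/Evaluator reward-design loop: a Coder LLM writes reward-function
# code, an Evaluator scores it via trajectory preferences (TPE-style) and
# feeds a critique back, with no human in the loop.

import random
from typing import Callable, List, Tuple

Trajectory = List[Tuple[list, int, dict]]  # (state, action, info) triples


def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns reward-function source code."""
    return (
        "def reward_fn(state, action, info):\n"
        "    return -abs(info.get('distance_to_goal', 0.0))\n"
    )


def coder(task_description: str, feedback: str) -> Callable:
    """Coder: generate reward-function code and verify that it runs."""
    source = query_llm(f"Task: {task_description}\nFeedback: {feedback}")
    namespace: dict = {}
    exec(source, namespace)  # verification step: the generated code must execute
    reward_fn = namespace["reward_fn"]
    reward_fn([0.0], 0, {"distance_to_goal": 1.0})  # smoke test on a dummy transition
    return reward_fn


def rollout_trajectories(reward_fn: Callable, n: int = 8) -> List[Trajectory]:
    """Placeholder rollouts; in practice these would come from the RL environment."""
    return [[([random.random()], 0, {"distance_to_goal": random.random()})
             for _ in range(10)] for _ in range(n)]


def trajectory_preference_evaluation(reward_fn: Callable,
                                     trajectories: List[Trajectory]) -> float:
    """TPE-style check: how often the learned reward ranks trajectory pairs the
    same way a task-level preference (here: final distance to goal) would."""
    agree, total = 0, 0
    for i in range(len(trajectories)):
        for j in range(i + 1, len(trajectories)):
            r_i = sum(reward_fn(s, a, info) for s, a, info in trajectories[i])
            r_j = sum(reward_fn(s, a, info) for s, a, info in trajectories[j])
            prefer_i = (trajectories[i][-1][2]["distance_to_goal"]
                        < trajectories[j][-1][2]["distance_to_goal"])
            agree += int((r_i > r_j) == prefer_i)
            total += 1
    return agree / max(total, 1)


def card_loop(task_description: str, iterations: int = 3) -> Callable:
    """Evaluator feeds a dynamic critique back to the Coder each iteration."""
    feedback, best_fn, best_score = "none yet", None, -1.0
    for _ in range(iterations):
        reward_fn = coder(task_description, feedback)
        score = trajectory_preference_evaluation(reward_fn,
                                                 rollout_trajectories(reward_fn))
        if score > best_score:
            best_fn, best_score = reward_fn, score
        feedback = f"preference agreement = {score:.2f}; improve the reward shaping"
    return best_fn


if __name__ == "__main__":
    fn = card_loop("reach the goal position")
    print("selected reward on a sample transition:",
          fn([0.0], 0, {"distance_to_goal": 0.5}))
```

The key design point the sketch tries to capture is that feedback comes from evaluating trajectories under the candidate reward rather than from a human or from full RL training runs, which is what makes the iteration cheap in both effort and tokens.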

Keywords

» Artificial intelligence  » Reinforcement learning  » Token