Summary of Potential-Based Reward Shaping For Intrinsic Motivation, by Grant C. Forbes et al.
Potential-Based Reward Shaping For Intrinsic Motivation
by Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts
First submitted to arXiv on: 12 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract; read it on the paper's arXiv page. |
| Medium | GrooveSquid.com (original content) | A recent surge in intrinsic motivation (IM) reward-shaping methods has raised concerns about their effect on optimal policies: these added rewards can inadvertently change the set of optimal behaviors, leading to suboptimal performance. Prior work on avoiding this risk, potential-based reward shaping (PBRS), does not directly apply to many IM methods, because IM rewards are often complex, trainable functions that depend on a wider set of variables than the current state. This paper extends PBRS so that the set of optimal policies is preserved under a more general class of functions, and introduces Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a usable potential-based form without altering the set of optimal policies (the classic shaping form is sketched below this table). Experiments in MiniGrid environments demonstrate that PBIM prevents convergence to a suboptimal policy and can speed up training. |
| Low | GrooveSquid.com (original content) | Recently, there has been a lot of research on how to motivate machines to learn new things. One way is to give them rewards for doing certain actions. But sometimes these extra rewards can actually make it harder for the machine to learn what is best. Researchers have come up with ways to fix this problem, but those fixes did not work with all types of rewards. This paper presents a new way to keep the extra rewards from steering the machine wrong, and it can even make learning faster. |
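For readers who want the mechanics behind the Medium summary, here is a minimal Python sketch of the classic potential-based shaping term F(s, s') = γΦ(s') − Φ(s) (Ng et al., 1999), which the paper generalizes and which PBIM builds on. The grid-world potential and goal coordinates below are hypothetical illustrations, not the paper's method.

```python
# Minimal sketch of classic potential-based reward shaping (PBRS).
# Assumption: states are simple hashable objects and Phi is any real-valued
# function of state; the Manhattan-distance potential below is a hypothetical
# example, not the potential (or the PBIM conversion) used in the paper.

def shaped_reward(reward, state, next_state, potential, gamma=0.99, done=False):
    """Environment reward plus the shaping term F(s, s') = gamma*Phi(s') - Phi(s).

    Because F telescopes along any trajectory, adding it shifts every policy's
    expected return by the same start-state-dependent constant, so the set of
    optimal policies is unchanged. The terminal potential is taken to be 0,
    the usual convention.
    """
    next_potential = 0.0 if done else potential(next_state)
    return reward + gamma * next_potential - potential(state)


GOAL = (4, 4)  # hypothetical goal cell in a small grid world


def manhattan_potential(state):
    """Higher (less negative) potential the closer the agent is to GOAL."""
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))


if __name__ == "__main__":
    # Moving from (0, 0) to (0, 1) brings the agent closer to the goal,
    # so the shaped reward includes a small positive bonus.
    r = shaped_reward(reward=0.0, state=(0, 0), next_state=(0, 1),
                      potential=manhattan_potential)
    print(r)
```

The paper's contribution, per the abstract, is to extend this policy-invariance guarantee to a more general class of functions and to convert existing IM bonuses into a potential-based form (PBIM), rather than to hand-design a state potential as in this toy example.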