


Potential-Based Reward Shaping For Intrinsic Motivation

by Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts

First submitted to arXiv on: 12 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty summary is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A recent surge in intrinsic motivation (IM) reward-shaping methods has raised concerns about their effect on optimal policies: these methods can inadvertently alter the set of optimal behaviors, resulting in subpar performance. Previous work on mitigating these risks centered on potential-based reward shaping (PBRS), but PBRS could not be applied directly to many IM methods, whose rewards are complex and depend on more than the current state transition. This paper proposes an extension of PBRS that preserves optimal policies under a broader class of functions, as well as Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a usable potential-based form without altering the set of optimal policies. Experiments in MiniGrid environments demonstrate that PBIM prevents convergence to suboptimal policies and can accelerate training. (A brief, hedged code sketch of the potential-based shaping idea these methods build on appears after these summaries.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
Recently, there has been a lot of research on how to motivate machines to learn new things. One way is by giving them rewards for doing certain actions. But sometimes these rewards can actually make it harder for the machine to learn what’s best. Researchers have come up with ways to fix this problem, but they didn’t work well with all types of rewards. This paper presents a new way to keep the rewards from messing things up and even makes learning faster.
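
For readers who want to see the core idea in code, the sketch below illustrates classic potential-based reward shaping (adding gamma * phi(s') - phi(s) to the environment reward), the foundation that PBIM generalizes. This is not the paper’s algorithm: the potential function `phi`, the discount `GAMMA`, and the toy gridworld coordinates are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor


def phi(state):
    """Hypothetical potential function over grid positions.

    A real potential could be derived from an intrinsic-motivation signal
    (e.g., a novelty estimate); here it is just a toy distance-to-goal term.
    """
    goal = np.array([7.0, 7.0])  # assumed goal cell in an 8x8 gridworld
    return -float(np.linalg.norm(np.asarray(state, dtype=float) - goal))


def shaped_reward(env_reward, state, next_state, done):
    """Environment reward plus the shaping term gamma * phi(s') - phi(s).

    Potential-based shaping of this form is known to leave the set of
    optimal policies unchanged. Using a zero potential at terminal states
    avoids injecting extra return at the end of an episode.
    """
    next_potential = 0.0 if done else GAMMA * phi(next_state)
    return env_reward + next_potential - phi(state)


# Toy usage: one transition toward the goal receives a small positive bonus.
if __name__ == "__main__":
    print(shaped_reward(env_reward=0.0, state=(0, 0), next_state=(0, 1), done=False))
```

As the medium summary describes, PBIM’s contribution is to produce a shaping term of this general form from an intrinsic-motivation reward rather than from a hand-designed potential, so the IM bonus can be used without changing which policies are optimal.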

Keywords

  • Artificial intelligence