Summary of MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint, by Xinglin Zhou et al.
MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint
by Xinglin Zhou, Yifu Yuan, Shaofu Yang, Jianye Hao
First submitted to arXiv on: 22 Feb 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes MENTOR, a hierarchical reinforcement learning (HRL) framework that incorporates human feedback and a dynamic distance constraint to improve the stability and efficiency of learning. By dividing tasks into subgoals and completing them sequentially, HRL has shown promise on complex tasks with sparse rewards; however, current methods struggle to find suitable subgoals without additional guidance. MENTOR addresses this by using human feedback to guide high-level policy learning, and by designing a dual policy at the low level that decouples exploration from exploitation. The framework also includes a Dynamic Distance Constraint (DDC) mechanism that adjusts the space of candidate subgoals according to task difficulty. Experiments demonstrate significant improvements on complex sparse-reward tasks while requiring only a small amount of human feedback. |
| Low | GrooveSquid.com (original content) | This paper helps computers learn to do better jobs by breaking big tasks into smaller ones and completing them one by one. It’s like having a teacher help you figure out the right way to solve a puzzle. The new method uses a little bit of help from humans to make sure the computer is learning correctly, and it keeps the computer from getting stuck on parts that are too easy or too hard. This can help computers tackle harder tasks that matter for things like self-driving cars and medical research. |
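To make the summary concrete, here is a minimal, self-contained sketch of the ideas described above on a toy 1-D task: a high-level policy picks subgoals from a window whose radius `k` plays the role of the Dynamic Distance Constraint (widening as subgoals are completed), a simple low-level policy pursues each subgoal, and a `human_feedback` stub stands in for real human preferences. All names and the environment are illustrative assumptions, not the authors' implementation.

```python
GOAL = 20  # final goal position in a toy 1-D chain task

def human_feedback(subgoal, state):
    """Hypothetical stand-in for human preference feedback:
    prefers subgoals that move the agent toward the final goal."""
    return 1.0 if abs(GOAL - subgoal) < abs(GOAL - state) else -1.0

def low_level_step(state, subgoal):
    """Toy low-level policy: move one unit toward the subgoal."""
    return state + (1 if subgoal > state else -1 if subgoal < state else 0)

def run_episode(max_steps=200):
    state, k, trajectory = 0, 2, [0]  # k is the DDC radius for subgoals
    for _ in range(max_steps):
        # Dynamic Distance Constraint: candidate subgoals lie within k of state
        candidates = range(state - k, state + k + 1)
        # High-level policy scores candidates with (simulated) human feedback,
        # breaking ties in favor of subgoals nearer the final goal
        subgoal = max(candidates,
                      key=lambda g: (human_feedback(g, state), -abs(GOAL - g)))
        # Low-level policy pursues the chosen subgoal
        while state != subgoal:
            state = low_level_step(state, subgoal)
            trajectory.append(state)
        # Subgoal reached: relax the constraint (wider subgoal space)
        k = min(k + 1, 10)
        if state >= GOAL:
            break
    return state, trajectory

final_state, traj = run_episode()
print(final_state)  # reaches the goal position, 20
```

In the paper, the constraint radius is adjusted based on measured task difficulty and the low level uses a learned dual policy for exploration and exploitation; the sketch replaces both with simple heuristics to show the control flow only.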
Keywords
* Artificial intelligence
* Reinforcement learning