Summary of Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming, by Xing Lei et al.
Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming
by Xing Lei, Xuetao Zhang, Zifeng Zhuang, Donglin Wang
First submitted to arXiv on: 9 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | A novel class of algorithms, Goal-Conditioned Weighted Supervised Learning (GCWSL), has emerged to tackle sparse rewards in goal-conditioned reinforcement learning (RL). GCWSL delivers strong performance across diverse tasks thanks to its simplicity, effectiveness, and stability. However, GCWSL lacks the trajectory-stitching capability that is essential for learning optimal policies when faced with unseen skills during testing. Traditional TD-based RL methods, such as Q-learning, use Dynamic Programming but often suffer from instability in value-function approximation. This paper proposes Q-WSL, a novel framework that incorporates the strengths of Dynamic Programming to overcome GCWSL's limitations. Q-WSL leverages Dynamic Programming results to identify optimal actions across different trajectories within the replay buffer, synergizing the strengths of both Q-learning and GCWSL (see the illustrative sketch after this table). Empirical evaluations demonstrate that Q-WSL surpasses other goal-conditioned approaches in both performance and sample efficiency. |
| Low | GrooveSquid.com (original content) | Goal-reaching tasks can be challenging when rewards are sparse. A new approach called Goal-Conditioned Weighted Supervised Learning (GCWSL) has been developed to tackle this issue. GCWSL works well but has a limitation: it struggles with unseen skills during testing. Other approaches, like Q-learning, use Dynamic Programming but can be unstable. This paper proposes a new approach called Q-WSL that combines the strengths of both GCWSL and Q-learning. Q-WSL uses Dynamic Programming to choose the best action in different situations. Tests show that Q-WSL works better than other approaches on goal-reaching tasks. |
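The summaries above describe Q-WSL only at a high level, so the following is a minimal, hypothetical sketch of the general recipe they outline: a goal-conditioned Q-function trained by dynamic programming, whose values then weight a supervised (behavior-cloning) policy update on relabeled replay data. The ring-world environment, the random-goal relabeling, and the exponential advantage weighting are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny ring-world stand-in environment (an assumption for illustration).
N_STATES, N_GOALS, N_ACTIONS = 10, 10, 4
MOVES = np.array([-1, 1, 0, 2])      # effect of each action on the state
GAMMA = 0.98                         # discount factor
ALPHA = 0.5                          # TD learning rate
BETA = 1.0                           # temperature of the advantage weighting

# Tabular goal-conditioned Q-function Q[s, g, a] and policy logits.
Q = np.zeros((N_STATES, N_GOALS, N_ACTIONS))
policy_logits = np.zeros((N_STATES, N_GOALS, N_ACTIONS))

def sparse_reward(s_next, g):
    """Sparse goal-reaching reward: 1 only when the goal is achieved."""
    return float(s_next == g)

def td_update(s, a, s_next, g):
    """Dynamic-programming (Q-learning) backup on the goal-conditioned value."""
    target = sparse_reward(s_next, g) + GAMMA * Q[s_next, g].max()
    Q[s, g, a] += ALPHA * (target - Q[s, g, a])

def weighted_supervised_update(s, a, g, lr=0.1):
    """GCWSL-style step: imitate the logged action, weighted by its advantage
    under the learned Q-function (the "dynamic programming result").
    The exponential weight and the clip value are illustrative choices."""
    advantage = Q[s, g, a] - Q[s, g].mean()
    weight = np.exp(np.clip(BETA * advantage, None, 5.0))  # clipped for stability
    logits = policy_logits[s, g]
    probs = np.exp(logits - logits.max())                  # stable softmax
    probs /= probs.sum()
    grad = -probs
    grad[a] += 1.0                                         # grad of log pi(a|s,g)
    policy_logits[s, g] += lr * weight * grad

# Replay transitions gathered with a random behavior policy.
for _ in range(20000):
    s = int(rng.integers(N_STATES))
    a = int(rng.integers(N_ACTIONS))
    s_next = int((s + MOVES[a]) % N_STATES)
    # Random-goal relabeling: a simple stand-in for hindsight relabeling that
    # lets transitions from different trajectories serve shared goals
    # (the "stitching" the summaries refer to).
    g = int(rng.integers(N_GOALS))
    td_update(s, a, s_next, g)
    weighted_supervised_update(s, a, g)

greedy = policy_logits.argmax(axis=-1)
print("Greedy action from state 0 toward each goal:", greedy[0])
```

A real implementation would replace the tables with neural function approximators trained on continuous-control benchmarks; the tabular version here is only meant to show where the dynamic-programming values enter the supervised weighting.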
Keywords
* Artificial intelligence
* Reinforcement learning
* Supervised learning