Summary of Q-value Regularized Decision ConvFormer For Offline Reinforcement Learning, by Teng Yan et al.
Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning
by Teng Yan, Zhendong Ruan, Yaobang Cai, Yu Han, Wenxian Li, Yang Zhang
First submitted to arXiv on: 12 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Offline reinforcement learning can be framed as sequence modeling, a setting in which the Decision Transformer (DT) excels. Unlike earlier methods that fit value functions or compute policy gradients, DT trains an autoregressive model, conditioned on expected returns, past states, and actions, that uses a causally masked Transformer to output optimal actions. However, because sampled returns within a single trajectory and optimal returns across multiple trajectories are inconsistent, it is hard to set the expected return correctly and to stitch together suboptimal trajectories. The Decision ConvFormer (DC) is easier to interpret in the context of Markov decision processes than DT. The authors propose the Q-value Regularized Decision ConvFormer (QDC), which combines DC’s understanding of RL trajectories with a term, computed via dynamic programming during training, that maximizes action values and keeps the expected returns consistent. QDC achieves excellent performance on the D4RL benchmark, outperforming or approaching the best results in all tested environments, and shows strong trajectory-stitching capability. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper is about helping computers learn good behavior from previously collected experience, without trying new actions in the real world. The authors use a new method called the Q-value Regularized Decision ConvFormer (QDC) to make better decisions. QDC combines two ideas: learning from sequences of past situations and actions, and checking how valuable each action really is so the computer’s expectations stay consistent. This lets QDC make excellent decisions, beating other methods on standard tests. |
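The medium-difficulty summary describes QDC's core idea: a DT/DC-style action-prediction loss combined with a regularization term that pushes predicted actions toward high Q-values. The sketch below illustrates that combined objective in the simplest possible form. It is a hypothetical illustration, not the paper's implementation: the function name `qdc_loss`, the trade-off weight `eta`, and the use of mean squared error for the sequence-model term are all assumptions made for clarity.

```python
import numpy as np

def qdc_loss(pred_actions, target_actions, q_values, eta=0.5):
    """Illustrative sketch of a QDC-style objective (names are hypothetical).

    Combines a sequence-model action-prediction loss (as in DT/DC)
    with a term that rewards high Q-values for the predicted actions.
    `eta` trades off the two terms.
    """
    pred_actions = np.asarray(pred_actions, dtype=float)
    target_actions = np.asarray(target_actions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)

    # Sequence-modeling term: mean squared error to the dataset actions.
    bc_term = np.mean((pred_actions - target_actions) ** 2)

    # Q-value regularization: maximizing mean Q(s, predicted action)
    # is written as minimizing its negative.
    q_term = -np.mean(q_values)

    return bc_term + eta * q_term
```

In this toy form, lowering the loss means both imitating the dataset actions and choosing actions the Q-function rates highly, which is the consistency the summary attributes to QDC's dynamic-programming term.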
Keywords
» Artificial intelligence » Autoregressive » Reinforcement learning » Transformer