Summary of Q-value Regularized Decision ConvFormer For Offline Reinforcement Learning, by Teng Yan et al.
Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning
by Teng Yan, Zhendong Ruan, Yaobang Cai, Yu Han, Wenxian Li, Yang Zhang
First submitted to arXiv on: 12 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Offline reinforcement learning can be framed as sequence modeling, a setting in which the Decision Transformer (DT) excels. Unlike earlier methods that fit value functions or compute policy gradients, DT trains an autoregressive model, conditioned on expected returns, past states, and actions, that uses a causally masked Transformer to output optimal actions. However, because sampled returns within a single trajectory and optimal returns across multiple trajectories are inconsistent, it is hard to set the expected return correctly and to stitch together suboptimal trajectories. The Decision ConvFormer (DC) is easier to interpret in the context of Markov decision processes than DT. The authors propose the Q-value Regularized Decision ConvFormer (QDC), which combines DC’s understanding of RL trajectories with a term, computed via dynamic programming during training, that maximizes action values and keeps the expected returns consistent. QDC achieves excellent performance on the D4RL benchmark, outperforming or approaching the best results in all tested environments, and shows strong trajectory-stitching capability. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper is about helping computers learn good behavior from previously collected experience, without trying new actions in the real world. The authors use a new method called the Q-value Regularized Decision ConvFormer (QDC) to make better decisions. QDC combines two ideas: learning from sequences of past situations and actions, and checking how valuable each action really is so the computer’s expectations stay consistent. This lets QDC make excellent decisions, beating other methods on standard tests. |
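The medium-difficulty summary describes QDC's core idea: a DT/DC-style action-prediction loss combined with a regularization term that pushes predicted actions toward high Q-values. The sketch below illustrates that combined objective in the simplest possible form. It is a hypothetical illustration, not the paper's implementation: the function name `qdc_loss`, the trade-off weight `eta`, and the use of mean squared error for the sequence-model term are all assumptions made for clarity.

```python
import numpy as np

def qdc_loss(pred_actions, target_actions, q_values, eta=0.5):
    """Illustrative sketch of a QDC-style objective (names are hypothetical).

    Combines a sequence-model action-prediction loss (as in DT/DC)
    with a term that rewards high Q-values for the predicted actions.
    `eta` trades off the two terms.
    """
    pred_actions = np.asarray(pred_actions, dtype=float)
    target_actions = np.asarray(target_actions, dtype=float)
    q_values = np.asarray(q_values, dtype=float)

    # Sequence-modeling term: mean squared error to the dataset actions.
    bc_term = np.mean((pred_actions - target_actions) ** 2)

    # Q-value regularization: maximizing mean Q(s, predicted action)
    # is written as minimizing its negative.
    q_term = -np.mean(q_values)

    return bc_term + eta * q_term
```

In this toy form, lowering the loss means both imitating the dataset actions and choosing actions the Q-function rates highly, which is the consistency the summary attributes to QDC's dynamic-programming term.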
Keywords
» Artificial intelligence » Autoregressive » Reinforcement learning » Transformer