
Summary of Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling, by Jiawei Xu et al.


Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling

by Jiawei Xu, Rui Yang, Shuang Qiu, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

First submitted to arXiv on: 5 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
Offline reinforcement learning (RL) has the potential to scale data-driven decision-making while avoiding costly online interactions. However, real-world datasets often contain noise and errors, posing a significant challenge for existing offline RL methods, particularly when the dataset is limited. Our study reveals that adapting predominant offline RL methods based on temporal difference learning still falls short under data corruption when the dataset is limited. In contrast, vanilla sequence modeling methods, such as Decision Transformer (DT), exhibit robustness against data corruption even without specialized modifications. We propose Robust Decision Transformer (RDT) by incorporating three simple yet effective robust techniques: embedding dropout to improve the model’s robustness against erroneous inputs, Gaussian weighted learning to mitigate the effects of corrupted labels, and iterative data correction to eliminate corrupted data from the source. Extensive experiments on MuJoCo, Kitchen, and Adroit tasks demonstrate RDT’s superior performance under various data corruption scenarios compared to prior methods. (An illustrative sketch of the embedding dropout and Gaussian weighted learning ideas appears after the summaries below.)
Low Difficulty Summary (original content by GrooveSquid.com)
This study explores how we can learn from real-world data that contains mistakes or noise. Right now, most offline reinforcement learning (RL) methods struggle with this kind of data. But some newer sequence modeling methods, like Decision Transformer, are actually quite good at handling noisy data. To make these methods even better, we propose a new approach called Robust Decision Transformer that includes three simple techniques to help the model learn from noisy data. We tested our method on several different tasks and found that it performs much better than other methods when dealing with noisy data.
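
For readers curious how two of the techniques named above might look in code, here is a minimal, hypothetical PyTorch sketch of embedding dropout and a Gaussian weighted regression loss. This is not the authors’ implementation of RDT: the class, the function, and hyperparameters such as sigma are illustrative assumptions based only on the descriptions in the summaries.

```python
# Hypothetical sketch (not the authors' RDT code) of two ideas described above:
# (1) embedding dropout applied to token embeddings, so the model relies less
#     on any single, possibly corrupted, input feature;
# (2) a Gaussian weighted regression loss that down-weights samples whose
#     prediction error is large, i.e. likely corrupted action labels.
import torch
import torch.nn as nn


class EmbeddingDropout(nn.Module):
    """Applies standard dropout directly to token embeddings."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(p)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, sequence_length, embedding_dim)
        return self.dropout(token_embeddings)


def gaussian_weighted_loss(pred_action: torch.Tensor,
                           target_action: torch.Tensor,
                           sigma: float = 1.0) -> torch.Tensor:
    """Mean-squared error where each sample is weighted by
    exp(-error / (2 * sigma^2)), so high-error (likely corrupted)
    labels contribute less to the gradient."""
    per_sample_error = ((pred_action - target_action) ** 2).mean(dim=-1)
    weights = torch.exp(-per_sample_error.detach() / (2 * sigma ** 2))
    return (weights * per_sample_error).mean()


# Toy usage with random tensors standing in for a Decision Transformer batch:
embeddings = EmbeddingDropout(p=0.1)(torch.randn(4, 20, 128))
loss = gaussian_weighted_loss(torch.randn(4, 6), torch.randn(4, 6), sigma=1.0)
```

The third technique, iterative data correction, operates on the dataset itself (periodically revising labels the model flags as corrupted) and is omitted from this sketch.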

Keywords

  • Artificial intelligence
  • Dropout
  • Embedding
  • Reinforcement learning
  • Transformer