Summary of "DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design", by Samuel Garcin et al.
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design
by Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
First submitted to arXiv on: 5 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract (available on the arXiv page). |
| Medium | GrooveSquid.com (original content) | Autonomous agents trained using deep reinforcement learning (RL) often struggle to generalize to new environments, even when those environments share characteristics with the ones encountered during training. This work investigates how the sampling of individual environment instances, or levels, affects the zero-shot generalization ability of RL agents. The authors find that prioritizing levels according to their value loss minimizes the mutual information between the agent’s internal representation and the set of training levels in the generated data, providing a novel theoretical justification for certain adaptive level sampling strategies. They also examine unsupervised environment design (UED) methods, which assume control over level generation, and find that these methods can significantly shift the training distribution, resulting in poor zero-shot generalization. To prevent both overfitting and distributional shift, the authors introduce data-regularised environment design (DRED), which generates levels using a generative model trained to approximate the ground-truth distribution of an initial set of level parameters. DRED achieves significant improvements in zero-shot generalization over both adaptive level sampling strategies and UED methods (a minimal code sketch of these ideas follows the table). |
| Low | GrooveSquid.com (original content) | Autonomous agents often can’t generalize well to new environments, even when those environments share characteristics with the ones they’ve seen before. This paper looks at how agents are trained using deep reinforcement learning (RL) and finds that the way individual environment instances are chosen for training affects how well an agent generalizes. It also explores methods that design new training environments and finds that current approaches can make it harder for agents to generalize. To solve this problem, the paper proposes data-regularised environment design (DRED), which generates new training levels in a way that helps agents generalize better. |
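A minimal, hypothetical Python sketch (not the authors’ code) may help make the two ideas above concrete. The `ValueLossLevelSampler` mimics value-loss-prioritized level sampling: levels whose value prediction was recently most wrong are replayed more often. The `dred_training_level` helper mimics DRED’s data regularisation: new levels are drawn from a generative model fit to the initial level set, so the training distribution cannot drift far from the ground truth. The rank-based weighting, the `temperature` parameter, the `replay_prob` mixing, and the `generator.sample()` interface are all illustrative assumptions rather than details from the paper.

```python
import numpy as np

class ValueLossLevelSampler:
    """Replay levels in proportion to their most recent value loss (sketch)."""

    def __init__(self, num_levels, temperature=0.3, rng=None):
        self.scores = np.zeros(num_levels)  # last observed value loss per level
        self.temperature = temperature      # lower values sharpen the priority distribution
        self.rng = rng or np.random.default_rng()

    def update(self, level_id, value_loss):
        # Record the mean value-prediction error from the latest rollout on this level.
        self.scores[level_id] = value_loss

    def sample(self):
        # Rank-based prioritization: the higher a level's value loss, the more
        # likely it is replayed. Rank 1 corresponds to the highest loss.
        ranks = np.argsort(np.argsort(-self.scores)) + 1
        weights = (1.0 / ranks) ** (1.0 / self.temperature)
        probs = weights / weights.sum()
        return self.rng.choice(len(self.scores), p=probs)

def dred_training_level(sampler, generator, replay_prob=0.5, rng=None):
    """Pick the next training level, DRED-style (sketch).

    With probability `replay_prob`, replay a high-value-loss level; otherwise
    draw a fresh level from a generative model (`generator`, e.g. a VAE over
    level parameters) trained on the initial level set, which keeps the
    training distribution anchored to the ground-truth distribution.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < replay_prob:
        return "replay", sampler.sample()
    return "generated", generator.sample()  # hypothetical generator interface
```

In a full training loop one would call `sampler.update(level_id, mean_value_loss)` after each rollout and pick the next level with `dred_training_level`, so that prioritized replay fights overfitting while the generative model prevents distributional shift.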
Keywords
* Artificial intelligence * Generalization * Generative model * Overfitting * Reinforcement learning * Unsupervised * Zero shot