Summary of How the Level Sampling Process Impacts Zero-shot Generalisation in Deep Reinforcement Learning, by Samuel Garcin et al.
How the level sampling process impacts zero-shot generalisation in deep reinforcement learning
by Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
First submitted to arXiv on: 5 Oct 2023
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates the limitations of autonomous agents trained via deep reinforcement learning (RL) when generalizing to new environments. The authors find that a non-uniform sampling strategy over individual environment instances affects zero-shot generalization (ZSG) ability, with overfitting and over-generalization as potential failure modes. They introduce the mutual information (MI) between the agent’s internal representation and the set of training levels, which correlates strongly with instance overfitting. Adaptive sampling strategies that prioritize levels by value loss are more effective at keeping this MI low. The authors also examine unsupervised environment design (UED) methods, which adaptively generate new training levels but shift the training distribution, leading to worse ZSG performance. To address this, they introduce self-supervised environment design (SSED), which generates levels with a variational autoencoder, reducing MI while minimizing distribution shift and yielding statistically significant improvements in ZSG. |
| Low | GrooveSquid.com (original content) | This research paper looks at why computers trained to make decisions on their own struggle when faced with new situations. The authors tested different ways of training these computers and found that some methods are better than others at letting them generalize to new environments. They also created a new method called self-supervised environment design, which helps computers learn more effectively without needing additional information or supervision. |
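The medium-difficulty summary mentions adaptive sampling strategies that prioritize training levels by value loss. As a rough illustration only (not the paper's implementation), here is a minimal rank-based sketch in the spirit of such methods; the class name, the temperature parameter, and the rank-based weighting are all assumptions made for this toy example:

```python
import numpy as np

rng = np.random.default_rng(0)

class ValueLossLevelSampler:
    """Toy sketch: sample training levels with probability related to
    their most recent value-prediction loss, using a rank-based weighting."""

    def __init__(self, num_levels, temperature=1.0):
        self.scores = np.zeros(num_levels)  # last observed value loss per level
        self.temperature = temperature      # lower -> sharper prioritization

    def update(self, level_id, value_loss):
        # Record the latest value loss observed when training on this level.
        self.scores[level_id] = value_loss

    def probabilities(self):
        # Rank 1 = highest value loss; weight decays with rank.
        ranks = np.argsort(np.argsort(-self.scores)) + 1
        weights = (1.0 / ranks) ** (1.0 / self.temperature)
        return weights / weights.sum()

    def sample(self):
        # Draw the next training level according to the priority distribution.
        return rng.choice(len(self.scores), p=self.probabilities())

sampler = ValueLossLevelSampler(num_levels=4)
for level_id, loss in enumerate([0.1, 0.9, 0.3, 0.5]):
    sampler.update(level_id, loss)
next_level = sampler.sample()
```

High-value-loss levels get the largest sampling probability, which is the property the summary credits with keeping the representation–level mutual information low.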
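The summary also describes SSED generating new training levels with a variational autoencoder. The sketch below shows only the generation step of such a pipeline, under heavy assumptions: the decoder is a stand-in fixed linear map (a real VAE would be trained on existing levels), and the 5×5 binary grid, latent dimension, and function names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

LATENT_DIM = 8  # assumed latent size for this toy example
GRID = 5        # toy 5x5 binary level layout

# Stand-in for a *trained* VAE decoder: a fixed random linear map + sigmoid.
W = rng.normal(size=(GRID * GRID, LATENT_DIM))

def decode(z):
    # Map a latent vector to a binary grid "level" via thresholded sigmoid.
    logits = W @ z
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs > 0.5).astype(int).reshape(GRID, GRID)

def generate_levels(n):
    # SSED-style generation step: sample latents from the prior N(0, I)
    # and decode them into candidate training levels.
    return [decode(rng.normal(size=LATENT_DIM)) for _ in range(n)]

levels = generate_levels(3)
```

Because the latents are drawn from the same prior the VAE was trained against, generated levels tend to stay close to the training-level distribution, which is the mechanism the summary credits for limiting distribution shift.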
Keywords
* Artificial intelligence * Generalization * Overfitting * Reinforcement learning * Self-supervised * Unsupervised * Variational autoencoder * Zero-shot