Summary of No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO, by Skander Moalla et al.
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
by Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre
First submitted to arXiv on: 1 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here. |
Medium | GrooveSquid.com (original content) | Deep reinforcement learning (RL) methods must adapt to changing observations and targets during training. Prior work has observed that networks trained under such non-stationarity lose plasticity and eventually collapse in performance, and has correlated this phenomenon with a decrease in representation rank and a loss of capacity in off-policy deep value-based RL methods. However, the connection to representation dynamics had not been studied in on-policy policy optimization methods. This paper empirically studies representation dynamics in Proximal Policy Optimization (PPO) on Atari and MuJoCo environments, revealing that PPO agents also suffer feature rank deterioration and capacity loss, which are exacerbated by stronger non-stationarity. The authors show that the actor's performance collapses regardless of the critic's performance, and they expose a connection between representation collapse and the degradation of the trust region, which ultimately leads to performance collapse. To mitigate this collapse, they propose Proximal Feature Optimization (PFO), a novel auxiliary loss that regularizes representation dynamics (see the code sketch after this table). |
Low | GrooveSquid.com (original content) | Reinforcement learning is an area of artificial intelligence where machines learn from rewards or punishments. This paper looks at how some deep reinforcement learning methods have trouble adapting to changing situations during training, which can cause them to stop improving and eventually perform poorly. The researchers studied this issue in a specific deep reinforcement learning method called Proximal Policy Optimization (PPO). They found that PPO agents also experience this problem, and that it is connected to changes in how the agents represent information. The authors then proposed a new way to regularize this representation, called Proximal Feature Optimization (PFO), which helps mitigate the problem. |
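For readers who want a concrete picture of how an auxiliary feature-regularization term can be added to PPO, the sketch below shows a PPO minibatch loss augmented with a PFO-style penalty on the drift of the actor's pre-activation features. This is a minimal illustration under stated assumptions, not the authors' implementation: the network architecture, the choice of penalizing last-layer pre-activations, the norm used, and the `pfo_coef` coefficient are all assumptions made for the example; consult the paper and its code release for the exact formulation.

```python
# Minimal sketch of a PPO minibatch loss with a PFO-style auxiliary term.
# The exact PFO formulation (which features are penalized, which norm, the
# coefficient) is an assumption here, not the paper's implementation.

import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    """Small shared-trunk actor-critic for discrete actions (illustrative only)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = torch.tanh(self.fc1(obs))
        pre_acts = self.fc2(h)              # pre-activation features of the last hidden layer
        feats = torch.tanh(pre_acts)
        return self.policy_head(feats), self.value_head(feats).squeeze(-1), pre_acts


def ppo_with_pfo_loss(model, obs, actions, old_log_probs, old_pre_acts,
                      advantages, returns, clip_eps=0.2, vf_coef=0.5, pfo_coef=1.0):
    """PPO clipped objective plus an assumed PFO-style penalty on how far the
    current features drift from the features recorded when the data was collected."""
    logits, values, pre_acts = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)

    # Standard PPO clipped surrogate objective.
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    # Critic regression to the empirical returns.
    value_loss = (returns - values).pow(2).mean()

    # Assumed PFO-style auxiliary loss: penalize the change of the actor's
    # pre-activation features relative to those of the rollout policy.
    pfo_loss = (pre_acts - old_pre_acts).abs().mean()

    return policy_loss + vf_coef * value_loss + pfo_coef * pfo_loss
```

The difference from vanilla PPO is the last term: PPO's clipping already constrains how fast the policy distribution can move, while the auxiliary term additionally discourages the underlying features from drifting too quickly, which is the kind of representation-dynamics regularization the summary above describes.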
Keywords
» Artificial intelligence » Optimization » Reinforcement learning