Summary of No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO, by Skander Moalla et al.


No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

by Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre

First submitted to arXiv on: 1 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here

Medium Difficulty Summary (GrooveSquid.com original content)
Deep reinforcement learning (RL) methods must adapt to changing observations and targets during training. Previous work has observed that networks trained under such non-stationarity lose plasticity and eventually collapse in performance, a phenomenon that has been linked to decreasing representation rank and capacity loss in off-policy, value-based deep RL methods. However, the connection to representation dynamics had not been studied in on-policy policy optimization methods. This paper empirically studies representation dynamics in Proximal Policy Optimization (PPO) on Atari and MuJoCo environments, revealing that PPO agents also suffer from feature rank deterioration and capacity loss, both exacerbated by stronger non-stationarity. The authors show that the actor’s performance collapses regardless of the critic’s performance, and they connect representation collapse to degradation of the trust region, which ultimately leads to performance collapse. To mitigate this collapse, they propose Proximal Feature Optimization (PFO), a novel auxiliary loss that regularizes representation dynamics.
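This summary does not spell out the form of the PFO loss. As a rough, non-authoritative illustration only, the PyTorch-style sketch below shows (a) a common effective-rank proxy one could use to track feature rank deterioration and (b) an auxiliary penalty on how far the policy network's pre-activations drift from those of the network that collected the rollout. The names PolicyNet, feature_rank, and feature_drift_penalty, the L1 penalty form, and the 0.99 spectrum threshold are assumptions made for this sketch, not the paper's exact definitions.

```python
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Tiny MLP policy that also returns its pre-activation features."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        pre1 = self.fc1(obs)                # pre-activations of layer 1
        pre2 = self.fc2(torch.tanh(pre1))   # pre-activations of layer 2
        logits = self.head(torch.tanh(pre2))
        return logits, [pre1, pre2]


def feature_rank(features, threshold=0.99):
    """Effective-rank proxy: number of singular values needed to explain
    `threshold` of the spectrum of a batch of features (one common definition)."""
    s = torch.linalg.svdvals(features)
    cumulative = torch.cumsum(s, dim=0) / s.sum()
    return int((cumulative < threshold).sum().item()) + 1


def feature_drift_penalty(policy, old_policy, obs):
    """Auxiliary term penalizing how far the current pre-activations have
    moved from those of the (frozen) network that collected the rollout."""
    _, pre_new = policy(obs)
    with torch.no_grad():
        _, pre_old = old_policy(obs)
    return sum((a - b).abs().mean() for a, b in zip(pre_new, pre_old))


# Hypothetical use inside a PPO update:
#   total_loss = ppo_clip_loss + value_coef * value_loss \
#                - entropy_coef * entropy \
#                + pfo_coef * feature_drift_penalty(policy, old_policy, obs_batch)
```

The design intuition, under the assumptions above, mirrors PPO's clipped ratio: just as the ratio clip keeps the new policy's action probabilities close to the behavior policy's, a feature-proximity term keeps the new network's internal representation close to the one that generated the data, which is one way to regularize representation dynamics during training.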
Low Difficulty Summary (GrooveSquid.com original content)
Reinforcement learning is an area of artificial intelligence where machines learn from rewards or punishments. The paper talks about how some deep reinforcement learning methods have trouble adapting to changing situations during training. This can cause the method to stop improving and eventually perform poorly. Researchers studied this issue in a specific type of deep reinforcement learning called Proximal Policy Optimization (PPO). They found that PPO agents also experience this problem, which is connected to changes in how they represent information. The authors then proposed a new way to improve this representation, called Proximal Feature Optimization (PFO), which helps mitigate the problem.

Keywords

» Artificial intelligence  » Optimization  » Reinforcement learning