Summary of Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation, by Xiaoying Zhang et al.
Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation, by Xiaoying Zhang, Jean-Francois Ton,…