Summary of Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization, by Xiyue Peng et al.
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization by Xiyue Peng, Hengquan…