Summary of Offline Regularised Reinforcement Learning for Large Language Models Alignment, by Pierre Harvey Richemond et al.
Offline Regularised Reinforcement Learning for Large Language Models Alignment by Pierre Harvey Richemond, Yunhao Tang, Daniel…