Summary of Generalized Preference Optimization: A Unified Approach to Offline Alignment, by Yunhao Tang et al.
Generalized Preference Optimization: A Unified Approach to Offline Alignment by Yunhao Tang, Zhaohan Daniel Guo, Zeyu…
Personalized Language Modeling from Personalized Human Feedback by Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, Liu…
A Theoretical Framework for Partially Observed Reward-States in RLHF by Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano,…
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback by Gaurav Pandey, Yatin Nandwani,…
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback by Yifu Yuan,…
Dense Reward for Free in Reinforcement Learning from Human Feedback by Alex J. Chan, Hao Sun,…
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble by Shun Zhang, Zhenfang Chen,…
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF by Banghua Zhu, Michael I. Jordan,…
Secrets of RLHF in Large Language Models Part II: Reward Modeling by Binghai Wang, Rui Zheng,…
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles by Yuanzhao Zhai, Han Zhang,…