Summary of "Active Preference Optimization for Sample Efficient RLHF", by Nirjhar Das et al.
Active Preference Optimization for Sample Efficient RLHF, by Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray…
Related paper summaries:

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization, by Yihan Du, Anna Winnicki,…
Reward Generalization in RLHF: A Topological Perspective, by Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan,…
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language…
InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling, by Yuchun Miao, Sen Zhang, Liang…
Reinforcement Learning from Human Feedback with Active Queries, by Kaixuan Ji, Jiafan He, Quanquan Gu
MaxMin-RLHF: Alignment with Diverse Human Preferences, by Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong…
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model, by Chenlu Ye, Wei Xiong,…
ODIN: Disentangled Reward Mitigates Hacking in RLHF, by Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen,…
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?, by Ryan Liu, Theodore R.…