Summary of Fairness in Reinforcement Learning: A Survey, by Anka Reuel et al.
Fairness in Reinforcement Learning: A Survey, by Anka Reuel, Devin Ma. First submitted to arXiv on: 11…