Summary of SeRA: Self-Reviewing and Alignment of Large Language Models Using Implicit Reward Margins, by Jongwoo Ko et al.