Summary of A Framework for Fine-Tuning LLMs Using Heterogeneous Feedback, by Ryan Aponte et al.
A Framework for Fine-Tuning LLMs using Heterogeneous Feedback, by Ryan Aponte, Ryan A. Rossi, Shunan Guo, …
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift, by Seongho Son, William Bankes, …
Exploring and Addressing Reward Confusion in Offline Preference Learning, by Xin Chen, Sam Toyer, Florian Shkurti. First …
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification, by Thomas Kwa, …
BOND: Aligning LLMs with Best-of-N Distillation, by Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, …
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization, by Audrey Huang, …
Does Refusal Training in LLMs Generalize to the Past Tense? by Maksym Andriushchenko, Nicolas Flammarion. First submitted …
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation, by Yi-Chen Li, Fuxiang Zhang, Wenjie …
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning, by Yifang Chen, Shuohang Wang, Ziyi …
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning, by Yuheng Zhang, Dian …