Summary of A Framework for Fine-Tuning LLMs Using Heterogeneous Feedback, by Ryan Aponte et al.
A Framework for Fine-Tuning LLMs using Heterogeneous Feedback, by Ryan Aponte, Ryan A. Rossi, Shunan Guo, …
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift, by Seongho Son, William Bankes, …
Exploring and Addressing Reward Confusion in Offline Preference Learning, by Xin Chen, Sam Toyer, Florian Shkurti. First …
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification, by Thomas Kwa, …
BOND: Aligning LLMs with Best-of-N Distillation, by Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, …
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization, by Audrey Huang, …
Does Refusal Training in LLMs Generalize to the Past Tense? by Maksym Andriushchenko, Nicolas Flammarion. First submitted …
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation, by Yi-Chen Li, Fuxiang Zhang, Wenjie …
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning, by Yifang Chen, Shuohang Wang, Ziyi …
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning, by Yuheng Zhang, Dian …