Summary of DeAL: Decoding-time Alignment for Large Language Models, by James Y. Huang et al.
DeAL: Decoding-time Alignment for Large Language Models by James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an…
Direct Language Model Alignment from Online AI Feedback by Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi…
Transforming and Combining Rewards for Aligning Large Language Models by Zihao Wang, Chirag Nagpal, Jonathan Berant,…
Investigating Bias Representations in Llama 2 Chat via Activation Steering by Dawn Lu, Nina Rimsky. First submitted…
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts by Lingfeng Shen, Weiting Tan,…
Reinforcement learning for question answering in programming domain using public community scoring as a human…
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback by Seong Jin Lee, Will Wei Sun, Yufeng…
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF by Flint Xiaofeng Fan, Cheston Tan,…
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples by Shuo Xie, Fangzhi Zhu,…
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model by Yuzhong Hong, Hanshan…