Summary of DeAL: Decoding-time Alignment for Large Language Models, by James Y. Huang et al.
DeAL: Decoding-time Alignment for Large Language Models by James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an…
Direct Language Model Alignment from Online AI Feedback by Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi…
Investigating Bias Representations in Llama 2 Chat via Activation Steering by Dawn Lu, Nina Rimsky. First submitted…
Transforming and Combining Rewards for Aligning Large Language Models by Zihao Wang, Chirag Nagpal, Jonathan Berant,…
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts by Lingfeng Shen, Weiting Tan,…
Reinforcement learning for question answering in programming domain using public community scoring as a human…
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback by Seong Jin Lee, Will Wei Sun, Yufeng…
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF by Flint Xiaofeng Fan, Cheston Tan,…
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples by Shuo Xie, Fangzhi Zhu,…
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model by Yuzhong Hong, Hanshan…