Summary of Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models, by Wenxuan Zhang et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models by Wenxuan Zhang, Philip H.S. Torr, Mohamed Elhoseiny,…