Summary of Self-Play Preference Optimization for Language Model Alignment, by Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, and Quanquan Gu
Self-Play Preference Optimization for Language Model Alignment, by Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji,…