Summary of Provably Mitigating Overoptimization in RLHF: Your SFT Loss Is Implicitly an Adversarial Regularizer, by Zhihan Liu et al.