Summary of How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective, by Teng Xiao et al.
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective by…