Summary of Averaging Log-likelihoods in Direct Alignment, by Nathan Grinsztajn et al.
Averaging log-likelihoods in direct alignment
by Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist
First submitted to arXiv on: 27 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | A standard way to align Large Language Models (LLMs) with human judgment is Reinforcement Learning from Human Feedback (RLHF): a reward model is learned from preference data and then optimized with regularized RL. Direct alignment methods were later introduced to learn a fine-tuned model directly from a preference dataset, without computing a proxy reward function. These methods, however, are sensitive to the varying lengths of the completions they compare. This paper proposes a principled way to make direct alignment length-invariant by averaging the log-likelihood within the loss, which amounts to a token-wise averaging operator. The approach is studied empirically, revealing a trade-off between the length of generations and their scores (see the sketch after this table).
Low | GrooveSquid.com (original content) | Large Language Models (LLMs) are getting better at understanding human language, but they still don't always match what people prefer. To fix this, scientists train these models using feedback from humans. One approach, called direct alignment, teaches the model from examples of preferred and rejected answers without first building a separate reward model. The catch is that it can behave unevenly when the answers being compared have different lengths. The fix proposed here is to average the model's per-token scores, so long and short answers are treated fairly. Experiments show a trade-off between how long the generated text is and how good it is.
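To make the token-wise averaging concrete, here is a minimal PyTorch sketch of a DPO-style preference loss in which each sequence log-likelihood is divided by its token count. This is an illustration only, not the authors' implementation: the function names (`sequence_logprob`, `preference_loss`), the `beta` temperature, and the 0/1 padding mask are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def sequence_logprob(logits, labels, mask, average=True):
    """Per-sequence log-likelihood of `labels` under `logits`.

    logits: (batch, seq, vocab); labels, mask: (batch, seq), mask is 0/1 for padding.
    With average=True the sum is divided by the number of real tokens,
    i.e. the length-invariant (token-wise averaged) variant; this sketch
    assumes that averaging convention, not the paper's exact formulation.
    """
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = torch.gather(logps, -1, labels.unsqueeze(-1)).squeeze(-1)
    token_logps = token_logps * mask
    if average:
        return token_logps.sum(-1) / mask.sum(-1).clamp(min=1)
    return token_logps.sum(-1)


def preference_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss built from the (possibly averaged) sequence log-likelihoods."""
    chosen_margin = policy_chosen - ref_chosen
    rejected_margin = policy_rejected - ref_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

Setting `average=False` recovers the usual summed log-likelihood, so the two variants can be compared directly in the length-versus-score trade-off the paper studies.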
Keywords
- Artificial intelligence
- Alignment
- Log-likelihood
- Reinforcement learning from human feedback
- RLHF
- Token