
Summary of Averaging Log-likelihoods in Direct Alignment, by Nathan Grinsztajn et al.


Averaging log-likelihoods in direct alignment

by Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

First submitted to arXiv on: 27 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Reinforcement Learning from Human Feedback (RLHF) is the standard recipe for aligning Large Language Models (LLMs) with human judgment: a reward model is learned from preference data and then optimized with regularized RL. Direct alignment methods simplify this pipeline by fine-tuning the model directly on a preference dataset, without computing a proxy reward function. However, these methods behave inconsistently when the compared completions have different lengths. This paper introduces a principled way to make direct alignment length-invariant by averaging the log-likelihood within the loss, derived from a new averaging operator that translates into averaging token-wise. The approach is studied empirically, revealing a trade-off between the length of generations and their scores.
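To make the token-wise averaging concrete, below is a minimal sketch of a DPO-style preference loss computed from per-token log-probabilities, with a flag that switches between the usual summed sequence log-likelihood and a length-averaged variant. The function name, the toy numbers, and the specific DPO-style formulation are illustrative assumptions, not the authors' exact implementation.

```python
import math

def dpo_style_loss(chosen_policy_lp, chosen_ref_lp,
                   rejected_policy_lp, rejected_ref_lp,
                   beta=0.1, average_tokens=False):
    """DPO-style preference loss from per-token log-probabilities.

    Each *_lp argument is a list of per-token log-probs for one completion.
    With average_tokens=False, sequence log-likelihoods are summed (standard);
    with average_tokens=True, they are averaged over tokens, which removes the
    direct dependence on completion length.
    """
    agg = (lambda lps: sum(lps) / len(lps)) if average_tokens else sum
    # Implicit rewards: beta * (log pi(y|x) - log pi_ref(y|x))
    r_chosen = beta * (agg(chosen_policy_lp) - agg(chosen_ref_lp))
    r_rejected = beta * (agg(rejected_policy_lp) - agg(rejected_ref_lp))
    # Bradley-Terry negative log-likelihood: -log sigmoid(r_chosen - r_rejected)
    margin = r_chosen - r_rejected
    return math.log(1.0 + math.exp(-margin))

# Toy example: the rejected completion is four times longer than the chosen one,
# so with summed log-likelihoods its length dominates the preference margin.
chosen_pi, chosen_ref = [-0.5] * 10, [-0.6] * 10
rejected_pi, rejected_ref = [-0.55] * 40, [-0.6] * 40

print(dpo_style_loss(chosen_pi, chosen_ref, rejected_pi, rejected_ref))   # summed
print(dpo_style_loss(chosen_pi, chosen_ref, rejected_pi, rejected_ref,
                     average_tokens=True))                                 # averaged
```

In the toy example, summing log-likelihoods lets the longer rejected completion accumulate a larger implicit reward purely because it has more tokens, while averaging judges both completions on a per-token basis.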
Low Difficulty Summary (written by GrooveSquid.com, original content)
Large Language Models (LLMs) are getting better at following human intent, but they still don't always produce the answers people prefer. To fix this, scientists train these models using feedback from humans. A newer family of methods, called direct alignment, teaches the model from human preferences directly, without first building a separate reward model. The catch is that these methods can be thrown off when the responses being compared have different lengths. The solution studied here is to average the model's score over the tokens of each response, so long and short answers are judged on an equal footing. Experiments with this idea show a trade-off between how long the generated text is and how highly it is rated.

Keywords

» Artificial intelligence  » Alignment  » Log likelihood  » Reinforcement learning from human feedback  » Rlhf  » Token