
Summary of "Aligning Large Language Models by On-Policy Self-Judgment" by Sangkyu Lee et al.


Aligning Large Language Models by On-Policy Self-Judgment

by Sangkyu Lee, Sungdong Kim, Ashkan Yousefpour, Minjoon Seo, Kang Min Yoo, Youngjae Yu

First submitted to arXiv on: 17 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract, available on arXiv.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents SELF-JUDGE, a novel alignment framework that enables on-policy preference learning without a separate reward model (RM). Judge-augmented Supervised Fine-Tuning (JSFT) trains a single model to act as both policy and judge: by casting the pairwise judgment task as a special case of instruction-following, the model can evaluate preferences over responses generated on the fly by its current policy. Experimental results show that SELF-JUDGE outperforms baselines on preference benchmarks.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about creating a new way to teach machines to make good choices. Right now, we need two different models: one to decide what actions to take and another to judge whether those actions are good or bad. The authors of this paper came up with a new idea that lets us use just one model for both tasks. They called it SELF-JUDGE. It works by treating the task of choosing between two responses as a special kind of instruction-following problem. This means the model can learn to make good choices without needing an extra judge. The results show that this approach is better than other methods.
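The single-model generate-and-judge loop described in these summaries can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual code: `generate` is a hypothetical stand-in for sampling from an LLM, and the judge prompt wording is invented for illustration.

```python
# Illustrative sketch of SELF-JUDGE's on-policy loop: the SAME model
# both generates candidate responses and judges which one is preferred.
# `model` is a hypothetical callable standing in for an LLM call.

def generate(model, prompt):
    # Placeholder: a real implementation would sample from the LLM.
    return model(prompt)

def self_judge_pair(model, instruction):
    # 1. Sample two on-the-fly responses from the current policy.
    response_a = generate(model, instruction)
    response_b = generate(model, instruction)
    # 2. Reuse the same model as judge, phrasing the pairwise judgment
    #    as an instruction-following task (hypothetical prompt wording).
    judge_prompt = (
        f"Instruction: {instruction}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better? Answer A or B:"
    )
    verdict = generate(model, judge_prompt)
    # 3. The verdict yields a preference pair usable for a policy update.
    if verdict.strip().startswith("A"):
        return response_a, response_b  # (chosen, rejected)
    return response_b, response_a
```

Because generation and judgment share one set of weights, no separate reward model is needed; the preference pairs come from the current policy itself rather than from a fixed offline dataset.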

Keywords

* Artificial intelligence  * Alignment  * Fine-tuning  * Supervised