Summary of ORPO: Monolithic Preference Optimization without Reference Model, by Jiwoo Hong et al.
ORPO: Monolithic Preference Optimization without Reference Model
by Jiwoo Hong, Noah Lee, James Thorne
First submitted to arXiv on: 12 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates the role of supervised fine-tuning (SFT) in preference alignment algorithms for language models. The authors find that a minor penalty on disfavored generation styles is sufficient for preference alignment during SFT itself. They introduce ORPO, a reference model-free, monolithic odds ratio preference optimization algorithm that eliminates the need for a separate preference alignment phase. Experimental results demonstrate the effectiveness of ORPO, which surpasses state-of-the-art language models with more parameters on tasks such as text evaluation and machine translation. |
| Low | GrooveSquid.com (original content) | This research explores how to make computer programs that create language understand what people like or dislike. The authors found a way to make these programs better by giving them a small penalty when they don't do what people want. They also created a new method called ORPO, which helps the program learn from its mistakes without needing extra help. This new method works well and beats some other methods that use more computer power. |
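The core idea in the medium summary can be sketched numerically: ORPO adds an odds-ratio penalty on a rejected response to the ordinary SFT loss on the chosen response, with no reference model involved. The sketch below is a simplified, standalone illustration under our own assumptions (the function names, the use of average per-token log-probabilities, and the weight `lam=0.1` are illustrative, not taken from the paper):

```python
import math

def log_odds(avg_logp):
    """Log-odds of a response: log(p / (1 - p)), where p = exp(avg_logp)
    is treated here as the model's average per-token probability (< 1)."""
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """Sketch of an ORPO-style objective: the SFT negative log-likelihood on
    the chosen response plus a log-sigmoid odds-ratio penalty that pushes the
    chosen response's odds above the rejected one's. No reference model is
    needed -- only the policy's own probabilities appear."""
    log_or = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log sigmoid(log odds ratio): small when chosen >> rejected
    penalty = -math.log(1.0 / (1.0 + math.exp(-log_or)))
    return nll_chosen + lam * penalty
```

With equal SFT loss, the combined loss is lower when the chosen response is the more probable one, which is exactly the "minor penalty for disfavored generation styles" the summary describes.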
Keywords
- Artificial intelligence
- Alignment
- Fine tuning
- Optimization
- Supervised
- Translation