Summary of Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts, by Haoxiang Wang et al.
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
by Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang
First submitted to arXiv on: 18 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes a two-stage approach that addresses limitations of conventional reward models for reinforcement learning from human feedback (RLHF). In the first stage, an Absolute-Rating Multi-Objective Reward Model (ArmoRM) is trained on multi-dimensional absolute-rating data, so each reward objective stays interpretable. In the second stage, a Mixture-of-Experts (MoE) strategy uses a gating network to select suitable reward objectives based on the context (a minimal code sketch of this idea follows the table). The trained ArmoRM-Llama3-8B model outperforms state-of-the-art reward models, including the LLM-as-a-judge method with GPT-4 judges, and approaches the performance of the much larger Nemotron-4 340B reward model. |
| Low | GrooveSquid.com (original content) | This paper helps create better language models by making them understand why certain responses are good or bad. It proposes a new way to train these models so their decisions align with human preferences. The approach uses absolute ratings instead of just comparing two options, and lets the model choose which preference dimension to emphasize based on the situation. |
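The medium-difficulty summary describes the architecture in words; the sketch below illustrates the general idea in PyTorch: a set of per-objective reward heads plus a gating network that produces context-dependent mixing weights. This is a minimal sketch, not the paper's actual implementation; the class name `ArmoRMSketch`, the hidden dimension, and the number of objectives are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ArmoRMSketch(nn.Module):
    """Minimal sketch of a multi-objective reward model with a gating network.

    All sizes and names are illustrative assumptions; the actual ArmoRM-Llama3-8B
    builds these components on top of a Llama-3-8B backbone.
    """

    def __init__(self, hidden_dim: int = 4096, num_objectives: int = 8):
        super().__init__()
        # Stage 1: regression heads that predict absolute ratings for each
        # reward objective (e.g., helpfulness, correctness) from hidden states.
        self.reward_heads = nn.Linear(hidden_dim, num_objectives)
        # Stage 2: gating network that mixes the objectives based on the
        # prompt context (mixture-of-experts over reward objectives).
        self.gating = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, num_objectives),
            nn.Softmax(dim=-1),
        )

    def forward(self, prompt_hidden: torch.Tensor, response_hidden: torch.Tensor) -> torch.Tensor:
        # Per-objective rewards for the (prompt, response) pair: (batch, num_objectives)
        objective_rewards = self.reward_heads(response_hidden)
        # Context-dependent mixing weights computed from the prompt: (batch, num_objectives)
        weights = self.gating(prompt_hidden)
        # Scalar reward = weighted sum of the interpretable objectives: (batch,)
        return (weights * objective_rewards).sum(dim=-1)


if __name__ == "__main__":
    model = ArmoRMSketch()
    prompt_h = torch.randn(2, 4096)    # placeholder prompt hidden states
    response_h = torch.randn(2, 4096)  # placeholder response hidden states
    print(model(prompt_h, response_h).shape)  # torch.Size([2])
```

Because the gating weights are nonnegative and sum to one, the final scalar reward can be read as an interpretable, context-dependent blend of the individual objective scores.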
Keywords
» Artificial intelligence » GPT » Mixture of experts » Reinforcement learning from human feedback » RLHF