
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

by Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang

First submitted to arXiv on: 18 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.
Medium Difficulty Summary (original content by GrooveSquid.com)
The paper proposes a two-stage approach that addresses the limitations of conventional reward models used in reinforcement learning from human feedback (RLHF). First, an Absolute-Rating Multi-Objective Reward Model (ArmoRM) is trained on multi-dimensional absolute-rating data, so that each reward dimension corresponds to an interpretable preference objective. Second, a Mixture-of-Experts (MoE) strategy with a gating network selects the suitable reward objectives based on the context and combines them into a single scalar reward. The trained ArmoRM-Llama3-8B model outperforms state-of-the-art reward models, including the LLM-as-a-judge method with GPT-4 judges, and approaches the performance of the much larger Nemotron-4 340B reward model.
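To make the described architecture concrete, below is a minimal PyTorch sketch of how a multi-objective reward head and a context-conditioned gating network can be combined into one scalar reward. The module names, the number of objectives, the hidden sizes, and the use of softmax gating are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MultiObjectiveRewardHead(nn.Module):
    """Maps a response embedding to k absolute-rating objectives
    (e.g. helpfulness, correctness, verbosity); the objectives are illustrative."""
    def __init__(self, hidden_dim: int, num_objectives: int):
        super().__init__()
        self.regression = nn.Linear(hidden_dim, num_objectives)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.regression(embedding)  # (batch, k) per-objective scores

class GatingNetwork(nn.Module):
    """Produces non-negative mixture weights over the k objectives
    from the prompt embedding (the 'context')."""
    def __init__(self, hidden_dim: int, num_objectives: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_objectives),
        )

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.mlp(prompt_embedding), dim=-1)  # (batch, k)

def scalar_reward(prompt_emb, response_emb, reward_head, gate):
    """Weighted sum of per-objective rewards -> one scalar reward per example."""
    objective_scores = reward_head(response_emb)      # (batch, k)
    weights = gate(prompt_emb)                        # (batch, k), rows sum to 1
    return (weights * objective_scores).sum(dim=-1)   # (batch,)

# Toy usage with random embeddings standing in for the LLM backbone's features.
hidden_dim, k = 4096, 5
head = MultiObjectiveRewardHead(hidden_dim, k)
gate = GatingNetwork(hidden_dim, k)
prompt_emb = torch.randn(2, hidden_dim)
response_emb = torch.randn(2, hidden_dim)
print(scalar_reward(prompt_emb, response_emb, head, gate))  # tensor of shape (2,)
```

Because the gate outputs a distribution over objectives, the per-example weights indicate which objectives drove the final score, which is where the interpretability described in the summary comes from.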
Low Difficulty Summary (original content by GrooveSquid.com)
This paper helps create better language models by teaching them to understand why certain responses are good or bad. It proposes a new way to train the models that score responses so that their decisions align with human preferences. The approach uses absolute ratings instead of just comparing two options, and it lets the model choose which preference to focus on based on the situation.

Keywords

» Artificial intelligence  » GPT  » Mixture of experts  » Reinforcement learning from human feedback  » RLHF