Summary of Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts, by Haoxiang Wang et al.
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
by Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang
First submitted to arXiv on: 18 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes a two-stage approach that addresses limitations of conventional reward models for reinforcement learning from human feedback (RLHF). In the first stage, an Absolute-Rating Multi-Objective Reward Model (ArmoRM) is trained on multi-dimensional absolute-rating data, so each reward objective stays interpretable. In the second stage, a Mixture-of-Experts (MoE) strategy uses a gating network to select suitable reward objectives based on the context (a minimal code sketch of this idea follows the table). The trained ArmoRM-Llama3-8B model outperforms state-of-the-art reward models, including the LLM-as-a-judge method with GPT-4 judges, and approaches the performance of the much larger Nemotron-4 340B reward model. |
| Low | GrooveSquid.com (original content) | This paper helps create better language models by making them understand why certain responses are good or bad. It proposes a new way to train these models so their decisions align with human preferences. The approach uses absolute ratings instead of just comparing two options, and lets the model choose which preference dimension to emphasize based on the situation. |
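The medium-difficulty summary describes the architecture in words; the sketch below illustrates the general idea in PyTorch: a set of per-objective reward heads plus a gating network that produces context-dependent mixing weights. This is a minimal sketch, not the paper's actual implementation; the class name `ArmoRMSketch`, the hidden dimension, and the number of objectives are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ArmoRMSketch(nn.Module):
    """Minimal sketch of a multi-objective reward model with a gating network.

    All sizes and names are illustrative assumptions; the actual ArmoRM-Llama3-8B
    builds these components on top of a Llama-3-8B backbone.
    """

    def __init__(self, hidden_dim: int = 4096, num_objectives: int = 8):
        super().__init__()
        # Stage 1: regression heads that predict absolute ratings for each
        # reward objective (e.g., helpfulness, correctness) from hidden states.
        self.reward_heads = nn.Linear(hidden_dim, num_objectives)
        # Stage 2: gating network that mixes the objectives based on the
        # prompt context (mixture-of-experts over reward objectives).
        self.gating = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, num_objectives),
            nn.Softmax(dim=-1),
        )

    def forward(self, prompt_hidden: torch.Tensor, response_hidden: torch.Tensor) -> torch.Tensor:
        # Per-objective rewards for the (prompt, response) pair: (batch, num_objectives)
        objective_rewards = self.reward_heads(response_hidden)
        # Context-dependent mixing weights computed from the prompt: (batch, num_objectives)
        weights = self.gating(prompt_hidden)
        # Scalar reward = weighted sum of the interpretable objectives: (batch,)
        return (weights * objective_rewards).sum(dim=-1)


if __name__ == "__main__":
    model = ArmoRMSketch()
    prompt_h = torch.randn(2, 4096)    # placeholder prompt hidden states
    response_h = torch.randn(2, 4096)  # placeholder response hidden states
    print(model(prompt_h, response_h).shape)  # torch.Size([2])
```

Because the gating weights are nonnegative and sum to one, the final scalar reward can be read as an interpretable, context-dependent blend of the individual objective scores.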
Keywords
» Artificial intelligence » GPT » Mixture of experts » Reinforcement learning from human feedback » RLHF