Summary of Reward Generalization in RLHF: A Topological Perspective, by Tianyi Qiu et al.
Reward Generalization in RLHF: A Topological Perspective
by Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang
First submitted to arxiv on: 15 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Discrete Mathematics (cs.DM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract. |
Medium | GrooveSquid.com (original content) | A new theoretical framework is introduced for investigating reward generalization in reinforcement learning from human feedback (RLHF). The framework examines the topology of information flow at both the macro and micro levels. At the macro level, RLHF is formalized as an autoencoding process over behavior distributions, ensuring distributional consistency between human preference and model behavior. At the micro level, induced Bayesian networks are presented as a theory of reward generalization in RLHF, incorporating fine-grained dataset topologies into generalization bounds. Combining the analyses at both levels, a reward modeling approach based on tree-structured preference information is proposed, which reduces reward uncertainty by up to Θ(log n / log log n) times compared to baselines, where n is the dataset size (a small illustrative sketch of this factor follows the table). The approach is validated on three NLP tasks, achieving an average win rate of 65% against baseline methods. |
Low | GrooveSquid.com (original content) | A new way to understand how computers learn from people’s preferences is being developed. The current method for training AI models using human feedback has a common structure that hasn’t been fully explored or understood. This paper introduces a new framework that looks at the flow of information between humans and AI systems at different levels. By doing this, it can help improve the way AI models learn from people’s preferences, making them more reliable and efficient. |
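As a rough illustration of the Θ(log n / log log n) reduction factor quoted in the medium-difficulty summary, the short Python sketch below simply evaluates log n / log log n for a few dataset sizes n. It is not code from the paper; the function name `uncertainty_reduction_factor` and the sample sizes are our own illustrative choices.

```python
# Minimal illustrative sketch (not from the paper): evaluate the asymptotic
# reduction factor Theta(log n / log log n) quoted in the summary above,
# for several dataset sizes n, to show how the claimed benefit of
# tree-structured preference data grows with the amount of annotation.
import math


def uncertainty_reduction_factor(n: int) -> float:
    """Nominal value of log(n) / log(log(n)); requires n > e so the
    denominator is positive."""
    return math.log(n) / math.log(math.log(n))


for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n = {n:>9,d}  ->  log n / log log n ≈ {uncertainty_reduction_factor(n):.2f}")
```

For example, at n = 10^6 the expression evaluates to roughly 5.3 (constants hidden in the Θ(·) bound aside), which gives a sense of the scale the asymptotic statement refers to.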
Keywords
* Artificial intelligence * Generalization * NLP * Reinforcement learning from human feedback * RLHF