
Summary of Online Learning from Strategic Human Feedback in LLM Fine-Tuning, by Shugang Hao and Lingjie Duan


Online Learning from Strategic Human Feedback in LLM Fine-Tuning

by Shugang Hao, Lingjie Duan

First submitted to arXiv on: 22 Dec 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computer Science and Game Theory (cs.GT)



GrooveSquid.com Paper Summaries

GrooveSquid.com's goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper but is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper's original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (GrooveSquid.com, original content)
The paper addresses a critical issue in fine-tuning large language models (LLMs) with reinforcement learning from human feedback (RLHF): human labelers may strategically misreport their feedback so that the system's preference aggregation favors their own preferences. Existing methods simply average labelers' feedback in each time slot and fail to identify the accurate labelers, which leads to linear regret of O(T) over T time slots. The authors develop an online learning mechanism that adjusts each labeler's weight in the preference aggregation, ensuring truthful feedback and achieving sublinear regret of O(T^(1/2)). Simulation results highlight the effectiveness of the approach compared to existing benchmark schemes. A minimal weighted-aggregation sketch is given after these summaries.

Low Difficulty Summary (GrooveSquid.com, original content)
RLHF is a way to make large language models better by using feedback from humans. But people can be strategic when giving feedback online: they might report fake preferences to push the system towards what they want. Current methods just average all the feedback and never work out who is accurate, so performance loss (regret) grows linearly, like O(T). The researchers tackle this by creating a new system that adjusts how much each person's opinion counts based on their truthfulness. This gives more accurate results and reduces regret to O(T^(1/2)). Simulations show their method beats the existing benchmark schemes.

Keywords

» Artificial intelligence  » Fine tuning  » Online learning  » Reinforcement learning from human feedback  » Rlhf