Summary of Uncovering Factor Level Preferences to Improve Human-Model Alignment, by Juhyun Oh et al.
Uncovering Factor Level Preferences to Improve Human-Model Alignment
by Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh
First submitted to arXiv on: 9 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper introduces PROFILE, a novel framework that uncovers and quantifies the influence of specific factors driving Large Language Model (LLM) preferences. By analyzing these factors at a granular level, PROFILE explains why LLMs often exhibit biases or tendencies that diverge from human preferences. The authors apply PROFILE to three tasks: summarization, helpful response generation, and document-based question answering. Their analysis reveals significant discrepancies between human and LLM preferences in generation tasks, but strong alignment in evaluation tasks. The work highlights the importance of explainable preference analysis and demonstrates how leveraging factor-level insights can improve alignment with human preferences. |
Low | GrooveSquid.com (original content) | The paper is about understanding why large language models make certain choices or have specific preferences. These models often make decisions that differ from what humans would prefer, like writing in a style that's too fancy. Right now it's hard to understand why this happens, because existing methods don't compare human and model preferences at a fine-grained level. The authors introduce a new framework called PROFILE that shows which specific factors drive these differences. They apply PROFILE to three different tasks and find that models' preferences diverge more from humans' when generating text, but align more closely when evaluating it. This matters because understanding what's going wrong can help improve the models so they make choices closer to what humans want. |
Keywords
» Artificial intelligence » Alignment » Large language model » Question answering » Summarization