Summary of Prompt Optimization with Human Feedback, by Xiaoqiang Lin et al.
Prompt Optimization with Human Feedback
by Xiaoqiang Lin, Zhongxiang Dai, Arun Verma, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low
First submitted to arXiv on: 27 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from whichever version suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract; read it via the arXiv listing. |
Medium | GrooveSquid.com (original content) | In this paper, the researchers study how to optimize prompts for large language models (LLMs) using human feedback. Unlike previous works that rely on numeric scores to assess prompt quality, the authors focus on preference feedback from humans: users are shown pairs of responses and asked which one they prefer. Inspired by dueling bandits, the authors design a strategy for selecting the pair of prompts to query for preference feedback in each iteration. They introduce an algorithm called Automated Prompt Optimization with Human Feedback (APOHF) and apply it to various tasks, including optimizing user instructions, prompt optimization for text-to-image generation, and response refinement with human feedback. Their results show that APOHF can efficiently find a good prompt using only a small number of preference feedback instances (a minimal sketch of the selection loop follows this table). |
Low | GrooveSquid.com (original content) | The paper explores how to optimize prompts for large language models using human feedback. Researchers usually rely on numeric scores to evaluate prompts, but such scores are often unreliable when interacting with black-box LLMs. Instead, humans can provide preference feedback by comparing pairs of responses. The authors create an algorithm called APOHF, which uses a dueling-bandits-inspired strategy to select prompt pairs and optimize the prompt. They test APOHF on tasks such as optimizing user instructions and text-to-image generation. The results show that APOHF works well with only minimal human feedback. |
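To make the dueling-bandits idea more concrete, here is a minimal, self-contained Python sketch of a preference-feedback loop over candidate prompts. This is not the authors’ APOHF implementation: the linear Bradley–Terry preference model, the exploration bonus, and the simulated human (`simulate_human_preference`) are all illustrative assumptions standing in for the paper’s actual prompt representations, selection rule, and real human raters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each candidate prompt is represented by a feature
# vector (e.g., an embedding from a text encoder). Random vectors are
# used here purely for illustration.
num_prompts, dim = 20, 8
prompt_features = rng.normal(size=(num_prompts, dim))

# Hidden "true" utility of each prompt, used only to simulate the human.
true_weights = rng.normal(size=dim)
true_scores = prompt_features @ true_weights

def simulate_human_preference(i, j):
    """Stand-in for a real human rater: prefers prompt i over prompt j
    with Bradley-Terry probability sigmoid(score_i - score_j)."""
    p = 1.0 / (1.0 + np.exp(-(true_scores[i] - true_scores[j])))
    return rng.random() < p

def fit_preference_model(duels, features, lr=0.1, steps=200):
    """Fit a linear Bradley-Terry model to observed duels
    [(winner, loser), ...] by gradient ascent on the log-likelihood."""
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(w)
        for winner, loser in duels:
            diff = features[winner] - features[loser]
            p = 1.0 / (1.0 + np.exp(-(diff @ w)))
            grad += (1.0 - p) * diff
        w += lr * grad / max(len(duels), 1)
    return w

duels = []                       # history of (winner, loser) pairs
counts = np.zeros(num_prompts)   # how often each prompt has been queried
for t in range(30):
    w = fit_preference_model(duels, prompt_features)
    scores = prompt_features @ w
    # Exploit: pick the current best prompt under the fitted model.
    best = int(np.argmax(scores))
    # Explore: pick a challenger with a bonus for under-queried prompts.
    challenger_scores = scores + 1.0 / np.sqrt(counts + 1.0)
    challenger_scores[best] = -np.inf
    challenger = int(np.argmax(challenger_scores))
    # Query preference feedback on the selected prompt pair.
    if simulate_human_preference(best, challenger):
        duels.append((best, challenger))
    else:
        duels.append((challenger, best))
    counts[best] += 1
    counts[challenger] += 1

w = fit_preference_model(duels, prompt_features)
print("Estimated best prompt:", int(np.argmax(prompt_features @ w)))
print("True best prompt:     ", int(np.argmax(true_scores)))
```

The design choice mirrored here is the one the summaries emphasize: each round queries a *pair* of prompts for a binary preference rather than a numeric score, with one arm exploiting the current preference model and the other exploring under-queried prompts.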
Keywords
» Artificial intelligence » Image generation » Optimization » Prompt