
Summary of Model-based Preference Optimization in Abstractive Summarization Without Human Feedback, by Jaepill Choi et al.


Model-based Preference Optimization in Abstractive Summarization without Human Feedback

by Jaepill Choi, Kyubyung Chae, Jiwoo Song, Yohan Jo, Taesup Kim

First submitted to arXiv on: 27 Sep 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper tackles the challenge of generating accurate and concise abstractive summaries, a task where Large Language Models (LLMs) often introduce inaccuracies by hallucinating content not present in the original source. To improve summarization abilities, researchers have employed supervised fine-tuning methods that maximize likelihood, but these approaches do not consistently enhance faithfulness. The study introduces Model-based Preference Optimization (MPO), a novel approach that fine-tunes LLMs without relying on human feedback. MPO leverages the model’s inherent summarization capabilities to create a preference dataset generated through different decoding strategies (a rough code sketch of this idea follows below). Experiments on standard summarization datasets and metrics demonstrate that MPO significantly enhances summary quality.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about making machines better at creating summaries of texts. Right now, these machines can write summaries but often get things wrong by adding information that’s not in the original text. To make them better, researchers have been using a special type of training that helps machines understand what makes a good summary. However, this approach still relies on human feedback, which can be time-consuming and expensive. The study proposes a new way to train machines called Model-based Preference Optimization (MPO) that doesn’t require human feedback. MPO uses the machine’s own capabilities to create a preference dataset, which helps the machine learn what makes a good summary. The results show that MPO improves the quality of summaries without needing human help.

Keywords

» Artificial intelligence  » Fine tuning  » Likelihood  » Optimization  » Summarization  » Supervised