


HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

by Ayano Hiranaka, Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji

First submitted to arXiv on 7 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, written at different levels of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper’s original abstract, available on arXiv.

Medium Difficulty Summary (GrooveSquid.com original content)
The paper presents HERO, a framework that enables controllable generation by fine-tuning Stable Diffusion (SD) models with online human feedback. HERO is built on two key mechanisms: Feedback-Aligned Representation Learning and Feedback-Guided Image Generation. Together, these mechanisms use human feedback efficiently, letting the SD model refine its initialization samples and converge toward the evaluator’s intent more quickly. The authors report that HERO is 4x more feedback-efficient than existing methods on body-part anomaly correction, and that it handles tasks such as reasoning, counting, personalization, and reducing NSFW content with only 0.5K pieces of online feedback. (A toy sketch of this feedback loop appears after these summaries.)

Low Difficulty Summary (GrooveSquid.com original content)
The paper introduces a new way to improve Stable Diffusion (SD) models by using human feedback during training. The HERO framework lets SD learn from people’s guidance in real time, making it more accurate and feedback-efficient, so it can generate images that better match what people want. The authors tested HERO on different tasks, like correcting mistakes in generated images or creating personalized content, and found that it learned what people wanted with far less feedback than previous methods.
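
To make the online feedback loop concrete, here is a minimal, self-contained toy sketch of the workflow the medium summary describes: generate a batch of samples, collect binary good/bad labels, densify the labels into a per-sample reward via a feedback-aligned representation, and update the generator. Everything in it, including ToyGenerator, human_labels, and feedback_aligned_reward, is a hypothetical stand-in, not the authors' implementation; HERO itself fine-tunes Stable Diffusion with reinforcement learning on a reward derived from human labels.

```python
# Toy sketch of an online human-feedback fine-tuning loop in the spirit of
# HERO. All components below are hypothetical placeholders for illustration.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Stand-in for the diffusion model: maps noise to 16-d 'images'."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.dim = dim
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def sample(self, n: int) -> torch.Tensor:
        return self.net(torch.randn(n, self.dim))

def human_labels(images: torch.Tensor) -> torch.Tensor:
    """Placeholder for the online human evaluator: binary good/bad labels.
    Here we pretend the evaluator likes samples with a positive mean value."""
    return images.mean(dim=1) > 0

def feedback_aligned_reward(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for feedback-aligned representation learning: densify
    sparse binary labels into a per-sample reward by scoring each sample's
    similarity to the centroid of this round's 'good' samples."""
    good = images.detach()[labels]
    if good.numel() == 0:
        # No positives this round: zero reward, but keep the graph connected.
        return images.sum(dim=1) * 0.0
    anchor = good.mean(dim=0, keepdim=True)
    return torch.cosine_similarity(images, anchor, dim=1)

model = ToyGenerator()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for round_idx in range(20):                           # each round = one small feedback batch
    images = model.sample(8)                          # 1) generate candidates
    labels = human_labels(images.detach())            # 2) collect binary feedback
    reward = feedback_aligned_reward(images, labels)  # 3) densify labels into a reward
    loss = -reward.mean()                             # 4) push generator toward liked samples
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point the sketch illustrates is why so few labels can suffice: each round's handful of binary judgments is converted into a dense per-sample reward before the gradient step, so every generated sample contributes to the update.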

Keywords

» Artificial intelligence  » Diffusion  » Fine tuning  » Image generation  » Representation learning