Summary of HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning, by Ayano Hiranaka et al.
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
by Ayano Hiranaka, Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji
First submitted to arXiv on: 7 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | The paper presents HERO, a framework that enables controllable generation by fine-tuning Stable Diffusion (SD) models with online human feedback. The authors develop two key mechanisms: Feedback-Aligned Representation Learning and Feedback-Guided Image Generation. These mechanisms make efficient use of human feedback, letting the SD model refine its initialization samples and converge toward the evaluator’s intent more quickly. The paper shows that HERO is 4x more feedback-efficient than existing methods at correcting body-part anomalies, and that it can handle tasks such as reasoning, counting, personalization, and reducing NSFW content with only 0.5K instances of online feedback. |
Low | GrooveSquid.com (original content) | The paper introduces a new way to improve Stable Diffusion (SD) models by using human feedback during training. The HERO framework helps SD learn from people’s guidance in real time, making it more accurate and efficient. This means SD can generate images that better match what people want. The authors tested HERO on different tasks, like correcting image mistakes or creating personalized content, and found that it needed far less feedback than previous methods. |
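The summaries above describe HERO only at a high level. As a loose illustration of the general idea of turning binary online human feedback into a learned signal, here is a minimal, hypothetical Python sketch. It is not the paper's actual implementation: the `FeedbackEmbedder` class, the toy feature vectors, and the reward-weighting step are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical illustration (not HERO's actual code): train a small model on
# binary human feedback so that it can score new samples. Such a score could
# then be used to weight or guide further generation/fine-tuning.

class FeedbackEmbedder(nn.Module):
    """Maps image features to an embedding trained to separate samples that
    received positive feedback from those that received negative feedback."""
    def __init__(self, feat_dim: int = 512, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        self.head = nn.Linear(emb_dim, 1)  # binary feedback classifier

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(self.net(feats)).squeeze(-1)

def feedback_reward(embedder: FeedbackEmbedder, feats: torch.Tensor) -> torch.Tensor:
    """Reward = predicted probability the evaluator would label a sample 'good'."""
    with torch.no_grad():
        return torch.sigmoid(embedder(feats))

# Toy training loop over a batch of (features, human label) pairs.
embedder = FeedbackEmbedder()
opt = torch.optim.Adam(embedder.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

feats = torch.randn(32, 512)                  # stand-in for image features
labels = torch.randint(0, 2, (32,)).float()   # simulated binary human feedback

for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(embedder(feats), labels)
    loss.backward()
    opt.step()

rewards = feedback_reward(embedder, feats)    # could weight a fine-tuning objective
print(rewards.mean().item())
```

In this toy version, the learned reward would stand in for repeated human labeling once a small amount of feedback has been collected; the actual mechanisms in the paper (Feedback-Aligned Representation Learning and Feedback-Guided Image Generation) are described in the original abstract.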
Keywords
» Artificial intelligence » Diffusion » Fine tuning » Image generation » Representation learning