
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing

by Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung, Hyunkoo Lee, Joowon Kim, June Yong Yang, Jaeryong Hwang, Eunho Yang

First submitted to arxiv on: 15 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary — written by the paper authors
Read the original abstract here
Medium Difficulty Summary — original content by GrooveSquid.com
The proposed AugCLIP metric is a context-aware evaluation tool for text-guided image editing that addresses the context-blindness of existing metrics. Traditional approaches apply the same evaluation criteria to every image–text pair, systematically favoring either modification or preservation. AugCLIP instead adapts its assessment to the specific source image and target text by deriving the representation of an ideally edited image: one that preserves the original while aligning with the textual description. To do this, a multi-modal large language model separates source and target attributes, from which modification vectors are computed in CLIP space. On five benchmark datasets, the proposed metric outperforms existing methods, showing remarkable alignment with human evaluation standards.
Low Difficulty Summary — original content by GrooveSquid.com
Imagine you’re editing a picture based on some text. You want to keep the important parts of the original image while making changes according to the text. Right now, there are problems with how we measure how well an edited image does this. Some methods work better for certain types of edits than others. A new approach called AugCLIP tries to fix this by taking into account both the source image and the target text when evaluating an edit. It works by using a special language model that generates information about what should be changed in the image based on the text. This helps AugCLIP make more accurate judgments about how well an edited image preserves the original while making desired changes.
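To make the idea above concrete, here is a deliberately simplified sketch of a context-aware edit score in the spirit of AugCLIP. It is not the paper's implementation: the function name `augclip_style_score` and all inputs are hypothetical, the attribute embeddings are assumed to have been produced elsewhere (the paper uses a multi-modal LLM to extract source/target attributes before encoding them with CLIP), and plain NumPy vectors stand in for real CLIP embeddings.

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit sphere, as CLIP similarity assumes."""
    return v / np.linalg.norm(v)


def augclip_style_score(src_img_emb: np.ndarray,
                        edited_img_emb: np.ndarray,
                        src_attr_embs: list[np.ndarray],
                        tgt_attr_embs: list[np.ndarray]) -> float:
    """Score an edit against a context-specific "ideal" edited representation.

    All inputs are assumed to live in the same (CLIP-like) embedding space.
    src_attr_embs / tgt_attr_embs are text embeddings of attributes extracted
    from the source image and the target description, respectively
    (hypothetical stand-ins for the paper's MLLM-derived attributes).
    """
    # Modification vector: average displacement from source attributes to
    # target attributes, i.e. the direction the edit should move along.
    mod_vec = np.mean([normalize(t) - normalize(s)
                       for s, t in zip(src_attr_embs, tgt_attr_embs)], axis=0)
    # Ideal edited representation: the source image shifted along the
    # modification vector, so preservation and modification are balanced
    # by construction for THIS image-text pair (context awareness).
    ideal = normalize(normalize(src_img_emb) + mod_vec)
    # Score the candidate edit by cosine similarity to that ideal point.
    return float(normalize(edited_img_emb) @ ideal)
```

Because the ideal point is rebuilt per source image and target text, the same edited image can score differently in different contexts, which is exactly the behavior a single fixed criterion (e.g. raw CLIP text similarity) cannot provide.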

Keywords

» Artificial intelligence  » Alignment  » Language model  » Large language model  » Multi-modal