Summary of Negative Token Merging: Image-based Adversarial Feature Guidance, by Jaskirat Singh et al.


Negative Token Merging: Image-based Adversarial Feature Guidance

by Jaskirat Singh, Lindsey Li, Weijia Shi, Ranjay Krishna, Yejin Choi, Pang Wei Koh, Michael F. Cohen, Stephen Gould, Liang Zheng, Luke Zettlemoyer

First submitted to arXiv on: 2 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty summary (written by the paper authors)
Read the original abstract here

Medium difficulty summary (written by GrooveSquid.com, original content)
The paper introduces NegToMe, a training-free approach that performs adversarial guidance using visual features taken from a reference image or from other images in the same batch. It steers diffusion models away from producing undesired concepts, such as copyrighted characters. The authors show that NegToMe supports several applications: it increases output diversity by guiding the features of each image away from those of the others, and it reduces visual similarity to copyrighted content by 34.57%. The approach is simple to implement, adds less than 4% to inference time, and is compatible with different diffusion architectures. The paper extends adversarial guidance, which is typically text-based, to visual features.

Low difficulty summary (written by GrooveSquid.com, original content)
The paper explores a new way to guide image-generating models using images instead of text. It introduces NegToMe, a simple method that helps models avoid unwanted results by pushing what they generate away from a reference image. The authors show that this approach works for different tasks, such as making sure outputs don’t resemble copyrighted content or increasing the diversity of generated images. The method is easy to use, needs no extra training, and adds very little extra time.
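
The summaries above describe guiding generated features away from reference features, but they do not spell out the mechanics. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes that each image is represented as a list of token feature vectors, that "matching" means highest cosine similarity, and that "pushing away" is a simple linear extrapolation. The names `cosine` and `push_away` and the parameters `alpha` and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def push_away(gen_tokens, ref_tokens, alpha=0.1, threshold=0.65):
    """Push each generated token away from its best-matching reference token.

    alpha (push strength) and threshold (minimum similarity to count as a
    match) are hypothetical knobs, not values taken from the paper.
    """
    out = []
    for g in gen_tokens:
        # Find the reference token most similar to this generated token.
        best = max(ref_tokens, key=lambda r: cosine(g, r))
        if cosine(g, best) >= threshold:
            # Linear extrapolation away from the match: g + alpha * (g - best).
            out.append([x + alpha * (x - y) for x, y in zip(g, best)])
        else:
            # No sufficiently similar reference token: leave the token unchanged.
            out.append(list(g))
    return out


# Toy example: two generated tokens, one reference token.
gen = [[1.0, 0.0], [0.0, 1.0]]
ref = [[0.8, 0.6]]
guided = push_away(gen, ref)
```

In this toy example the first generated token is similar enough to the reference token to be pushed away from it, while the second falls below the threshold and is left as is. Under the same assumptions, the diversity use case mentioned in the medium summary would pass the tokens of the other images in the batch as `ref_tokens` instead of those of a separate reference image.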

Keywords

  • Artificial intelligence
  • Diffusion
  • Inference