


Defending LVLMs Against Vision Attacks through Partial-Perception Supervision

by Qi Zhou, Tianlin Li, Qing Guo, Dongxia Wang, Yun Lin, Yang Liu, Jin Song Dong

First submitted to arXiv on: 17 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper addresses the vulnerability of Large Vision-Language Models (LVLMs) to maliciously injected or perturbed input images, which can mislead their responses. Prior defenses exploit the observation that such vision attacks are sensitive to image modifications, especially cropping, and aggregate responses over several modified images by majority voting. However, these modifications produce partial images and distort semantics, which degrades response quality on clean images after voting. The authors propose DPS (Defense through Partial-Perception Supervision), a black-box, training-free method in which the responses of a model that perceives only a partial image are used to supervise the LVLM's response to the original image. Under attack, this lets the model revise its answer based on the partial-image understanding, while for clean inputs it confidently keeps its original response.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper talks about making computer models better at resisting sneaky changes to pictures that try to trick them into giving wrong answers. Researchers noticed that these tricks stop working when the picture is changed, for example by cropping out parts of it, but cropping also makes the models give worse answers on normal pictures. The authors came up with a new way to help: another model looks at only part of the picture and shares its answer, and the main model uses that answer to double-check its own. This helped the main model catch tricky pictures while still giving good answers on real ones.

Keywords

  • Artificial intelligence
  • Semantics