
Ask, Attend, Attack: An Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

by Qingyuan Zeng, Zhenzhong Wang, Yiu-ming Cheung, Min Jiang

First submitted to arXiv on: 16 Aug 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper advances adversarial attacks against image-to-text models, focusing on decision-based black-box targeted attacks, where the attacker sees only the model's final output text. The authors propose a three-stage process called AAA (Ask, Attend, Attack) to solve the resulting optimization problem efficiently. Ask guides the attacker in crafting target texts; Attend identifies the image regions most critical to the model's output; and Attack uses an evolutionary algorithm to perturb only those regions, achieving targeted attacks without semantic loss. Experiments on both transformer-based and CNN+RNN-based models confirm the effectiveness of AAA.
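To make the three stages concrete, here is a minimal Python sketch of an AAA-style pipeline. Everything in it is illustrative rather than the authors' actual implementation: query_model, the fixed central region mask, and the token-overlap similarity are hypothetical stand-ins, and a simple (1+1)-style evolutionary search substitutes for the paper's evolutionary algorithm.

```python
import numpy as np

def query_model(image):
    """Stand-in for the victim's decision-only API: returns just a caption."""
    return "a cat sitting on a couch"  # hypothetical fixed response

target_text = "a dog riding a skateboard"  # Ask: attacker-chosen target caption

def attend(image):
    """Attend: mark the pixels presumed crucial to the caption (assumed mask)."""
    mask = np.zeros(image.shape[:2], dtype=bool)
    h, w = mask.shape
    mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = True  # placeholder central region
    return mask

def text_similarity(a, b):
    """Crude token-overlap score standing in for the paper's text metric."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def attack(image, target_text, steps=500, sigma=8.0, eps=16.0):
    """Attack: (1+1)-style evolutionary search over the attended region only."""
    mask = attend(image)
    best_delta = np.zeros_like(image, dtype=np.float32)
    best_fit = text_similarity(query_model(image), target_text)
    for _ in range(steps):
        noise = np.random.randn(*image.shape).astype(np.float32) * sigma
        noise[~mask] = 0.0                             # leave non-crucial pixels untouched
        cand = np.clip(best_delta + noise, -eps, eps)  # keep perturbation bounded
        adv = np.clip(image.astype(np.float32) + cand, 0, 255)
        fit = text_similarity(query_model(adv), target_text)
        if fit >= best_fit:                            # keep mutations that move toward the target
            best_delta, best_fit = cand, fit
    return np.clip(image.astype(np.float32) + best_delta, 0, 255), best_fit

# Example: attack a random 224x224 image (purely illustrative).
adv_image, score = attack(np.random.randint(0, 256, (224, 224, 3)), target_text)
```

The one property the sketch preserves from the decision-based setting is that the search loop only ever observes the model's output text, never gradients or logits.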
Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper is about tricking computers that turn images into text, like automatic caption writers. The researchers show that an attacker who can only see the text a system produces can still fool it into saying something specific and wrong. Their method works in three steps: asking what text the attacker wants the system to output, finding the parts of the image that matter most, and then subtly changing those parts until the system says the target text. Understanding attacks like this helps researchers build systems that are harder to fool.

Keywords

» Artificial intelligence  » CNN  » Optimization  » RNN  » Transformer