Summary of Visual Modality Prompt For Adapting Vision-language Object Detectors, by Heitor R. Medeiros et al.


Visual Modality Prompt for Adapting Vision-Language Object Detectors

by Heitor R. Medeiros, Atif Belal, Srikanth Muralidharan, Eric Granger, Marco Pedersoli

First submitted to arxiv on: 1 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)

This paper proposes ModPrompt, a visual prompt strategy that adapts vision-language object detectors to new modalities, such as infrared and depth, without degrading their zero-shot performance. Existing approaches rely on image translation or on fine-tuning, and fine-tuning compromises the detector's zero-shot capabilities. ModPrompt instead uses an encoder-decoder visual prompt, enhanced with an inference-friendly modality prompt decoupled residual, for more robust adaptation. The approach is evaluated on two vision-language detectors, YOLO-World and Grounding DINO, where it achieves performance comparable to full fine-tuning while preserving zero-shot capability.

Low Difficulty Summary (written by GrooveSquid.com, original content)

ModPrompt helps object detectors work better on different types of images, such as infrared or depth pictures. This matters because current methods either translate the image into another type or retrain the detector, which can make it lose abilities it already had. ModPrompt is a new way to help detectors understand different types of images without losing their ability to recognize objects they were never specifically trained on.
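
To make the encoder-decoder prompt idea in the summaries above more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the function names (`encode`, `decode`, `apply_prompt`, `frozen_detector`), the average-pooling "encoder", and the scalar `weight` and `bias` parameters are all assumptions made for illustration, and the modality prompt decoupled residual is not modeled. The sketch only shows the general pattern: a small trainable module produces an input-dependent prompt that is added to the image, while the detector itself stays frozen.

```python
def encode(image, factor=2):
    """Toy encoder (assumption): average-pool the image by `factor`.

    Stands in for a learned encoder; height and width must be divisible by `factor`.
    """
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + di][j + dj] for di in range(factor) for dj in range(factor))
            / factor**2
            for j in range(0, w, factor)
        ]
        for i in range(0, h, factor)
    ]


def decode(code, weight, bias, factor=2):
    """Toy decoder (assumption): scale and shift the code, then upsample it
    back to the input resolution with nearest-neighbour repetition."""
    h, w = len(code) * factor, len(code[0]) * factor
    return [
        [weight * code[i // factor][j // factor] + bias for j in range(w)]
        for i in range(h)
    ]


def apply_prompt(image, weight, bias):
    """Add an input-dependent prompt to the image.

    Only `weight` and `bias` would be trained; the detector is never updated.
    """
    prompt = decode(encode(image), weight, bias)
    return [
        [pixel + p for pixel, p in zip(image_row, prompt_row)]
        for image_row, prompt_row in zip(image, prompt)
    ]


def frozen_detector(image):
    """Hypothetical placeholder for a frozen vision-language detector.

    Here it just returns the mean pixel value so the sketch is runnable.
    """
    return sum(map(sum, image)) / (len(image) * len(image[0]))


# Usage: a constant 4x4 single-channel "infrared" image.
infrared = [[2.0] * 4 for _ in range(4)]
prompted = apply_prompt(infrared, weight=1.0, bias=0.0)
score = frozen_detector(prompted)
```

Because the prompt is computed from the image itself, two different inputs receive different prompts, unlike a single fixed prompt shared by every image. With `weight=0.0` and `bias=0.0` the prompt vanishes and the frozen detector sees the original image unchanged.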

Keywords

» Artificial intelligence  » Encoder decoder  » Fine tuning  » Grounding  » Inference  » Prompt  » Translation  » Yolo  » Zero shot