Summary of BendVLM: Test-Time Debiasing of Vision-Language Embeddings, by Walter Gerych et al.


BendVLM: Test-Time Debiasing of Vision-Language Embeddings

by Walter Gerych, Haoran Zhang, Kimia Hamidieh, Eileen Pan, Maanas Sharma, Thomas Hartvigsen, Marzyeh Ghassemi

First submitted to arXiv on: 7 Nov 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty summary (written by the paper authors)
Read the original abstract here

Medium difficulty summary (written by GrooveSquid.com, original content)
This work proposes a new approach to debiasing vision-language model (VLM) embeddings. VLMs have been shown to encode societal biases present in their training data, which can lead them to associate negative characteristics with certain racial or gender identities. Because VLMs are increasingly used for tasks such as few-shot classification and text-guided image generation, debiasing these embeddings is crucial. Fine-tuning-based methods often suffer from catastrophic forgetting, while fine-tuning-free methods typically take a “one-size-fits-all” approach, assuming that correlation with the spurious attribute can be explained by a single linear direction across all possible inputs. The proposed method, Bend-VLM, is nonlinear and fine-tuning-free, and it tailors the debiasing operation to each individual input, which makes the debiasing more flexible. It does not require the set of inputs to be known before inference time, making it suitable for online and open-set tasks such as retrieval and text-guided image generation.

Low difficulty summary (written by GrooveSquid.com, original content)
This paper is about removing biases from computer models that understand both images and text. These models are used for many things, but because of the way they were trained they can end up associating negative traits with certain groups of people. Some fixes adjust the model after it has already been trained, but that can make the model forget what it learned before. Other fixes assume that the bias can be explained in one single way for every input, which isn’t always true. This paper proposes a new method called Bend-VLM that doesn’t require retraining the model and adapts the fix to each input it sees. This makes it useful for things like searching for images and generating new images based on text.
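
To make the “single linear direction” assumption from the medium difficulty summary concrete, here is a minimal sketch of that kind of one-size-fits-all baseline: every embedding is debiased by projecting out one fixed direction. This illustrates the baseline the summary contrasts against, not Bend-VLM itself, whose per-input, nonlinear procedure is not described in enough detail here to reproduce. The toy vectors and the helper names (`normalize`, `dot`, `project_out`) are assumptions made for illustration only.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(embedding, direction):
    """Remove the component of `embedding` that lies along `direction`."""
    d = normalize(direction)
    coeff = dot(embedding, d)
    return [e - coeff * di for e, di in zip(embedding, d)]


# Hypothetical toy embeddings standing in for two values of a protected
# attribute (real VLM embeddings have hundreds of dimensions).
attr_a = [1.0, 0.2, 0.0]
attr_b = [-1.0, 0.2, 0.0]

# The one fixed "bias direction" that the linear baseline applies to every input.
bias_direction = [a - b for a, b in zip(attr_a, attr_b)]

# Hypothetical query embedding, debiased and re-normalized.
query = [0.6, 0.5, 0.3]
debiased = normalize(project_out(query, bias_direction))

# After projection the query is equally similar to both attribute embeddings.
sim_a = dot(debiased, normalize(attr_a))
sim_b = dot(debiased, normalize(attr_b))
print(abs(sim_a - sim_b) < 1e-9)
```

Because the same `bias_direction` is reused for every query, this baseline cannot adapt when the bias for a particular input lies along a different direction, which is the limitation the summary attributes to one-size-fits-all methods.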

Keywords

» Artificial intelligence  » Classification  » Few shot  » Fine tuning  » Image generation  » Inference  » Language model