Summary of Debias Your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering, by Neale Ratzlaff et al.
Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering
by Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard
First submitted to arXiv on: 15 Nov 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract; read it on the arXiv listing. |
| Medium | GrooveSquid.com (original content) | A novel training-free debiasing framework for Large Multi-Modal Models (LMMs) is proposed to mitigate societal biases in chatbot responses. LMMs are trained to converse about visual inputs, but their responses reflect biases present in their training data, leading to differences in how they respond to images of people from different demographic groups. The framework intervenes on the model's internal representations during text generation by constructing a steering vector that reduces reference to protected attributes. Two complementary methods are introduced for building this vector: a dataset-based approach and an optimization-based approach suited to low-resource settings. Experimental results show that these interventions effectively reduce bias while maintaining sentiment, fluency, and accuracy on downstream tasks. (A code sketch of the steering idea appears below this table.) |
| Low | GrooveSquid.com (original content) | Large Multi-Modal Models (LMMs) can have conversations about pictures, but they sometimes say things that are unfair because of how they were trained. This makes them treat people from different groups differently. Researchers came up with a way to fix this problem without retraining the model. They made a special kind of "steering" vector that helps the model stop saying biased things. There are two ways to build it: one uses existing data, and the other works better when you don't have much data. The results show that this reduces bias while keeping the models working just as well on other tasks. |
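To make the "intervene on the model's representations during text generation" idea concrete, here is a minimal sketch of test-time activation steering with a forward hook. This is not the authors' code: the model name, layer index, steering strength, and the random stand-in steering vector are all illustrative assumptions (the paper derives the vector from data or by optimization, and applies it to LMMs rather than a plain language model).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical choices for illustration only: the paper targets LMMs
# (e.g., LLaVA-style models); here we steer a plain decoder-only LM instead.
MODEL_NAME = "meta-llama/Llama-2-7b-hf"
LAYER_IDX = 15      # which decoder layer to intervene on (assumed)
ALPHA = 4.0         # steering strength (assumed)

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device).eval()

# Stand-in steering direction. In the paper this vector is derived either from
# a dataset of attribute-related activations or by direct optimization; a
# random unit vector is used here only so the sketch runs end to end.
steer_vec = torch.randn(model.config.hidden_size, device=device, dtype=model.dtype)
steer_vec = steer_vec / steer_vec.norm()

def steering_hook(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states
    # with shape (batch, seq_len, hidden_size); shift them away from the
    # protected-attribute direction before they flow to the next layer.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden - ALPHA * steer_vec
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# LLaMA-style module layout; other architectures name their layers differently.
handle = model.model.layers[LAYER_IDX].register_forward_hook(steering_hook)

prompt = "Describe the person in this photo."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # remove the hook to restore the unsteered model
```

Because the intervention is just a hook applied at inference time, no fine-tuning is required and the steering can be switched on or off per request, which is what makes this style of approach training-free.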
Keywords
* Artificial intelligence
* Multi-modal
* Optimization
* Text generation