They’re All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

by Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister

First submitted to arXiv on: 17 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Information Retrieval (cs.IR); Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)

The proposed framework generates synthetic counterfactual images to build a diverse, balanced dataset for fine-tuning Vision Language Models (VLMs) such as CLIP. The aim is to reduce unwanted associative biases in VLMs used for applications such as text-to-image and text-to-video retrieval, reverse image search, and classification. The framework leverages off-the-shelf segmentation and inpainting models to place humans with diverse visual appearances into otherwise identical contexts. Trained on these synthetic datasets, CLIP learns to disentangle human appearance from image context, improving fairness metrics such as MaxSkew, MinSkew, and NDKL by 40-66% on image retrieval tasks while retaining comparable performance on downstream tasks.
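As a rough illustration of the counterfactual-generation step described above, the sketch below uses one off-the-shelf inpainting model (Stable Diffusion inpainting via the diffusers library) to redraw only the masked person while keeping the scene fixed. The model choice, file names, and prompt list are placeholders, and the person mask is assumed to come from some off-the-shelf segmenter; none of these are details taken from the paper.

from diffusers import StableDiffusionInpaintPipeline
from PIL import Image
import torch

# Load an off-the-shelf inpainting model (any comparable model would do).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Placeholder inputs: a source image and a person mask produced by any
# off-the-shelf segmentation model (white = region to repaint).
image = Image.open("doctor.jpg").convert("RGB").resize((512, 512))
mask = Image.open("person_mask.png").convert("L").resize((512, 512))

# Vary only the described appearance of the person; the mask keeps the
# surrounding context fixed, so each output is a counterfactual of the
# same scene. The prompts are illustrative, not the paper's prompt set.
prompts = [
    "a photo of a female doctor",
    "a photo of an elderly doctor",
    "a photo of a Black doctor",
]
for i, prompt in enumerate(prompts):
    out = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
    out.save(f"counterfactual_{i}.png")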
Low Difficulty Summary (original content by GrooveSquid.com)

This paper shows how to create special computer-generated pictures that can help make a type of AI model, called a Vision Language Model (VLM), more fair. VLMs are very good at recognizing what is in pictures and videos, but they sometimes make mistakes because they have learned biases against certain groups of people. The new framework helps fix this by making the training pictures more diverse while keeping them realistic. That makes the AI model better at finding the right picture when someone asks for one, without discriminating against certain groups. The results show that the new approach improves fairness metrics by 40-66% while still doing its job well.

Keywords

  • Artificial intelligence
  • Classification
  • Fine tuning