They’re All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
by Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister
First submitted to arXiv on: 17 Jun 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Information Retrieval (cs.IR); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The proposed framework generates synthetic counterfactual images to create a diverse and balanced dataset for fine-tuning Vision Language Models (VLMs) such as CLIP. The goal is to reduce unwanted biases in VLMs used in applications such as text-to-image and text-to-video retrieval, reverse search, and classification. The framework leverages off-the-shelf segmentation and inpainting models to place humans with diverse visual appearances in context (minimal sketches of this pipeline and of the fairness metrics appear after the table). Trained on these synthetic datasets, CLIP learns to disentangle human appearance from image context, improving group fairness metrics such as MaxSkew, MinSkew, and NDKL by 40-66% on image retrieval tasks while retaining comparable performance on downstream tasks.
Low | GrooveSquid.com (original content) | This paper introduces a way to create computer-generated pictures that can help make a type of AI model called a Vision Language Model (VLM) fairer. VLMs are very good at recognizing what is in pictures and videos, but they sometimes make mistakes because they carry biases against certain groups of people. The new framework helps fix this by making the training pictures more diverse while keeping them realistic. As a result, the model gets better at finding the right picture when someone asks for it, without discriminating against certain groups. The results show that the new approach can improve fairness metrics by 40-66% while the model stays good at its job.
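To make the synthesis step concrete, here is a minimal sketch of the segment-then-inpaint idea described in the medium summary. It assumes Hugging Face diffusers’ StableDiffusionInpaintPipeline and a hypothetical `person_mask` helper; the paper’s exact segmentation and inpainting models, prompts, and attribute lists may differ.

```python
# Sketch: segment the person in a photo, then inpaint people with different
# visual appearances while keeping the surrounding context (scene, clothing,
# occupation cues) fixed. Model choice and prompts here are illustrative,
# not the paper's exact setup.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # illustrative model choice
    torch_dtype=torch.float16,
).to("cuda")

def person_mask(image: Image.Image) -> Image.Image:
    """Hypothetical helper: return a binary mask covering the person.
    Any off-the-shelf person segmenter could be plugged in here."""
    raise NotImplementedError

image = Image.open("doctor.jpg").convert("RGB").resize((512, 512))
mask = person_mask(image)

# One counterfactual per appearance description; together the outputs form
# a balanced set of images sharing the same context.
appearances = ["a photo of a female doctor", "a photo of an elderly doctor"]
counterfactuals = [
    pipe(prompt=p, image=image, mask_image=mask).images[0] for p in appearances
]
```

The resulting counterfactual sets can then be used to fine-tune CLIP so that the image context, rather than the person’s appearance, drives the image-text similarity.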
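The retrieval-fairness metrics named above have standard definitions (Geyik et al., 2019): Skew@k is the log ratio between a group’s share of the top-k results and its desired share (MaxSkew/MinSkew take the max/min over groups), and NDKL is a position-discounted KL divergence between the observed and desired group distributions. A small self-contained sketch, with an illustrative two-group example:

```python
# Sketch of the retrieval-fairness metrics MaxSkew, MinSkew, and NDKL.
# `ranking` is a list of group labels for retrieved items, best first;
# `desired` maps each group to its target proportion (e.g. uniform).
import math

def skews_at_k(ranking, desired, k):
    top = ranking[:k]
    skews = {}
    for g, p_desired in desired.items():
        p_topk = sum(1 for x in top if x == g) / k
        # ln(observed / desired); small epsilon avoids log(0)
        skews[g] = math.log((p_topk + 1e-9) / p_desired)
    return skews

def max_skew(ranking, desired, k):
    return max(skews_at_k(ranking, desired, k).values())

def min_skew(ranking, desired, k):
    return min(skews_at_k(ranking, desired, k).values())

def ndkl(ranking, desired):
    # Position-discounted KL divergence between the top-i group
    # distribution and the desired one, normalized over all prefixes.
    total, z = 0.0, 0.0
    for i in range(1, len(ranking) + 1):
        top = ranking[:i]
        kl = 0.0
        for g, p_desired in desired.items():
            p = sum(1 for x in top if x == g) / i
            if p > 0:
                kl += p * math.log(p / p_desired)
        w = 1.0 / math.log2(i + 1)
        total += w * kl
        z += w
    return total / z

# Example: 10 results over two groups with a uniform target distribution.
ranking = ["A"] * 7 + ["B"] * 3
print(max_skew(ranking, {"A": 0.5, "B": 0.5}, k=10))  # > 0: "A" over-represented
```

A perfectly balanced ranking yields MaxSkew near 0 and NDKL near 0; the 40-66% improvements reported in the paper are reductions measured on metrics of this form.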
Keywords
» Artificial intelligence » Classification » Fine-tuning