See It from My Perspective: How Language Affects Cultural Bias in Image Understanding

by Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal, Kathleen McKeown

First submitted to arXiv on: 17 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (GrooveSquid.com, original content)
This paper investigates the cultural bias of vision-language models (VLMs) when they answer questions about images in different languages. The researchers characterize the Western bias of VLMs and examine the role language plays in this disparity. Evaluating VLMs on both subjective and objective visual tasks with culturally diverse images and annotations, they find consistently better performance on Western splits than on East Asian ones. The study traces a source of this bias to the lack of linguistic diversity during the text-only pre-training of the underlying language model; interestingly, representing all evaluated languages during pre-training can reduce bias even when prompting in English. This work emphasizes the importance of richer language representations for building equitable VLMs.
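To make the evaluation protocol concrete, here is a minimal sketch of how such a split-by-language comparison might be scored. It is illustrative only: `query_vlm`, `translate`, and the `Example` dataset fields are hypothetical stand-ins for the authors' actual models, prompts, and benchmarks, not their pipeline.

```python
# Hypothetical sketch: measure a VLM's cultural performance gap by
# comparing accuracy on Western vs. East Asian image splits, with the
# same questions asked in English and in another prompting language.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Example:
    image_path: str  # culturally grounded image
    question: str    # task prompt (objective, e.g. object naming; or subjective, e.g. perceived emotion)
    answer: str      # gold label from culture-matched annotators
    split: str       # "western" or "east_asian"

def split_accuracy(
    examples: list[Example],
    query_vlm: Callable[[str, str], str],            # (image_path, question) -> predicted answer
    translate: Callable[[str], str] = lambda q: q,   # maps the prompt into the target language
) -> dict[str, float]:
    """Return per-split accuracy under a given prompting language."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for ex in examples:
        pred = query_vlm(ex.image_path, translate(ex.question))
        total[ex.split] = total.get(ex.split, 0) + 1
        correct[ex.split] = correct.get(ex.split, 0) + int(pred.strip().lower() == ex.answer.lower())
    return {split: correct[split] / total[split] for split in total}

# Usage (with a real VLM call plugged in for query_vlm):
#   acc_en = split_accuracy(data, query_vlm)                        # English prompts
#   acc_zh = split_accuracy(data, query_vlm, translate=to_chinese)  # e.g. Chinese prompts
#   western_bias_en = acc_en["western"] - acc_en["east_asian"]
#   western_bias_zh = acc_zh["western"] - acc_zh["east_asian"]
```

Under this framing, a persistent positive gap on Western splits regardless of prompting language is what the summary above describes as the model's Western bias.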
Low Difficulty Summary (GrooveSquid.com, original content)
Imagine you're trying to understand an image using a computer program. The program is trained on lots of words and pictures from around the world, but it isn't perfect: it tends to focus on what Western cultures consider important in an image rather than other cultures' perspectives. In this research, scientists studied how language affects how well these programs understand images. They found that the programs do better on images from Western cultures, but that they can be improved by including more languages in their training data. This study shows that we need to make sure our computer programs are fair to and considerate of all cultures.

Keywords

  • Artificial intelligence
  • Language model
  • Prompting