Summary of CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark, by David Romero et al.


CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

by David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, Henok Biadglign Ademtew, Hernán Maina, Holy Lovenia, Israel Abebe Azime, Jan Christian Blaise Cruz, Jay Gala, Jiahui Geng, Jesus-German Ortiz-Barajas, Jinheon Baek, Jocelyn Dunstan, Laura Alonso Alemany, Kumaranage Ravindu Yasas Nagasinghe, Luciana Benotti, Luis Fernando D’Haro, Marcelo Viridiano, Marcos Estecha-Garitagoitia, Maria Camila Buitrago Cabrera, Mario Rodríguez-Cantelar, Mélanie Jouitteau, Mihail Mihaylov, Mohamed Fazli Mohamed Imam, Muhammad Farid Adilazuarda, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Naome Etori, Olivier Niyomugisha, Paula Mónica Silva, Pranjal Chitale, Raj Dabre, Rendi Chevi, Ruochen Zhang, Ryandito Diandaru, Samuel Cahyawijaya, Santiago Góngora, Soyeong Jeong, Sukannya Purkayastha, Tatsuki Kuribayashi, Teresa Clifford, Thanmay Jayakumar, Tiago Timponi Torrent, Toqeer Ehsan, Vladimir Araujo, Yova Kementchedjhieva, Zara Burzo, Zheng Wei Lim, Zheng Xin Yong, Oana Ignat, Joan Nwatu, Rada Mihalcea, Thamar Solorio, Alham Fikri Aji

First submitted to arXiv on: 10 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Visual Question Answering (VQA) is a key task in multimodal AI that tests a model's ability to understand and reason over visual and textual data together. Current VQA models rely heavily on English-centric datasets with Western-centric images. Recent efforts have expanded language coverage but still lack diversity in low-resource languages. To address these limitations, the authors introduce CVQA, a Culturally-diverse multilingual Visual Question Answering benchmark built from data collected in 30 countries across four continents and covering 31 languages. The dataset features culturally driven images and questions, with native speakers and cultural experts involved throughout data collection. The authors benchmark several Multimodal Large Language Models (MLLMs) on CVQA and show that it is challenging for current state-of-the-art models. The benchmark serves as a probing evaluation suite for assessing the cultural capability and bias of multimodal models, and is intended to encourage research on cultural awareness and linguistic diversity.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about creating a new benchmark that tests how well computers understand images and text from different languages and cultures. Currently, most such systems are built and tested mainly on English or a few other widely spoken languages, which is unfair to the many people around the world who speak other languages. The researchers built the benchmark by collecting data from 30 countries in 31 languages. They made sure that the images and questions were culturally relevant to each language. Then, they tested some computer models on the benchmark and found that the models struggled with it. This is important because it shows that we need to improve how computers learn about different cultures and languages.

Keywords

» Artificial intelligence  » Question answering