Summary of WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines, by Genta Indra Winata et al.
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
by Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim, Ching Lam Cheng, Daud Abolade, Emmanuele Chersoni, Enrico Santus, Fariz Ikhwantri, Garry Kuwanto, Hanyang Zhao, Haryo Akbarianto Wibowo, Holy Lovenia, Jan Christian Blaise Cruz, Jan Wira Gotama Putra, Junho Myung, Lucky Susanto, Maria Angelica Riera Machin, Marina Zhukova, Michael Anugraha, Muhammad Farid Adilazuarda, Natasha Santosa, Peerat Limkonchotiwat, Raj Dabre, Rio Alexander Audino, Samuel Cahyawijaya, Shi-Xiong Zhang, Stephanie Yulia Salim, Yi Zhou, Yinxuan Gui, David Ifeoluwa Adelani, En-Shiun Annie Lee, Shogo Okada, Ayu Purwarianti, Alham Fikri Aji, Taro Watanabe, Derry Tanti Wijaya, Alice Oh, Chong-Wah Ngo
First submitted to arXiv on: 16 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com's goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper's original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv |
Medium | GrooveSquid.com (original content) | A novel benchmark called WorldCuisines is proposed to evaluate vision-language models' (VLMs) understanding of culture-specific knowledge in multilingual and multicultural settings. The benchmark is built around a massive-scale visual question answering (VQA) dataset of over 1 million text-image pairs across 30 languages and dialects, spanning 9 language families. Its tasks ask models to identify dish names and their origins, probing whether VLMs can understand and respond to culture-specific knowledge. While VLMs perform well when given the correct location context, they struggle with adversarial contexts and with predicting specific regional cuisines and languages. (A rough evaluation sketch follows this table.) |
Low | GrooveSquid.com (original content) | Imagine you're trying to order food at a restaurant in a foreign country where you don't speak the language. You'd want an AI-powered chatbot to understand what you mean when you ask for “pizza” or “sushi.” Right now, though, such chatbots struggle to comprehend cultural differences and nuances. A team of researchers created WorldCuisines, a massive database of images and text that tests whether AI models can understand different languages and cuisines from around the world. The goal is to make sure these AI systems can communicate effectively with people from diverse backgrounds. |
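
Because the benchmark is framed as visual question answering over dish images, it may help to see roughly how a model could be scored on it. The sketch below is a minimal, hypothetical evaluation loop using the Hugging Face `datasets` and `transformers` APIs; the dataset id (`worldcuisines/vqa`), split name, and column names (`image`, `question`, `options`, `answer`) are assumptions for illustration and may not match the authors' actual release.

```python
# Hypothetical sketch, not the authors' released evaluation code.
from datasets import load_dataset


def evaluate_vlm(model, processor, dataset_id="worldcuisines/vqa", split="test"):
    """Rough accuracy loop over dish-name / dish-origin VQA questions.

    The dataset id, split, and field names are assumptions; check the
    paper's repository for the actual data layout.
    """
    data = load_dataset(dataset_id, split=split)
    correct = 0
    for example in data:
        # Assumed fields: a dish image, a question in one of the 30
        # languages/dialects, a list of answer options, and the gold answer.
        prompt = example["question"] + "\nOptions: " + ", ".join(example["options"])
        inputs = processor(images=example["image"], text=prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=16)
        prediction = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
        correct += int(example["answer"].strip().lower() in prediction.lower())
    return correct / len(data)
```

Any vision-language model exposed through a `transformers`-style processor and `generate` interface could be dropped into a loop like this; the paper's own findings (strong with correct location context, weaker under adversarial contexts) come from much more systematic evaluation than this sketch.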
Keywords
- Artificial intelligence
- Question answering