


GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation

by Yi Zong, Xipeng Qiu

First submitted to arxiv on: 24 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper proposes GAOKAO-MM, a multimodal benchmark for Large Vision-Language Models (LVLMs), arguing that current benchmarks do not adequately reflect these models' comprehensive capabilities. GAOKAO-MM is based on the Chinese College Entrance Examination and covers 8 subjects and 12 types of images, setting human-level requirements for the models' perception, understanding, knowledge, and reasoning. The authors evaluate 10 LVLMs and find that none achieves an accuracy above 50%, with GPT-4-Vision, Qwen-VL-Plus, and Gemini-Pro-Vision among the top performers. The results indicate that LVLMs remain a moderate distance from Artificial General Intelligence (AGI) and offer insights for developing multilingual LVLMs.
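As a rough illustration of how the evaluation described above is typically scored, the sketch below computes per-model accuracy on multiple-choice questions. The model names, answer keys, and predictions are hypothetical stand-ins, not data from the paper:

```python
# Hypothetical sketch of scoring a multiple-choice benchmark like GAOKAO-MM.
# Gold answers and model predictions below are illustrative only.

def accuracy(predictions, answers):
    """Return the fraction of questions answered correctly."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

gold = ["A", "C", "B", "D"]  # made-up answer key
preds = {
    "model-x": ["A", "C", "D", "D"],  # 3 of 4 correct
    "model-y": ["B", "C", "B", "A"],  # 2 of 4 correct
}

for name, p in preds.items():
    print(f"{name}: {accuracy(p, gold):.0%}")
```

Under this scoring, a model must exceed 50% accuracy to clear the threshold that, per the paper, none of the 10 evaluated LVLMs reached.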
Low Difficulty Summary (original content by GrooveSquid.com)
The paper creates a new way to test big language models. These models can understand images and text, but current tests don’t check if they’re really smart. The new test, called GAOKAO-MM, uses Chinese school exam questions and images like diagrams and maps. This helps the models show what they can do, like recognizing objects and understanding stories. Researchers tested 10 big language models and found that none of them could answer more than half of the questions correctly. This shows how far these models are from being super intelligent. The results will help scientists make better language models.

Keywords

  • Artificial intelligence
  • Gemini
  • GPT