Summary of GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation, by Yi Zong et al.
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
by Yi Zong, Xipeng Qiu
First submitted to arXiv on: 24 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes GAOKAO-MM, a multimodal benchmark for Large Vision-Language Models (LVLMs). Existing benchmarks are insufficient to reflect the comprehensive capabilities of LVLMs. GAOKAO-MM is based on the Chinese College Entrance Examination and covers 8 subjects and 12 types of images. The benchmark sets human-level requirements for the models' abilities, including perception, understanding, knowledge, and reasoning. The authors evaluate 10 LVLMs and find that none achieves an accuracy higher than 50%; GPT-4-Vision, Qwen-VL-Plus, and Gemini-Pro-Vision are among the top performers. The results indicate that LVLMs remain a moderate distance from Artificial General Intelligence (AGI) and provide insights for developing multilingual LVLMs. |
| Low | GrooveSquid.com (original content) | The paper creates a new way to test big AI models that can understand both images and text, because current tests don't show how capable these models really are. The new test, called GAOKAO-MM, uses Chinese college entrance exam questions and images like diagrams and maps. It checks skills such as recognizing what is in a picture, using knowledge, and reasoning through a problem. The researchers tested 10 of these models and found that none could answer even half of the questions correctly. This shows how far these models still are from human-level intelligence, and the results will help scientists build better models. |
Keywords
» Artificial intelligence » Gemini » GPT