
Summary of MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding, by Zekun Li et al.


MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding

by Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

First submitted to arXiv on: 6 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty summary (written by the paper authors)
This version is the paper’s original abstract.

Medium difficulty summary (written by GrooveSquid.com, original content)
The paper introduces a large-scale dataset, drawn from Nature Communications articles spanning 72 fields, to advance the ability of AI-driven scientific assistants to interpret complex scientific figures. The figures include schematic diagrams, microscopic images, and experimental data, all of which require graduate-level expertise to interpret. Models were evaluated on two tasks: figure captioning and multiple-choice questions. The evaluation, together with human expert annotation, revealed significant task challenges and performance gaps among models. Qwen2-VL-7B fine-tuned on the task-specific data outperformed both GPT-4o and human experts in the multiple-choice evaluations. The dataset can also support further research: continuous pre-training on its scientific articles and figures can enhance model performance on downstream tasks, for example in materials science.

Low difficulty summary (written by GrooveSquid.com, original content)
This paper helps AI assistants understand complex scientific pictures from many different fields of study. These pictures are hard to interpret and require a high level of expertise. To improve this ability, the researchers created a big dataset with lots of examples from Nature Communications articles that cover 72 fields. They tested many models on two tasks: writing captions for the pictures and answering multiple-choice questions about them. The results showed that some models did better than others at these tasks. When the researchers fine-tuned one model using this new dataset, it performed better than other models, and in some cases even better than humans. This big dataset can be used to help train AI assistants to do better in the future.

Keywords

» Artificial intelligence  » Fine-tuning  » GPT