
Summary of On Pre-training of Multimodal Language Models Customized for Chart Understanding, by Wan-Cyuan Fan et al.


On Pre-training of Multimodal Language Models Customized for Chart Understanding

by Wan-Cyuan Fan, Yen-Chun Chen, Mengchen Liu, Lu Yuan, Leonid Sigal

First submitted to arXiv on: 19 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper presents research on customizing Multimodal Large Language Models (MLLMs) for domain-specific tasks, specifically chart comprehension. Recent studies have used visual instruction tuning with specialized datasets to improve question-answering (QA) accuracy within the chart domain. However, these studies often neglect a fundamental discrepancy between natural image-caption pre-training data and digital chart image-QA data, particularly in the models’ capacity to extract the underlying numeric values from charts. This paper addresses that oversight by exploring the training processes needed to improve MLLMs’ comprehension of charts. The authors present three key findings: (1) incorporating raw data values in alignment pre-training improves comprehension of chart data; (2) replacing images with their textual representations during fine-tuning transfers the language model’s reasoning capability to chart interpretation; and (3) requiring the model to extract the underlying chart data before answering questions further improves accuracy. Building on these findings, the authors introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension that effectively interprets various types of charts while maintaining robust reasoning abilities.
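The three findings above are essentially choices about how training and inference examples are constructed. Below is a minimal sketch in Python of one plausible realization; the file names, JSON fields, and prompt wording are hypothetical illustrations, not the actual data format used for CHOPINLLM.

```python
# A minimal sketch (not the paper's released pipeline) of the three findings
# as training-sample construction. All field names, file names, and prompts
# are hypothetical assumptions for illustration.
import json

# Hypothetical raw data underlying a chart image.
chart_data = {
    "title": "Quarterly revenue (USD millions)",
    "x": ["Q1", "Q2", "Q3", "Q4"],
    "y": [12.4, 15.1, 14.8, 17.9],
}

# Finding 1: alignment pre-training pairs the chart image with its raw
# values, so the model learns to ground pixels in the underlying numbers.
alignment_sample = {
    "image": "chart_0001.png",
    "target": json.dumps(chart_data),
}

# Finding 2: during fine-tuning, some examples replace the image with a
# textual representation of the chart, transferring the LLM's existing
# text-reasoning skills to chart interpretation.
text_only_sample = {
    "image": None,
    "prompt": f"Chart data: {json.dumps(chart_data)}\n"
              "Question: Which quarter had the highest revenue?",
    "target": "Q4",
}

# Finding 3: at answer time, ask the model to extract the data first and
# then answer, so the final answer is conditioned on its own extraction.
extract_then_answer_prompt = (
    "First, extract the chart's underlying data as JSON. "
    "Then answer: which quarter had the highest revenue?"
)
```

The common thread across the three findings is making the chart’s underlying numeric values explicit somewhere in the training or inference signal, rather than relying on natural image-caption alignment alone.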
Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about a new way to train machines to understand charts and graphs. Right now, machines are not very good at this task because they were trained on images and words from the internet rather than on the actual data behind charts. The researchers found three things that help: (1) using the real numbers during training; (2) replacing pictures with text, which makes it easier for machines to reason; and (3) having machines extract the data before answering questions. They created a new model called CHOPINLLM that is good at understanding charts, even ones without labels. The researchers also created a new way to test how well machines can do this task.

Keywords

» Artificial intelligence  » Alignment  » Fine tuning  » Instruction tuning