
Summary of “Did My Figure Do Justice to the Answer?” : Towards Multimodal Short Answer Grading with Feedback (MMSAF), by Pritam Sil et al.


“Did my figure do justice to the answer?” : Towards Multimodal Short Answer Grading with Feedback (MMSAF)

by Pritam Sil, Bhaskaran Raman, Pushpak Bhattacharyya

First submitted to arXiv on: 27 Dec 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The proposed Multimodal Short Answer Grading with Feedback (MMSAF) problem calls for an automated framework that grades short answer questions whose answers include student-drawn supporting diagrams. The problem is particularly challenging because the existing literature offers no solution for automatically grading such answers. The paper contributes a dataset of 2197 data points and a framework for generating such datasets. Large Language Models (LLMs) evaluated on this dataset achieved an overall accuracy of 55% on Level of Correctness labels and 75% on Image Relevance labels (a minimal sketch of this per-label accuracy computation follows after these summaries). Interestingly, Pixtral was more aligned with human judgement and values for biology, while ChatGPT performed better for physics and chemistry.

Low Difficulty Summary (written by GrooveSquid.com; original content)
Assessments are crucial to the learning process because they give students feedback on their proficiency level. However, grading short answer questions can be challenging, especially when students need to draw supporting diagrams. The proposed MMSAF framework aims to automate the grading of such answers. A dataset of 2197 data points was created to test this idea. The results show that Large Language Models (LLMs) can achieve an accuracy of 55% in grading the correctness of answers and 75% in judging the relevance of images.

Keywords

» Artificial intelligence