Summary of UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions, by Ana-Cristina Rogoz et al.


UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions

by Ana-Cristina Rogoz, Radu Tudor Ionescu

First submitted to arXiv on: 20 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty | Written by | Summary
High | Paper authors | High Difficulty Summary
Read the original abstract here
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary
This paper proposes a data augmentation method based on Large Language Models (LLMs) for predicting the item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task. The approach augments the dataset with zero-shot answers from three LLMs (Falcon, Meditron, and Mistral) and trains transformer-based models on six alternative feature combinations. The results suggest that predicting question difficulty is more challenging than predicting response time. The top-performing methods consistently include the question text and benefit from the variability of the LLM answers, which highlights the potential of LLMs for improving automated assessment in medical licensing exams.
Low | GrooveSquid.com (original content) | Low Difficulty Summary
This paper uses large language models to help predict how hard a medical test question will be and how long someone might take to answer it. The method looks at how these language models answer each question, even if they're not experts in medicine. This can help make computer-based tests better for people trying to become doctors.
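
As a rough illustration of the augmentation idea in the medium summary, the sketch below attaches zero-shot LLM answers to a question and builds the text input for one feature combination. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Item` class, the `build_input` and `distinct_answer_count` helpers, the `[SEP]` separator, and the particular feature switches are all hypothetical, and the six combinations actually used in the paper are not listed on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One multiple-choice question plus zero-shot LLM answers (hypothetical schema)."""
    question: str
    options: list
    # Maps an LLM name (e.g. "falcon") to the answer it gave in a zero-shot setting.
    llm_answers: dict = field(default_factory=dict)


def build_input(item, use_question=True, use_options=False,
                use_llm_answers=False, sep=" [SEP] "):
    """Concatenate the selected features into one string for a text model.

    Each combination of the three switches is one illustrative
    "feature combination"; these are assumptions, not the paper's exact set.
    """
    parts = []
    if use_question:
        parts.append(item.question)
    if use_options:
        parts.append(" | ".join(item.options))
    if use_llm_answers:
        # Sort by model name so the input is deterministic.
        for name in sorted(item.llm_answers):
            parts.append(f"{name}: {item.llm_answers[name]}")
    return sep.join(parts)


def distinct_answer_count(item):
    """Number of different answers given by the LLMs (a crude variability signal)."""
    return len(set(item.llm_answers.values()))


if __name__ == "__main__":
    item = Item(
        question="Which vitamin deficiency causes scurvy?",
        options=["A. Vitamin A", "B. Vitamin C", "C. Vitamin D"],
        llm_answers={"falcon": "B", "meditron": "B", "mistral": "C"},
    )
    print(build_input(item, use_question=True, use_llm_answers=True))
    print(distinct_answer_count(item))
```

The resulting string would then be fed to a transformer-based regressor that predicts difficulty or response time. Disagreement among the LLM answers is one simple way to expose the answer variability that the summary says the top-performing methods benefit from.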

Keywords

» Artificial intelligence  » Data augmentation  » Transformer  » Zero shot