Summary of UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions, by Ana-Cristina Rogoz et al.
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions
by Ana-Cristina Rogoz, Radu Tudor Ionescu
First submitted to arXiv on: 20 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The paper proposes a data augmentation method based on Large Language Models (LLMs) for predicting the item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task. The approach augments the dataset with zero-shot answers from Falcon, Meditron, and Mistral, and trains transformer-based models on six alternative feature combinations. Results show that predicting question difficulty is the more challenging of the two tasks, and that the top-performing methods consistently include the question text and benefit from the variability of LLM answers. This highlights the potential of LLMs for improving automated assessment in medical licensing exams. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper uses big language models to help predict how hard a medical test question will be and how long someone might take to answer it. The method looks at the answers these language models give to each question, even if the models are not experts in medicine. This can help make computer-based tests better for people trying to become doctors. |
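
As a rough illustration of the augmentation idea in the medium summary, the sketch below shows one way zero-shot LLM answers could be attached to a question and selectively combined into a single text input for a transformer-based model. It is not the authors' implementation: the `Item` schema, the `build_input` and `answer_disagreement` helpers, the ` [SEP] ` separator, and the toy item are all assumptions made for this example, and the summary does not say which six feature combinations the paper actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One multiple-choice question plus zero-shot LLM answers (hypothetical schema)."""
    question: str
    options: dict[str, str]
    # Maps a model name to the answer text that model produced (assumed format).
    llm_answers: dict[str, str] = field(default_factory=dict)


def build_input(item: Item, use_question: bool = True, use_options: bool = False,
                use_llm_answers: bool = False, sep: str = " [SEP] ") -> str:
    """Concatenate the selected fields into one string (one possible feature combination)."""
    parts: list[str] = []
    if use_question:
        parts.append(item.question)
    if use_options:
        parts.extend(f"{key}: {text}" for key, text in sorted(item.options.items()))
    if use_llm_answers:
        parts.extend(f"{model}: {ans}" for model, ans in sorted(item.llm_answers.items()))
    return sep.join(parts)


def answer_disagreement(item: Item) -> int:
    """Count distinct LLM answers, a crude stand-in for answer variability."""
    return len({ans.strip().lower() for ans in item.llm_answers.values()})


# Toy placeholder item, not real exam content.
item = Item(
    question="Q?",
    options={"A": "x", "B": "y"},
    llm_answers={"falcon": "A", "meditron": "B", "mistral": "a"},
)
text = build_input(item, use_llm_answers=True)
variability = answer_disagreement(item)
```

Toggling the three boolean flags yields different input variants, which is one simple way to compare feature combinations. The resulting string would then be passed to whatever regression model predicts difficulty or response time.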
Keywords
» Artificial intelligence » Data augmentation » Transformer » Zero shot