


Active Learning to Guide Labeling Efforts for Question Difficulty Estimation

by Arthur Thuy, Ekaterina Loginova, Dries F. Benoit

First submitted to arXiv on: 14 Sep 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computers and Society (cs.CY); Machine Learning (stat.ML)

Links: Abstract of paper · PDF of paper


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract; see the "Abstract of paper" link above.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A recent surge in research on Question Difficulty Estimation (QDE) has seen transformer-based neural networks achieve state-of-the-art performance, primarily through supervised methods. However, these approaches require abundant labeled data, which is costly to obtain. Unsupervised methods, in contrast, do not require labeled data but rely on a different evaluation metric that is computationally expensive in practice. To bridge this gap, this work explores active learning for QDE, a supervised human-in-the-loop approach that aims to minimize labeling effort while matching the performance of state-of-the-art models. The proposed methodology iteratively trains on a labeled subset, acquiring labels from human experts only for the most informative unlabeled data points. A novel acquisition function, PowerVariance, is introduced to add the most informative samples to the labeled set; it extends the popular PowerBALD acquisition function used in classification. DistilBERT is employed for QDE, and epistemic uncertainty is captured with Monte Carlo dropout to identify informative samples. The results show that active learning with PowerVariance acquisition achieves performance close to fully supervised models after labeling only 10% of the training data. A schematic sketch of the acquisition step is given after the summaries below.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Question Difficulty Estimation (QDE) helps course instructors make educational resources more accessible and improves personalized support systems. Researchers have been trying to get computers to estimate how hard a question is, but it has been tricky because they need lots of labeled data. Labeled data means a human expert has already judged how difficult each question is, which can be time-consuming. This study finds a way to make computers guess how hard a question is using only a small amount of labeled data and some clever math tricks.
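
The acquisition step described in the medium difficulty summary can be sketched in a few lines of code. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `predict_stochastic` interface, and the `move_to_labeled` helper are hypothetical, and the loop assumes a DistilBERT-based difficulty regressor whose dropout layers stay active at prediction time.

import numpy as np

def mc_dropout_variance(model, pool_inputs, n_passes=20):
    """Epistemic uncertainty per pool question: variance of the predicted
    difficulty over stochastic forward passes with dropout kept active.
    Assumes the model exposes `predict_stochastic(inputs)` (hypothetical)."""
    preds = np.stack([model.predict_stochastic(pool_inputs)
                      for _ in range(n_passes)])   # shape (n_passes, n_pool)
    return preds.var(axis=0)                       # shape (n_pool,)

def power_acquisition(scores, k, beta=1.0, seed=None):
    """Power acquisition (as in PowerBALD): draw k pool indices without
    replacement with probability proportional to score**beta, using the
    Gumbel-top-k trick on beta * log(score)."""
    rng = np.random.default_rng(seed)
    log_scores = beta * np.log(scores + 1e-12)
    gumbel = rng.gumbel(size=scores.shape)
    return np.argsort(-(log_scores + gumbel))[:k]

# Schematic active-learning loop: start from a small labeled seed set, then
# repeatedly retrain, score the unlabeled pool, and ask human experts to
# label only the questions selected by power acquisition.
#
# for _ in range(n_rounds):
#     model.fit(labeled_x, labeled_y)
#     variances = mc_dropout_variance(model, pool_x)
#     chosen = power_acquisition(variances, k=acquisition_size)
#     labeled_x, labeled_y, pool_x = move_to_labeled(chosen, ...)  # hypothetical helper

The stochastic Gumbel-top-k selection is the point of the "power" acquisition family: instead of always taking the top-scoring questions, it samples in proportion to their scores, which helps avoid acquiring a batch of near-duplicate, similarly uncertain items in each round.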

Keywords

» Artificial intelligence  » Active learning  » Classification  » Dropout  » Supervised  » Transformer  » Unsupervised