Summary of Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA, by Jian Lan et al.
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA
by Jian Lan, Diego Frassinelli, Barbara Plank
First submitted to arXiv on: 17 Sep 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This study investigates how well large vision-language models predict the responses of multiple human annotators on the Visual Question Answering (VQA) task. The researchers evaluate state-of-the-art models, including BEiT3, which currently leads on this task, and find that even the best-performing model struggles to capture the multi-label distribution of diverse human responses. They also show that common calibration techniques can widen the gap between model predictions and human distributions, whereas calibrating models toward human distributions yields better alignment with human uncertainty. The study highlights the need for future research to focus on aligning model confidence with human uncertainty in VQA (a minimal code sketch of this comparison follows the table). |
| Low | GrooveSquid.com (original content) | This paper looks at how well computer models answer questions about pictures. Right now, the best models are good at guessing what one person thinks is the right answer, but they are not very good at recognizing that different people might give different answers. The researchers tested these models and found that even the best ones struggle with this problem. They also tried making the models better by changing how they make predictions, and found that the usual fixes actually made things worse! Instead of trying to guess what one person thinks is right, the models should try to understand why different people might answer differently. |
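The comparison described in the medium summary, measuring how far a model's answer distribution sits from the distribution of answers given by multiple human annotators, and nudging the model toward that human distribution, can be illustrated with a minimal sketch. This is not the paper's code: the function names, the grid-search temperature calibration, and the choice of total variation and KL divergence as distance measures are illustrative assumptions.

```python
# Minimal sketch (assumed names and metrics, not the paper's implementation):
# compare a model's softmax over VQA answers with the empirical distribution
# of human annotator answers, then pick a temperature that moves the model
# closer to the human distribution.
import numpy as np

def human_distribution(annotator_answers, answer_vocab):
    """Empirical multi-label distribution over the answer vocabulary,
    e.g. from 10 annotators answering the same VQA question."""
    index = {a: i for i, a in enumerate(answer_vocab)}
    counts = np.zeros(len(answer_vocab))
    for ans in annotator_answers:
        counts[index[ans]] += 1
    return counts / counts.sum()

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over answer logits."""
    z = logits / temperature
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    return p / p.sum()

def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical)."""
    return 0.5 * np.abs(p - q).sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with clipping to avoid log(0)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def fit_temperature(logits, human_dist, grid=np.linspace(0.25, 5.0, 100)):
    """Grid-search the temperature that brings the model's softmax closest
    (in KL) to the human answer distribution -- a stand-in for
    'calibrating models toward human distributions'."""
    best_t, best_kl = 1.0, float("inf")
    for t in grid:
        kl = kl_divergence(human_dist, softmax(logits, t))
        if kl < best_kl:
            best_t, best_kl = t, kl
    return best_t

if __name__ == "__main__":
    vocab = ["yes", "no", "red", "blue"]
    # Ten annotators disagree: 6 say "yes", 3 say "no", 1 says "red".
    answers = ["yes"] * 6 + ["no"] * 3 + ["red"]
    human = human_distribution(answers, vocab)
    logits = np.array([4.0, 0.5, 0.1, 0.0])   # over-confident model
    print("TV before calibration:", total_variation(softmax(logits), human))
    t = fit_temperature(logits, human)
    print("TV after calibration: ", total_variation(softmax(logits, t), human))
```

In this toy example the over-confident model puts almost all its mass on "yes", while the annotators split 6/3/1; calibrating toward the human distribution spreads the model's probability mass and roughly halves the total variation distance, which mirrors the paper's point that alignment should target human uncertainty rather than a single gold answer.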
Keywords
» Artificial intelligence » Alignment » Question answering