
Summary of WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences, by Yujie Lu et al.


WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

by Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

First submitted to arXiv on: 16 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Recent progress in vision-language models (VLMs) makes it important to benchmark human preferences in real-world multimodal interactions, a need that existing benchmarks do not capture. To address this gap, the researchers launched WildVision-Arena (WV-Arena), an online platform that collects human preferences to evaluate VLMs. From 8,000 user submissions in WV-Arena, they curated WV-Bench, a set of 500 high-quality samples, and used GPT-4 as the judge to compare each VLM against Claude-3-Sonnet; the resulting ranking achieves a Spearman correlation of 0.94 with the WV-Arena Elo ratings. (An illustrative sketch of this Elo-plus-judge setup appears after the summaries below.)
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about how to test which vision-language models give answers that people like best. The researchers made a special place online where people can say what they think is good or bad about how the models handle pictures and words combined. They picked out some really great examples from everything people submitted, and used a super smart computer program (GPT-4) to judge how well each model did compared to another model called Claude-3-Sonnet.
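To make the evaluation recipe in the medium-difficulty summary more concrete, here is a minimal sketch (not the authors' code) of how pairwise human votes can be turned into Elo ratings and how a judge-derived score can be checked against them with Spearman correlation. The model names, votes, judge scores, and the K factor below are invented for illustration; the paper's actual WV-Arena and WV-Bench pipelines are more involved.

```python
from scipy.stats import spearmanr

K = 32  # Elo update step size (an assumed, conventional value)

def expected_score(r_a, r_b):
    # Expected probability that a model rated r_a beats one rated r_b.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_from_votes(votes, models, base=1000.0):
    # votes: list of (winner, loser) pairs distilled from human preferences.
    ratings = {m: base for m in models}
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        ratings[winner] += K * (1.0 - e_w)
        ratings[loser] -= K * (1.0 - e_w)
    return ratings

# Hypothetical arena votes (winner, loser) -- invented for illustration.
models = ["model_a", "model_b", "model_c"]
votes = [("model_a", "model_b"), ("model_a", "model_c"),
         ("model_b", "model_c"), ("model_a", "model_b")]
arena_elo = elo_from_votes(votes, models)

# Hypothetical judge-based scores, e.g. win rates against a fixed reference model.
judge_scores = {"model_a": 0.71, "model_b": 0.55, "model_c": 0.40}

# Rank agreement between the human-derived Elo and the judge-derived scores.
rho, _ = spearmanr([arena_elo[m] for m in models],
                   [judge_scores[m] for m in models])
print(f"Spearman correlation: {rho:.2f}")
```

In the paper's setting, the judge scores come from GPT-4 comparing each model's answers against Claude-3-Sonnet's, and the reported Spearman correlation of 0.94 indicates that this automatic ranking closely tracks the human-vote Elo ranking.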

Keywords

» Artificial intelligence  » Claude  » GPT