Summary of WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences, by Yujie Lu et al.
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
by Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin
First submitted to arXiv on: 16 Jun 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | High Difficulty Summary The paper's original abstract.
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Most benchmarks for vision-language models (VLMs) do not capture human preferences in real-world multimodal interactions. To address this gap, the researchers launched WildVision-Arena (WV-Arena), an online platform that collects human preference votes to evaluate VLMs. From 8,000 user submissions to WV-Arena, they curated WV-Bench, a set of 500 high-quality samples that uses GPT-4 as the judge to compare each VLM with Claude-3-Sonnet, achieving a Spearman correlation of 0.94 with the WV-Arena Elo ranking (an illustrative Elo-and-correlation sketch follows this table).
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper is about how to test which vision-language models are best at understanding what people want. The researchers made a special place online where people can say what they think is good or bad about different pictures and words combined. They picked out some really great examples from all the things people said, and used a super smart computer program (GPT-4) to judge how well each model did compared to another model called Claude-3-Sonnet.
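The medium summary mentions two quantities that are easy to confuse: the WV-Arena Elo ranking, computed from pairwise human preference votes, and the WV-Bench score, computed with GPT-4 as judge, which the paper reports correlate at Spearman 0.94. The sketch below is purely illustrative and is not the paper's implementation: it computes a toy Elo ranking from hypothetical votes and checks its Spearman correlation against hypothetical benchmark scores. The model names, votes, scores, and K-factor are all assumptions.

```python
from scipy.stats import spearmanr

# K-factor controls how much one vote moves a rating; 32 is a common
# default, assumed here (the paper's setting may differ).
K = 32

def expected_score(r_a, r_b):
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_ratings(battles, init=1000.0):
    """Sequential Elo updates over (model_a, model_b, winner) votes.

    `winner` is the name of the winning model, or "tie".
    """
    ratings = {}
    for a, b, winner in battles:
        ra = ratings.setdefault(a, init)
        rb = ratings.setdefault(b, init)
        ea = expected_score(ra, rb)
        sa = 1.0 if winner == a else (0.0 if winner == b else 0.5)
        ratings[a] = ra + K * (sa - ea)
        ratings[b] = rb + K * ((1.0 - sa) - (1.0 - ea))
    return ratings

# Hypothetical arena votes: (model_a, model_b, winner).
battles = [
    ("gpt-4v", "llava-1.5", "gpt-4v"),
    ("claude-3-sonnet", "llava-1.5", "claude-3-sonnet"),
    ("gpt-4v", "claude-3-sonnet", "tie"),
    ("gpt-4v", "llava-1.5", "gpt-4v"),
]
arena_elo = elo_ratings(battles)

# Hypothetical judge-based benchmark scores for the same models.
bench = {"gpt-4v": 0.80, "claude-3-sonnet": 0.74, "llava-1.5": 0.51}

models = sorted(arena_elo)
rho, _ = spearmanr([arena_elo[m] for m in models],
                   [bench[m] for m in models])
print({m: round(arena_elo[m], 1) for m in models})
print(f"Spearman correlation between arena Elo and benchmark: {rho:.2f}")
```

A rank correlation is used here (rather than Pearson) because Elo values and judge scores live on different scales; Spearman only asks whether the two methods order the models the same way, which is what the paper's 0.94 figure reflects.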
Keywords
» Artificial intelligence » Claude » GPT