Summary of YesBut: A High-Quality Annotated Multimodal Dataset for Evaluating Satire Comprehension Capability of Vision-Language Models, by Abhilash Nandy et al.
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
by Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly
First submitted to arXiv on: 20 Sep 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary: see the paper's original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: The paper tackles the challenging task of understanding satire and humor in images. The authors propose three tasks: Satirical Image Detection (identifying whether an image is satirical), Satirical Image Understanding (generating the reason an image is satirical), and Satirical Image Completion (completing a partially shown image so that the whole is satirical). To evaluate these tasks, they release YesBut, a high-quality dataset of 2547 images in diverse artistic styles. Although current Vision-Language models succeed on multimodal tasks such as Visual QA and Image Captioning, they perform poorly on the proposed tasks in zero-shot settings, as shown by both automated and human evaluation. The authors also release a dataset of real, satirical photographs for further research. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper is about understanding what makes an image funny or ironic. The researchers created three challenges to test how well machines understand humor: detecting whether an image is meant to be humorous, explaining why it is humorous, and completing a partially shown image so that it is still funny when complete. They built a dataset of 2547 images, some satirical and some not, to test these challenges. Even the best machines today struggle to get these tasks right, which means there is still much work to be done in this area. |
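
As a rough illustration of how the Satirical Image Detection task could be scored in a zero-shot setting, the sketch below compares a model's yes/no predictions against gold labels and reports accuracy, precision, recall, and F1. The names `score_detection` and `DetectionScores`, and the choice of metrics, are assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionScores:
    """Hypothetical container for binary detection metrics."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_detection(gold: Sequence[bool], predicted: Sequence[bool]) -> DetectionScores:
    """Score satirical-vs-not predictions (True means "satirical").

    This is an illustrative scorer, not the paper's evaluation code.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("at least one example is required")

    # Count the four outcomes of a binary decision.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)

    accuracy = (tp + tn) / len(gold)
    # Fall back to 0.0 when a denominator is zero (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return DetectionScores(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for four images; in practice `predicted` would come from
    # parsing a Vision-Language model's answer to a yes/no prompt.
    gold = [True, True, False, False]
    predicted = [True, False, False, True]
    print(score_detection(gold, predicted))
```

Reporting F1 alongside accuracy is a common choice when the satirical and non-satirical classes are not evenly balanced, since accuracy alone can look good for a model that mostly predicts the majority class.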
Keywords
- Artificial intelligence
- Image captioning
- Zero shot