Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups
by Charvi Rastogi, Tian Huey Teh, Pushkar Mishra, Roma Patel, Zoe Ashwood, Aida Mostafazadeh Davani, Mark Diaz, Michela Paganini, Alicia Parrish, Ding Wang, Vinodkumar Prabhakaran, Lora Aroyo, Verena Rieser
First submitted to arXiv on 22 Oct 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | This paper investigates how human ratings shape the safety evaluation of generative AI, focusing on text-to-image generations and on how demographic differences affect perceptions of harm. Through a large-scale study with 630 raters from diverse demographic backgrounds, the authors find significant variation in harm assessments across rater groups and across types of safety violation. Comparing these ratings with those of expert raters trained on specific safety policies highlights the importance of incorporating diverse perspectives. The findings argue for inclusive safety evaluation, so that generative AI systems reflect the values of all users. (A toy sketch of one way to quantify such group-level disagreement follows this table.) |
Low | GrooveSquid.com (original content) | This paper studies how people rate the safety of generative AI differently depending on their backgrounds and experiences. The authors looked at text-to-image generations and found that different groups see harm in different ways. A big study with 630 raters showed that who the raters are can change what they consider safe or unsafe. This means we need to listen to diverse perspectives when evaluating AI safety, so that it reflects everyone's values. |
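
To make the idea of group-level disagreement concrete, here is a minimal, hypothetical sketch of how one might quantify variation in harm ratings across rater groups. It is not code from the paper: the group labels, scores, and the simple "cross-group spread" metric are all illustrative assumptions, not the authors' methodology.

```python
# Hypothetical sketch: quantifying disagreement in safety ratings
# across demographic rater groups. All data below is invented for
# illustration and is not taken from the paper.
from collections import defaultdict
from statistics import mean

# Each record: (rater_group, image_id, harm_rating in [0, 1])
ratings = [
    ("group_a", "img1", 0.9), ("group_a", "img2", 0.2),
    ("group_b", "img1", 0.4), ("group_b", "img2", 0.3),
    ("group_c", "img1", 0.8), ("group_c", "img2", 0.7),
]

# Collect ratings per group and per (group, image) pair.
by_group = defaultdict(list)
by_group_image = defaultdict(list)
for group, image, score in ratings:
    by_group[group].append(score)
    by_group_image[(group, image)].append(score)

# Mean harm rating per group: a first look at systematic differences.
group_means = {g: mean(scores) for g, scores in by_group.items()}
print("Mean harm rating per group:", group_means)

# Per-image cross-group spread: the gap between the most- and
# least-concerned groups' mean ratings of the same image.
images = {img for _, img, _ in ratings}
for img in sorted(images):
    means = [mean(by_group_image[(g, img)])
             for g in by_group if (g, img) in by_group_image]
    print(f"{img}: cross-group spread = {max(means) - min(means):.2f}")
```

In a real analysis one would use an established agreement statistic (for example, Krippendorff's alpha) rather than this toy spread, but the sketch shows the core idea the summaries describe: the same image can receive systematically different harm ratings from different rater groups.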