Summary of GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse, by Hongzhan Lin et al.
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
by Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, Jing Ma
First submitted to arXiv on: 3 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | This paper investigates how well large multimodal models (LMMs) detect and respond to online abuse conveyed through memes. The authors introduce GOAT-Bench, a comprehensive benchmark of over 6K varied memes covering themes such as implicit hate speech, sexism, and cyberbullying. Using this benchmark, they examine the ability of LMMs, including GPT-4o, to assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content (a minimal sketch of such a per-meme query appears after this table). The results show that current models still lack safety awareness and remain insensitive to various forms of implicit abuse, highlighting a critical limitation in achieving safe artificial intelligence. |
Low | GrooveSquid.com (original content) | This paper looks at how big AI models that understand both pictures and words can spot online abuse in memes. Memes are funny images or videos with captions, and they’re often used to express opinions or make jokes. But sometimes, people use memes to be mean or hurtful. The authors created a special test called GOAT-Bench that contains many different types of memes. They want to see if the models can tell which ones are nice and which ones are not nice. They found out that even the best models still have trouble recognizing when someone is being mean online. |
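The medium-difficulty summary describes an evaluation in which an LMM is queried, meme by meme, about hatefulness, misogyny, offensiveness, sarcasm, and harmfulness. The snippet below is a minimal sketch of what one such per-meme query could look like, assuming the OpenAI chat-completions API with a vision-capable model; the prompt wording, task names, and yes/no answer format are illustrative assumptions, not GOAT-Bench's actual protocol.

```python
# Illustrative sketch only: one way to ask a multimodal model about a meme,
# loosely in the spirit of GOAT-Bench's tasks. Prompt wording and task names
# are assumptions, not the benchmark's exact setup.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TASKS = ["hatefulness", "misogyny", "offensiveness", "sarcasm", "harmfulness"]

def judge_meme(image_path: str, caption: str, task: str) -> str:
    """Ask the model for a yes/no judgement on one task for one meme."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    prompt = (
        f'The meme below has the caption: "{caption}".\n'
        f"Does this meme exhibit {task}? Answer with 'yes' or 'no' only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model could be substituted
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip().lower()

# Example usage: score one meme on every task and list the flagged ones.
# flags = {t: judge_meme("meme.png", "example caption", t) for t in TASKS}
# print([t for t, ans in flags.items() if ans.startswith("yes")])
```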
Keywords
» Artificial intelligence » GPT