Summary of RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?, by Adrian de Wynter et al.
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
by Adrian de Wynter, Ishaan Watts, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Nektar Ege Altıntoprak, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcsov, Anna Vickers, Stéphanie Visser, Herdyan Widarmanto, Andrey Zaikin, Si-Qing Chen
First submitted to arXiv on: 22 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Computers and Society (cs.CY); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | This paper introduces RTP-LX, a large-scale corpus of human-transcreated and human-annotated toxic prompts and outputs in 28 languages. The goal is to evaluate how safely large language models (LLMs) and small language models (SLMs) handle toxic content as they are rapidly deployed in multilingual settings. The authors use participatory design practices to build a culturally sensitive dataset that captures subtle-yet-harmful content such as microaggressions and bias. They test 10 LLMs/SLMs on their ability to detect toxic content in a multilingual scenario and find that, while the models achieve acceptable accuracy overall, they struggle to discern harm in context-dependent cases (an illustrative sketch of such an evaluation loop follows the table). The authors release the dataset to help reduce harmful uses of these models and improve their safe deployment. |
Low | GrooveSquid.com (original content) | This paper is about making sure language models don’t cause harm as they are used more and more. The authors built a big collection of text prompts and outputs that are toxic or hurtful in 28 different languages. The goal is to help check whether language models can tell when someone is saying something harmful, even if it’s not obvious. They tested ten language models on this task and found that, while the models are pretty good at spotting harm overall, they struggle with subtle forms of harm, like microaggressions or bias. The authors hope that by releasing their dataset, they can help make language models safer for everyone. |
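
To make the evaluation setup described above more concrete, here is a minimal, hypothetical sketch of how one might score a model against human toxicity annotations. The record layout, the 1–5 rating scale, and the `call_model` helper are illustrative assumptions, not the released RTP-LX schema or the authors’ actual evaluation harness.

```python
# Hypothetical sketch of an RTP-LX-style evaluation loop: ask a model to rate the
# toxicity of multilingual prompts and compare its ratings with human annotations.
# The record layout, the 1-5 scale, and call_model() are illustrative assumptions,
# not the released RTP-LX schema or the authors' evaluation harness.

from statistics import mean

# Toy stand-ins for annotated records: (language code, prompt text, human label 1-5).
EXAMPLES = [
    ("es", "…", 4),
    ("de", "…", 1),
]


def call_model(instruction: str) -> int:
    """Placeholder for a real S/LLM call that returns an integer toxicity score."""
    return 1  # swap in an actual model API call here


def evaluate(examples) -> float:
    """Return the fraction of prompts where the model's score matches the human label."""
    hits = []
    for _lang, text, human_score in examples:
        instruction = (
            "Rate the toxicity of the following text on a 1-5 scale, "
            "where 1 is harmless and 5 is severely toxic:\n" + text
        )
        model_score = call_model(instruction)
        hits.append(int(model_score == human_score))
    return mean(hits)


if __name__ == "__main__":
    print(f"Exact-match agreement: {evaluate(EXAMPLES):.2f}")
```

In practice, exact-match agreement would be only one of several possible metrics; per-language breakdowns and agreement on subtle categories such as microaggressions and bias are where, per the summary above, the tested models tend to struggle.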