
Summary of RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?, by Adrian de Wynter et al.


RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

by Adrian de Wynter, Ishaan Watts, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Nektar Ege Altıntoprak, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcsov, Anna Vickers, Stéphanie Visser, Herdyan Widarmanto, Andrey Zaikin, Si-Qing Chen

First submitted to arXiv on: 22 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Computers and Society (cs.CY); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty summary is the paper’s original abstract. Read the original abstract here.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces RTP-LX, a large-scale corpus of human-transcreated and human-annotated toxic prompts and outputs in 28 languages. The goal is to evaluate how safely large language models (LLMs) and small language models (SLMs) handle multilingual toxic content as they are rapidly deployed. The authors use participatory design practices to create a culturally sensitive dataset that can surface subtle-yet-harmful content, such as microaggressions and bias. They test 10 LLMs/SLMs on their ability to detect toxic content in a multilingual scenario and find that, while the models achieve acceptable accuracy, they struggle to discern harm in context-dependent scenarios. The authors release the dataset to help reduce harmful uses of these models and improve their safe deployment.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making sure language models don’t cause harm as they are used more and more. The authors created a big collection of text prompts and outputs that are toxic or hurtful in 28 different languages. The goal is to help make sure these language models can detect when someone is saying something harmful, even if it’s not obvious. The authors tested some language models on this task and found that, while the models are pretty good at detecting harm, they struggle with subtle forms of harm, like microaggressions or bias. The authors hope that by releasing their dataset, they can help make language models safer for everyone.

Keywords

» Artificial intelligence