Summary of Multilingual and Explainable Text Detoxification with Parallel Corpora, by Daryna Dementieva et al.
Multilingual and Explainable Text Detoxification with Parallel Corpora
by Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Moskovskiy, Elisei Stakovskii, Eran Kaufman, Ashraf Elnagar, Animesh Mukherjee, Alexander Panchenko
First submitted to arXiv on: 16 Dec 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper addresses abusive speech online through automatic text detoxification: rewriting toxic text in a neutral, non-toxic form. Because the task depends on parallel corpora, the authors extend the parallel text detoxification corpus to five new languages (German, Chinese, Arabic, Hindi, and Amharic) and report text style transfer (TST) baselines in a multilingual setup. They also run an automated analysis of descriptive features of toxic and non-toxic sentences across 9 languages, which gives insight into the nuances of toxicity and detoxification. Finally, they experiment with a novel detoxification method inspired by the Chain-of-Thoughts reasoning approach, which enhances prompting by clustering on relevant descriptive attributes. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary A group of researchers is working on a way to make the internet safer by changing mean or hurtful language into kinder language. They use computer programs to do this. These programs need training data made of matching pairs of mean and kind sentences, which the researchers have been collecting in many languages. They want to know what makes certain words or sentences “toxic” and how to change them. To figure this out, they analyzed many examples of mean and kind language. Then they came up with a new way to make the programs better at changing mean language into kind language. |
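The medium summary mentions enhancing the prompting process through clustering on descriptive attributes. The sketch below is one possible reading of that idea and not the authors' implementation: it assigns a sentence to the nearest of two hand-written cluster centroids and inserts a cluster-specific hint into a step-by-step prompt. The attribute names, centroid values, hint texts, and prompt wording are all hypothetical and chosen only for illustration.

```python
from math import dist

# Hypothetical descriptive attributes for a toxic sentence, each scored in [0, 1]:
# (share of profane words, degree of direct personal insult).
# The centroids are made-up values, not clusters learned from the paper's data.
CENTROIDS = {
    "profanity": (0.9, 0.2),
    "insult": (0.2, 0.9),
}

# Hypothetical rewriting hints, one per cluster.
HINTS = {
    "profanity": "Remove or replace the obscene words and keep the rest unchanged.",
    "insult": "Rephrase the personal attack as neutral criticism.",
}


def assign_cluster(features):
    """Return the name of the centroid closest to the feature vector."""
    return min(CENTROIDS, key=lambda name: dist(CENTROIDS[name], features))


def build_prompt(sentence, features):
    """Assemble a step-by-step detoxification prompt with a cluster-specific hint."""
    cluster = assign_cluster(features)
    return (
        "Rewrite the toxic sentence so that it is neutral but keeps its meaning.\n"
        f"Step 1: The main source of toxicity is: {cluster}.\n"
        f"Step 2: {HINTS[cluster]}\n"
        f"Sentence: {sentence}\n"
        "Detoxified sentence:"
    )


if __name__ == "__main__":
    # The feature values here are invented for the example.
    print(build_prompt("<toxic sentence here>", (0.1, 0.95)))
```

In a real setup the feature vectors and clusters would come from data rather than being written by hand, and the resulting prompt would be sent to a language model; both steps are left out here.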
Keywords
» Artificial intelligence » Clustering » Machine learning » Prompting