Summary of MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages, by Daryna Dementieva et al.
MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages
by Daryna Dementieva, Nikolay Babakov, Alexander Panchenko
First submitted to arXiv on: 2 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper but is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper addresses text detoxification, the task of rewriting toxic texts into neutral ones. The authors extend an existing pipeline for collecting parallel detoxification corpora (pairs of toxic sentences and their neutral rewrites) to new languages, creating MultiParaDetox. They then compare text detoxification models, from unsupervised baselines to Large Language Models (LLMs), demonstrating the importance of parallel corpora for achieving state-of-the-art results (a minimal illustration of this setup appears below the table). This work has significant implications for ensuring safe communication online, particularly in social networks. |
Low | GrooveSquid.com (original content) | Text detoxification is a way to make mean or rude texts nicer and more friendly. Researchers have been working on this problem and found it can help keep people safe online. But so far, it has mostly been done for a few languages like English. This new project wants to make it work for any language by collecting many pairs of examples: a toxic sentence together with a polite way of saying the same thing. The team then tested different methods for cleaning up toxic texts and showed that having these paired examples helps the methods do the job much better. |
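
To make the role of a parallel corpus concrete, here is a minimal sketch of how pairs of toxic sentences and their neutral rewrites could be used to fine-tune a sequence-to-sequence model with the Hugging Face transformers and datasets libraries. The model choice (google/mt5-small), the pairs.csv file with toxic and neutral columns, and all hyperparameters are illustrative assumptions for this sketch, not details taken from the paper or its released code.

```python
# Minimal sketch: fine-tune a seq2seq model on parallel toxic -> neutral pairs.
# Assumes a CSV with columns "toxic" and "neutral"; model choice and
# hyperparameters are illustrative, not taken from the paper.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "google/mt5-small"  # multilingual seq2seq model (assumed choice)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# "pairs.csv" is a placeholder for a parallel detoxification corpus.
dataset = load_dataset("csv", data_files={"train": "pairs.csv"})["train"]

def preprocess(batch):
    # Toxic sentences are the inputs, their neutral rewrites are the targets.
    inputs = tokenizer(batch["toxic"], truncation=True, max_length=128)
    targets = tokenizer(text_target=batch["neutral"], truncation=True, max_length=128)
    inputs["labels"] = targets["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="detox-model",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-4,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# After training, the model can rewrite a toxic sentence into a neutral one.
text = "your toxic input sentence here"
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
output = model.generate(ids, max_new_tokens=64)[0]
print(tokenizer.decode(output, skip_special_tokens=True))
```

Any multilingual seq2seq model could stand in for mt5-small here; the essential ingredient highlighted by the paper is the parallel corpus of toxic sentences paired with neutral rewrites.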
Keywords
» Artificial intelligence » Unsupervised