Summary of On the Role of Speech Data in Reducing Toxicity Detection Bias, by Samuel J. Bell et al.
On the Role of Speech Data in Reducing Toxicity Detection Bias
by Samuel J. Bell, Mariano Coria Meglioli, Megan Richards, Eduardo Sánchez, Christophe Ropers, Skyler Wang, Adina Williams, Levent Sagun, Marta R. Costa-jussà
First submitted to arXiv on: 12 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The paper studies bias in text-based toxicity detection systems, which produce high false-positive rates on content that mentions demographic groups. The researchers investigate whether speech-based systems can mitigate these biases by comparing speech- and text-based toxicity classifiers on the multilingual MuTox dataset. They find that access to speech data during inference reduces bias against group mentions, particularly for ambiguous and disagreement-inducing samples. The study also suggests that improving the classifiers themselves is more effective at reducing group bias than improving the transcription pipelines.
Low | GrooveSquid.com (original content) | The paper looks at how well systems can detect toxic language in text and speech. Right now, these systems are biased, which means they incorrectly flag certain groups of people more often than others. The researchers wanted to see whether using audio recordings, instead of just written text, helps reduce this bias. They tested several different types of classifiers on a dataset called MuTox and found that when the system gets to listen to the recording, it is less likely to make false accusations against certain groups. This is especially true for tricky cases where people might disagree or have mixed feelings. Overall, the study suggests that making the classifier better matters more than improving how spoken words are transcribed.
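The bias the summaries describe is, roughly, the gap in false-positive rates between classifiers on non-toxic samples that mention a demographic group. As a toy illustration only (this is not the paper's code, and the data below is made up), one could compare two classifiers like this:

```python
# Hypothetical sketch: comparing false-positive rates of a text-based
# vs. a speech-based toxicity classifier on non-toxic samples that
# mention a demographic group. All names and data here are illustrative.

def false_positive_rate(predictions, labels):
    """Fraction of truly non-toxic samples (label 0) flagged as toxic (prediction 1)."""
    negatives = [p for p, y in zip(predictions, labels) if y == 0]
    if not negatives:
        return 0.0
    return sum(negatives) / len(negatives)

# Toy data: every sample is actually non-toxic but mentions a group.
labels = [0, 0, 0, 0]
text_preds = [1, 1, 1, 0]    # text classifier over-flags group mentions
speech_preds = [1, 0, 0, 0]  # speech classifier flags fewer of them

bias_gap = (false_positive_rate(text_preds, labels)
            - false_positive_rate(speech_preds, labels))
print(bias_gap)  # 0.5: in this toy example, speech access cuts false positives
```

A positive gap here would mean the text classifier wrongly flags group-mentioning content more often, which is the kind of bias the paper reports speech access reducing.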
Keywords
* Artificial intelligence
* Inference