Diagnosing Hate Speech Classification: Where Do Humans and Machines Disagree, and Why?
by Xilin Yang
First submitted to arXiv on: 14 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Computers and Society (cs.CY); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here. |
Medium | GrooveSquid.com (original content) | This study diagnoses hate speech classification using a cosine similarity ratio, embedding regression, and manual re-annotation. The researchers computed cosine similarity over the "Measuring Hate Speech" dataset of 135,556 annotated social media comments, showing how it can characterize hate speech content. They then analyzed inconsistencies in human annotation, using embedding regression to identify annotator biases (e.g., female annotators are more sensitive to racial slurs targeting black people). The study also fine-tuned a state-of-the-art pre-trained large language model, NV-Embed-v2, to classify hate speech, reaching 94% test accuracy. Comparing machine and human annotations, the researchers found that the machine makes fewer mistakes than humans but struggles to label short swear words. They attribute this to "model alignment": curating a model so that it avoids producing obvious hate speech also reduces its capacity to detect it. Illustrative sketches of the cosine similarity and embedding regression steps follow this table. |
Low | GrooveSquid.com (original content) | This study helps us understand how computers can accurately identify hateful language on social media. Researchers used a special math formula (cosine similarity) and looked at how different people labeled the same comments. They found that some annotators were more sensitive to certain types of hate speech than others. The team then trained a computer model to classify hate speech; it did well overall but struggled with short swear words. Surprisingly, the computer made fewer mistakes than the human annotators! This study helps us understand why, and how we can improve our systems to better detect hateful language. |
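To make the cosine similarity step concrete, here is a minimal, hypothetical sketch. The function and the toy vectors are illustrative stand-ins; the paper's actual NV-Embed-v2 embedding pipeline is not shown in this summary.

```python
# Hedged sketch: cosine similarity between comment embeddings.
# The toy vectors below stand in for real sentence embeddings;
# they are assumptions for illustration, not the paper's data.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings of two social media comments.
u = np.array([0.2, 0.7, 0.1])
v = np.array([0.1, 0.9, 0.0])
print(cosine_similarity(u, v))  # values near 1.0 suggest similar content
```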
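Likewise, a minimal sketch of what "embedding regression" could look like, assuming annotation labels are regressed on comment embeddings plus an annotator attribute so that the attribute's coefficient surfaces systematic rating differences. The synthetic data, variable names, and logistic-regression choice are assumptions, not the paper's exact setup.

```python
# Hedged sketch of embedding regression: regress an annotator's
# hate-speech label on the comment embedding plus an annotator
# attribute. All data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 1000, 16                       # comments x embedding dimensions
X_embed = rng.normal(size=(n, d))     # stand-in for NV-Embed-v2 vectors
annot_female = rng.integers(0, 2, n)  # hypothetical annotator attribute
X = np.column_stack([X_embed, annot_female])
y = rng.integers(0, 2, n)             # stand-in hate / not-hate labels

model = LogisticRegression(max_iter=1000).fit(X, y)
# The coefficient on the annotator attribute (last column) estimates how
# that group's labeling shifts while holding comment content fixed.
print("annotator-attribute coefficient:", model.coef_[0][-1])
```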
Keywords
» Artificial intelligence » Alignment » Classification » Cosine similarity » Embedding » Large language model » Regression