Diagnosing Hate Speech Classification: Where Do Humans and Machines Disagree, and Why?
by Xilin Yang
First submitted to arXiv on: 14 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Computers and Society (cs.CY); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here. |
Medium | GrooveSquid.com (original content) | This study diagnoses hate speech classification using a cosine similarity ratio, embedding regression, and manual re-annotation. The researchers computed cosine similarity over the "Measuring Hate Speech" dataset of 135,556 annotated social media comments, showing how it can characterize hate speech content. They then analyzed inconsistencies in human annotation, using embedding regression to identify annotator biases (e.g., female annotators are more sensitive to racial slurs targeting black people). The study also fine-tuned a state-of-the-art pre-trained large language model, NV-Embed-v2, to classify hate speech, reaching 94% test accuracy. Comparing machine and human annotations, the researchers found that the machine makes fewer mistakes than humans but struggles to label short swear words. They attribute this to "model alignment": curating a model so that it avoids producing obvious hate speech also reduces its capacity to detect it. Illustrative sketches of the cosine similarity and embedding regression steps follow this table. |
Low | GrooveSquid.com (original content) | This study helps us understand how computers can accurately identify hateful language on social media. Researchers used a special math formula (cosine similarity) and looked at how different people labeled the same comments. They found that some annotators were more sensitive to certain types of hate speech than others. The team then trained a computer model to classify hate speech; it did well overall but struggled with short swear words. Surprisingly, the computer made fewer mistakes than the human annotators! This study helps us understand why, and how we can improve our systems to better detect hateful language. |
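To make the cosine similarity step concrete, here is a minimal, hypothetical sketch. The function and the toy vectors are illustrative stand-ins; the paper's actual NV-Embed-v2 embedding pipeline is not shown in this summary.

```python
# Hedged sketch: cosine similarity between comment embeddings.
# The toy vectors below stand in for real sentence embeddings;
# they are assumptions for illustration, not the paper's data.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings of two social media comments.
u = np.array([0.2, 0.7, 0.1])
v = np.array([0.1, 0.9, 0.0])
print(cosine_similarity(u, v))  # values near 1.0 suggest similar content
```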
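Likewise, a minimal sketch of what "embedding regression" could look like, assuming annotation labels are regressed on comment embeddings plus an annotator attribute so that the attribute's coefficient surfaces systematic rating differences. The synthetic data, variable names, and logistic-regression choice are assumptions, not the paper's exact setup.

```python
# Hedged sketch of embedding regression: regress an annotator's
# hate-speech label on the comment embedding plus an annotator
# attribute. All data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 1000, 16                       # comments x embedding dimensions
X_embed = rng.normal(size=(n, d))     # stand-in for NV-Embed-v2 vectors
annot_female = rng.integers(0, 2, n)  # hypothetical annotator attribute
X = np.column_stack([X_embed, annot_female])
y = rng.integers(0, 2, n)             # stand-in hate / not-hate labels

model = LogisticRegression(max_iter=1000).fit(X, y)
# The coefficient on the annotator attribute (last column) estimates how
# that group's labeling shifts while holding comment content fixed.
print("annotator-attribute coefficient:", model.coef_[0][-1])
```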
Keywords
» Artificial intelligence » Alignment » Classification » Cosine similarity » Embedding » Large language model » Regression