Diagnosing Hate Speech Classification: Where Do Humans and Machines Disagree, and Why?

by Xilin Yang

First submitted to arXiv on: 14 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Computers and Society (cs.CY); Machine Learning (cs.LG)

     Abstract of paper      PDF of paper


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This study diagnoses hate speech classification using cosine similarity ratios, embedding regression, and manual re-annotation. The researchers computed cosine similarity on the “Measuring Hate Speech” dataset of 135,556 annotated social media comments, demonstrating its use in characterizing hate speech content. They then analyzed inconsistencies in human annotation, using embedding regression to identify annotator biases (e.g., female annotators are more sensitive to racial slurs targeting Black people). The study also trained a state-of-the-art pre-trained large language model, NV-Embed-v2, to classify hate speech, achieving 94% test accuracy. By comparing machine and human annotations, the researchers found that machines make fewer mistakes than humans but struggle to label short swear words. They attribute this to “model alignment”: curating models to keep them from producing obvious hate speech also reduces their ability to detect it.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This study helps us understand how computers can accurately identify hateful language on social media. Researchers used a special math formula (cosine similarity) and examined how different people labeled the same comments differently. They found that some annotators were more sensitive to certain types of hate speech than others. The team then trained a computer model to classify hate speech, which did well overall but struggled with short swear words. Surprisingly, computers made fewer mistakes than humans! This study helps us understand why, and how we can improve our systems to better detect hateful language.
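The cosine-similarity diagnostic mentioned in the summaries can be sketched in a few lines. This is an illustrative toy, not the paper's actual pipeline: the anchor vectors and the ratio-style comparison below are assumptions, and in the paper the embeddings would come from a model such as NV-Embed-v2 rather than hand-written arrays.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sentence embeddings for illustration only.
emb_comment = np.array([0.9, 0.1, 0.3])         # a comment to diagnose
emb_hateful_anchor = np.array([0.8, 0.2, 0.4])  # assumed "hateful" reference
emb_neutral_anchor = np.array([0.1, 0.9, 0.2])  # assumed "neutral" reference

# A similarity-ratio style diagnostic: how much closer is the comment's
# embedding to the hateful anchor than to the neutral one?
sim_hate = cosine_similarity(emb_comment, emb_hateful_anchor)
sim_neutral = cosine_similarity(emb_comment, emb_neutral_anchor)
ratio = sim_hate / sim_neutral
```

With these toy vectors the comment sits much closer to the hateful anchor, so the ratio exceeds 1; a real analysis would compute such similarities over the full dataset of annotated comments.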

Keywords

» Artificial intelligence  » Alignment  » Classification  » Cosine similarity  » Embedding  » Large language model  » Regression