


Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

by Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu

First submitted to arXiv on: 19 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com original content)
This paper proposes a novel approach to hate speech detection on social media platforms. The authors address the lack of interpretability in existing black-box methods by using Large Language Models (LLMs) to extract features, in the form of rationales, from the input text, and then training a base hate speech classifier on those rationales, enabling faithful interpretability by design. The framework combines the textual understanding capabilities of LLMs with the discriminative power of state-of-the-art hate speech classifiers. Evaluation on several English-language social media hate speech datasets shows that the LLM-extracted rationales are effective and that detector performance is largely retained even after training for interpretability. (A minimal illustrative sketch of this rationale-then-classify pipeline follows these summaries.)

Low Difficulty Summary (GrooveSquid.com original content)
Hate speech on social media can be a big problem. Some people use these platforms to spread hurtful or offensive ideas, but it’s hard for computers to identify this kind of content automatically. Most of the methods used today are like a black box: they work well, but we don’t really know how they make their decisions. This paper tries to change that by using large language models to pick out the exact phrases that make a post hateful, and then training a detector on those phrases so its decisions are easier to understand. The results show that this approach keeps the detector accurate while also making its choices explainable.

Keywords

* Artificial intelligence