


Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

by Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu

First submitted to arXiv on: 19 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com original content)
This paper proposes a novel approach to hate speech detection on social media platforms. The authors address the lack of interpretability in existing black-box methods by using Large Language Models (LLMs) to extract features, in the form of rationales, from the input text, and then training a base hate speech classifier on those rationales, enabling faithful interpretability by design. The framework combines the textual understanding capabilities of LLMs with the discriminative power of state-of-the-art hate speech classifiers. Evaluation on several English-language social media hate speech datasets shows that the LLM-extracted rationales are effective and that detector performance is largely retained even after training for interpretability. (A minimal illustrative sketch of this rationale-then-classify pipeline follows these summaries.)

Low Difficulty Summary (GrooveSquid.com original content)
Hate speech on social media can be a big problem. Some people use these platforms to spread hurtful or offensive ideas, but it’s hard for computers to identify this kind of content automatically. Most of the methods used today are like a black box: they work well, but we don’t really know how they make their decisions. This paper tries to change that by using large language models to pick out the exact phrases that make a post hateful, and then training a detector on those phrases so its decisions are easier to understand. The results show that this approach keeps the detector accurate while also making its choices explainable.

Keywords

* Artificial intelligence