


Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

by Yibo Zhao, Jiapeng Zhu, Can Xu, Xiang Li

First submitted to arXiv on: 17 Dec 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed MetaTox method addresses two key challenges that Large Language Models (LLMs) face in toxicity detection: false negatives caused by a lack of domain-specific toxic knowledge, and false positives caused by excessive sensitivity, where benign speech is flagged as toxic and freedom of speech is unnecessarily restricted. MetaTox tackles both by performing graph search over a meta-toxic knowledge graph to enhance hatred and toxicity detection. The graph is built by extracting toxic information from LLMs through a three-step pipeline, using toxic benchmark datasets as corpora. At detection time, the graph is queried via retrieval and ranking processes to supply accurate, relevant toxic knowledge. Experimental results demonstrate that MetaTox significantly decreases false positives while boosting overall toxicity detection performance. An illustrative code sketch of this retrieve-and-rank step follows these summaries.

Low Difficulty Summary (written by GrooveSquid.com, original content)
MetaTox helps detect toxic online content more accurately. Right now, Large Language Models are used for this task, but they have two big problems. They often miss toxic content because they don't know enough about specific kinds of hate speech, and they can also be overly sensitive, flagging things that aren't actually harmful. MetaTox tries to fix these issues by building a special knowledge graph that teaches LLMs more about different types of hate speech. The graph is built from existing toxicity benchmark datasets. Then, when an LLM needs to decide whether content is toxic, it can ask the graph for help and get more accurate results.

Keywords

» Artificial intelligence  » Boosting  » Knowledge graph