
Summary of OPSD: An Offensive Persian Social Media Dataset and Its Baseline Evaluations, by Mehran Safayani et al.


OPSD: an Offensive Persian Social media Dataset and its baseline evaluations

by Mehran Safayani, Amir Sartipi, Amir Hossein Ahmadi, Parniyan Jalali, Amir Hossein Mansouri, Mohammad Bisheh-Niasar, Zahra Pourbahman

First submitted to arXiv on: 8 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper addresses hate speech on social media by introducing two Persian-language datasets: one annotated with expert input and one composed of unlabeled data for unsupervised learning. Curating the annotated dataset involved careful labeling stages, with inter-annotator agreement measured to ensure quality. To establish baselines, the authors evaluated state-of-the-art language models such as XLM-RoBERTa, which achieved F1-scores of 76.9% for three-class classification and 89.9% for two-class classification.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps combat hate speech on social media by creating two new datasets in the Persian language. It's like having a special tool to help computers understand what is and isn't mean or offensive online. The first dataset has been carefully labeled by experts, while the second one is huge and unlabeled, which suits training machines to learn without being told exactly what to do. To check that the datasets are useful, the authors tested some of the best computer models on them. The models reached F1-scores (a standard measure of how well a classifier performs) of 76.9% when sorting posts into three categories and 89.9% when sorting them into two.
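
For readers curious how the two quality measures named in the medium difficulty summary work, here is a minimal Python sketch of an F1-score for three-class classification and of an inter-annotator agreement score. It rests on assumptions: the summaries do not say which F1 averaging or which agreement measure the paper uses, so macro-averaged F1 and Cohen's kappa are stand-ins, and the class names and toy labels below are hypothetical, not taken from the dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list.

    Assumption: macro averaging. The paper may use a different averaging.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(annotator_a, annotator_b):
    """Agreement between two annotators, corrected for chance agreement.

    Assumption: Cohen's kappa stands in for the paper's agreement measure.
    """
    n = len(annotator_a)
    observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, not drawn from the dataset.
y_true = ["neutral", "offensive", "hate", "neutral", "offensive", "hate"]
y_pred = ["neutral", "offensive", "neutral", "neutral", "hate", "hate"]

print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
print(f"kappa:    {cohen_kappa(y_true, y_pred):.3f}")
```

Macro averaging weights every class equally, so a model that ignores a rare class is penalized. That is one reason a three-class score can sit well below a two-class score on the same data.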

Keywords

  • Artificial intelligence
  • Classification
  • Unsupervised