Summary of MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts, by Dominik Macko et al.

by Dominik Macko, Jakub Kopal, Robert Moro, Ivan Srba

First submitted to arxiv on: 18 Jun 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary Recent Large Language Models (LLMs) can generate high-quality multilingual texts that are indistinguishable from authentic human-written ones. However, most research on machine-generated text detection has focused on longer English-language texts such as news articles, scientific papers, or student essays. This leaves a gap in the social-media domain: little is known about how well existing methods detect machine-generated text in short, informal posts, which often contain grammatical errors, emoticons, and hashtags. To address this gap, the authors propose MultiSocial, the first multilingual (22 languages) and multi-platform (5 social-media platforms) dataset for benchmarking machine-generated text detection in the social-media domain. The dataset contains 472,097 texts, of which approximately 58k are human-written and about the same number are generated by each of 7 multilingual LLMs. The authors use this benchmark to compare existing detection methods in both zero-shot and fine-tuned forms. Their results show that detectors can be fine-tuned successfully on social-media texts, and that the choice of platform used for training matters.
Low	GrooveSquid.com (original content)	Low Difficulty Summary Detecting machine-generated text is a challenging task, especially on social media, where texts are short and informal. Most current methods were designed and tested on longer texts like news articles or scientific papers, which does not reflect the way people communicate on social media. To fill this gap, researchers created a new dataset called MultiSocial that contains over 472,000 texts from 5 different social-media platforms in 22 languages. The dataset includes both human-written and machine-generated texts, so it can be used to test how well detection methods work. The results show that detectors can be fine-tuned on social-media texts, and that the platform whose texts are used for training affects how well they perform.
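
The dataset figures quoted in the summaries above can be sanity-checked with a little arithmetic. The sketch below is only an illustration and is not taken from the paper: it uses the total (472,097), the rounded human-written count ("approximately 58k"), and the number of generators (7) from the text to estimate the average number of texts per LLM. The constant and function names are hypothetical.

```python
# Figures taken from the summaries above. HUMAN_TEXTS is the rounded "58k"
# value, so the results are estimates, not exact counts from the paper.
TOTAL_TEXTS = 472_097
HUMAN_TEXTS = 58_000
NUM_GENERATORS = 7


def estimate_per_generator(total: int, human: int, generators: int) -> float:
    """Average number of machine-generated texts per LLM, given rounded inputs."""
    return (total - human) / generators


per_llm = estimate_per_generator(TOTAL_TEXTS, HUMAN_TEXTS, NUM_GENERATORS)
machine_share = (TOTAL_TEXTS - HUMAN_TEXTS) / TOTAL_TEXTS

print(f"Estimated texts per LLM: {per_llm:,.0f}")
print(f"Estimated machine-generated share: {machine_share:.1%}")
```

With these rounded inputs, the estimate comes out at roughly 59k texts per LLM, which is consistent with the statement that each LLM contributes about as many texts as the human-written portion. It also suggests that close to 88% of the dataset is machine-generated.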

Keywords

* Artificial intelligence
* Fine-tuning
* Zero-shot
