Summary of Fine-tuning Large Language Models For Multigenerator, Multidomain, and Multilingual Machine-generated Text Detection, by Feng Xiong et al.


Fine-tuning Large Language Models for Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection

by Feng Xiong, Thanet Markchom, Ziwei Zheng, Subin Jung, Varun Ojha, Huizhi Liang

First submitted to arXiv on: 22 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents a system for SemEval-2024 Task 8, a shared task on identifying machine-generated text produced by diverse Large Language Models (LLMs) across multiple languages and domains. The task consists of three subtasks: binary classification in monolingual and multilingual settings (Subtask A), multi-class classification (Subtask B), and mixed text detection (Subtask C). Two methods are presented to tackle the challenge: traditional machine learning (ML) with natural language processing (NLP) for feature extraction, and fine-tuning LLMs for text classification. The results show that transformer models, particularly LoRA-fine-tuned RoBERTa (LoRA-RoBERTa), outperform the traditional ML approaches, and that majority voting is especially effective in multilingual contexts.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about a challenge to identify texts written by computers across many different languages and topics. It’s like trying to figure out who wrote an essay: a human or an AI. The task has three parts: deciding whether a text is machine-generated, classifying texts into categories, and detecting mixed texts that combine human and machine writing. Two ways are proposed to solve the challenge: using traditional machine learning techniques with natural language processing, and training large AI models (LLMs) to classify texts. The results show that certain AI models called transformer models do a better job than traditional methods at spotting machine-generated texts.

Keywords

» Artificial intelligence  » Classification  » Feature extraction  » Fine tuning  » Lora  » Machine learning  » Natural language processing  » Nlp  » Text classification  » Transformer