Summary of RKadiyala at SemEval-2024 Task 8: Black-Box Word-Level Text Boundary Detection in Partially Machine Generated Texts, by Ram Mohan Rao Kadiyala
RKadiyala at SemEval-2024 Task 8: Black-Box Word-Level Text Boundary Detection in Partially Machine Generated Texts
by Ram Mohan Rao Kadiyala
First submitted to arXiv on: 22 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary: the paper’s original abstract, available on its arXiv page |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: The paper addresses the challenge of distinguishing human-written from machine-generated text. Existing approaches mostly perform binary classification (entirely human vs. entirely machine); the few that work at the sentence or paragraph level perform well only for specific domains and generators. The author proposes methods to detect which parts of a text are machine-generated at the word level and compares results across different approaches. The model is evaluated on texts from unseen domains and generators, and the paper reports significant improvements in detection accuracy. The findings also have implications for detecting machine-generated passages in the outputs of Instruct variants of large language models (LLMs). The paper closes with potential avenues for improvement, emphasizing the importance of reliable attribution of generated text. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary: The paper is about identifying whether a text was written by a person or a computer. Right now, most systems can only tell whether a whole text is human-written or machine-generated. But what if you want to know which part of a text was written by a computer? The author developed ways to do this at the word level and compared them with other methods. The model was tested on texts from different domains and generators and performed much better than existing systems. This research has important implications for detecting text generated by large language models (LLMs), which is becoming more common. |
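
To make the word-level task concrete, the sketch below turns a sequence of per-word labels into a single boundary position. It is an illustration only and not the author’s method: the 0/1 label encoding (0 for human-written, 1 for machine-generated), the assumption that a text switches from human to machine exactly once, and the function name `best_boundary` are all hypothetical choices that do not come from the paper or from the summaries above.

```python
from typing import List


def best_boundary(word_labels: List[int]) -> int:
    """Return the split index k that best fits "human first, machine after".

    Hypothetical helper: labels are assumed to be 0 (human) or 1 (machine).
    Words [0, k) are treated as human-written and words [k, n) as
    machine-generated. The chosen k minimizes the number of labels that
    disagree with that ideal pattern, so isolated noisy labels do not
    move the boundary. Ties resolve to the smallest k.
    """
    n = len(word_labels)
    ones_before = 0                      # machine labels left of the split
    zeros_after = word_labels.count(0)   # human labels right of the split
    best_k, best_cost = 0, zeros_after   # k = 0: whole text is machine

    for k in range(1, n + 1):
        # Moving the split one word to the right puts word k-1 on the human side.
        if word_labels[k - 1] == 1:
            ones_before += 1
        else:
            zeros_after -= 1
        cost = ones_before + zeros_after
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Made-up example: nine words with one noisy label at position 2.
words = "The cat sat on the mat and then slept".split()
noisy_labels = [0, 0, 1, 0, 0, 0, 1, 1, 1]

k = best_boundary(noisy_labels)
print(k)                    # 6
print(" ".join(words[:k]))  # The cat sat on the mat
print(" ".join(words[k:]))  # and then slept
```

With noise-free labels this reduces to the index of the first machine-labeled word; the mismatch count only matters when a per-word classifier produces isolated flips.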
Keywords
- Artificial intelligence
- Classification
- Text generation