Online Detecting LLM-Generated Texts via Sequential Hypothesis Testing by Betting

by Can Chen, Jun-Kun Wang

First submitted to arXiv on: 29 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents a new algorithm for distinguishing machine-generated texts from human-written ones. Existing methods mainly target offline settings, where a dataset containing both human-written and machine-generated texts is available in advance. In many practical scenarios, however, content is published online in a streaming fashion, so it is crucial to have a detection method with strong statistical guarantees for deciding whether a source is a large language model (LLM) or a human. The proposed algorithm uses sequential hypothesis testing by betting: it builds upon existing offline detection techniques while enjoying a controlled false positive rate and a bounded expected time to correctly identify a source as an LLM. Experimental results demonstrate the effectiveness of the method.
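The betting idea behind such a sequential test can be sketched as follows. This is a minimal illustration, not the paper's exact construction: it assumes a per-text detector score in [0, 1] that tends to be larger for LLM-generated text, and the null mean `mu0` and fixed bet size `lam` are illustrative choices (the paper's method uses more refined betting strategies).

```python
def sequential_llm_test(scores, alpha=0.05, mu0=0.5, lam=1.0):
    """Sketch of sequential hypothesis testing by betting.

    Under the null hypothesis (human source, mean detector score <= mu0),
    the wealth process below is a nonnegative supermartingale, so by
    Ville's inequality the probability that wealth ever reaches 1/alpha
    is at most alpha -- this is the controlled false positive rate.
    """
    wealth = 1.0
    for t, s in enumerate(scores, start=1):
        # Bet a fraction lam of current wealth on the score exceeding mu0.
        # With scores in [0, 1] and lam <= 2, wealth stays nonnegative.
        wealth *= 1.0 + lam * (s - mu0)
        if wealth >= 1.0 / alpha:
            return ("LLM", t)  # reject the null: flag the source as an LLM
    return ("undecided", len(scores))
```

On a stream of consistently LLM-like scores (e.g., around 0.8) the wealth grows geometrically and the test stops quickly; on human-like scores (around or below 0.5) the wealth stays small, so the human source is rarely flagged.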
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper finds a new way to tell machine-made writing apart from human-written text. It works in a special setting where text comes out one piece at a time, like on news websites or social media. Most current methods only work when you have a big collection of texts beforehand. But we need something that works well online, so that people can trust what they read and false information does not spread. To solve this problem, the authors create an algorithm using special math techniques based on betting. The method is good at finding out whether the writer is a computer or a person, with guarantees about how often it makes mistakes. The results show that this new approach works well.

Keywords

» Artificial intelligence  » Large language model