
Summary of Mitigating Paraphrase Attacks on Machine-Text Detectors via Paraphrase Inversion, by Rafael Rivera Soto et al.


Mitigating Paraphrase Attacks on Machine-Text Detectors via Paraphrase Inversion

by Rafael Rivera Soto, Barry Chen, Nicholas Andrews

First submitted to arXiv on: 29 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes a novel problem called paraphrase inversion, where given paraphrased text, the goal is to recover an approximation of the original text. This is motivated by the fact that paraphrasing attacks can significantly degrade the performance of machine-text detectors. To tackle this challenge, the authors frame the problem as translation from paraphrased text back to the original text, requiring examples of texts and corresponding paraphrases to train the inversion model. The training data can be easily generated using a corpus of original texts and one or more paraphrasing models. Language models like GPT-4 and Llama-3 are found to exhibit biases when paraphrasing, which an inversion model can learn with a modest amount of data. Notably, these models generalize well, including to paraphrase models unseen at training time. When combined with a paraphrased-text detector, the inversion models provide an effective defense against paraphrasing attacks, yielding an average improvement of +22% AUROC across seven machine-text detectors and three different domains.
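The data-generation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `paraphrase` function below is a hypothetical stand-in for a real paraphrasing model (e.g. a GPT-4 or Llama-3 API call), and the pair-building logic simply maps each paraphrase (source) back to its original text (target), which is the direction the inversion model is trained on.

```python
def paraphrase(text: str) -> str:
    """Hypothetical placeholder for a real paraphrasing model.

    In practice this would call an LLM; here it just makes a
    trivial word substitution so the sketch is runnable.
    """
    return text.replace("quick", "fast")


def build_inversion_pairs(corpus):
    """Build (source, target) training pairs for a seq2seq inversion model.

    Source = paraphrased text, target = original text, so the model
    learns to translate paraphrases back to their originals.
    """
    return [(paraphrase(doc), doc) for doc in corpus]


corpus = ["The quick brown fox jumps over the lazy dog."]
pairs = build_inversion_pairs(corpus)
# Each pair trains the inversion model to undo the paraphraser's edits.
```

Because the targets are the original texts themselves, training data can be produced cheaply from any corpus plus one or more paraphrasing models, as the summary notes.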
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper talks about making machines better at detecting fake text by figuring out how to reverse-engineer the original text from a rewritten version. This is important because right now, machine-text detectors are easily fooled by people who rewrite machine-generated text so that it looks like a human wrote it. The researchers came up with an innovative way to solve this problem by treating it as a translation task – essentially trying to translate the rewritten text back into its original form. They trained a model on examples of texts and their paraphrases and found that it could learn to undo the rewriting, even for paraphrasing tools it had never seen during training. This new approach can help detect fake text more accurately.

Keywords

» Artificial intelligence  » Gpt  » Llama  » Translation