Summary of Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors, by Tianchun Wang et al.


Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors

by Tianchun Wang, Yuanzhou Chen, Zichuan Liu, Zhanwen Chen, Haifeng Chen, Xiang Zhang, Wei Cheng

First submitted to arXiv on: 25 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper presents a novel attack strategy against detectors of text generated by large language models (LLMs), showing how the attack bypasses existing detectors by producing human-like writing. The proxy-attack method leverages a reinforcement learning (RL) fine-tuned small language model (SLM) to steer the LLM's decoding phase, so that its outputs become difficult to distinguish from human-written text. The strategy is tested on extensive datasets using open-source models such as Llama2-13B, Llama3-70B, and Mixtral-8x7B, in both white-box and black-box settings. Results show an average AUROC drop of 70.4% across multiple datasets, with a maximum drop of 90.3%. The strategy also bypasses detectors in cross-discipline and cross-language scenarios, achieving relative decreases of up to 90.9% and 91.3%, respectively. Notably, the generation quality of the attacked models remains preserved within a modest utility budget.
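The summary describes steering a large model's decoding with an RL fine-tuned small proxy model. One common way such proxy guidance is implemented (a minimal sketch with made-up toy values, not the authors' exact formulation) is to shift the target model's next-token logits by the delta between the fine-tuned proxy and its base version:

```python
import numpy as np

def proxy_attack_logits(target_logits, proxy_tuned_logits, proxy_base_logits, alpha=1.0):
    """Shift the target model's next-token distribution toward the
    'humanized' proxy by adding the proxy's fine-tuning delta.
    All arguments are vocabulary-sized logit vectors; alpha scales
    how strongly the proxy steers decoding."""
    return target_logits + alpha * (proxy_tuned_logits - proxy_base_logits)

def greedy_token(logits):
    """Pick the highest-scoring token (greedy decoding)."""
    return int(np.argmax(logits))

# Toy 4-token vocabulary: the fine-tuned proxy strongly prefers token 2.
target = np.array([2.0, 1.0, 0.5, 0.0])
proxy_base = np.array([1.0, 1.0, 1.0, 1.0])
proxy_tuned = np.array([0.0, 0.0, 4.0, 0.0])

adjusted = proxy_attack_logits(target, proxy_tuned, proxy_base)
print(greedy_token(target))    # target alone picks token 0
print(greedy_token(adjusted))  # proxy delta steers decoding to token 2
```

Because only output logits of the target model are needed, this style of decoding-time guidance works even when the target LLM's weights cannot be fine-tuned directly, which matches the black-box setting the summary mentions.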
Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper talks about a way to trick machines that are meant to spot fake writing generated by large language models. The researchers created an attack strategy that can make these detectors think machine-generated text was actually written by a human. They tested the strategy on different datasets and found it very successful, reducing the detectors' performance by up to 90%. This means attackers could use the method to create machine-generated writing that passes as real human writing. Surprisingly, though, the quality of the attacked writing remains good.

Keywords

* Artificial intelligence  * Language model  * Reinforcement learning