Summary of "An Evaluation of Explanation Methods for Black-Box Detectors of Machine-Generated Text" by Loris Schoenegger et al.
An Evaluation of Explanation Methods for Black-Box Detectors of Machine-Generated Text
by Loris Schoenegger, Yuxi Xia, Benjamin Roth
First submitted to arXiv on: 26 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The growing difficulty of distinguishing machine-generated text (MGT) from human-written text has led to the development of MGT detectors. To understand why a detector made a particular prediction, explanation methods that estimate feature importance are applied, but the quality of these methods for this task had not been assessed before. This study evaluates the quality of such explanations with five automated experiments and a user study, using a dataset of ChatGPT-generated and human-written documents to pair detector predictions with SHAP, LIME, and Anchor explanations (see the sketch after this table). It finds that SHAP performs best in terms of faithfulness, stability, and helping users predict detector behavior. |
Low | GrooveSquid.com (original content) | It is getting harder to tell whether a piece of writing was produced by a human or by a machine, so detectors of machine-generated text (MGT) have been developed. To trust these detectors, we need to know why they make their predictions. One way to do this is to use explanation methods that show which parts of the input the classifier relied on. However, we don't really know how good these methods are at this task. This study examines how well different explanation methods explain the decisions of MGT detectors. |
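
For readers curious what "pairing a prediction with an explanation" can look like in practice, below is a minimal sketch, not the authors' code: it treats a text-classification model as a black-box detector and asks SHAP for token-level importance scores. The detector checkpoint name is only an illustrative placeholder, and the `shap` and `transformers` libraries are assumed to be installed.

```python
# Minimal sketch (assumed setup, not the paper's code): explain a black-box
# MGT detector's prediction with SHAP token-importance scores.
import shap
from transformers import pipeline

# Illustrative detector checkpoint; the detectors studied in the paper may differ.
detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
    top_k=None,  # return scores for every label, not just the top one
)

# SHAP treats the pipeline as a black box: it perturbs the input text and
# measures how the predicted scores change to estimate feature importance.
explainer = shap.Explainer(detector)

document = "This is a sample document whose origin we want to explain."
shap_values = explainer([document])

# Per-token contributions toward each class for the first (and only) document.
print(shap_values[0])
```

LIME and Anchor explanations can be produced in a similar black-box fashion, which is what makes all three methods applicable to detectors whose internals are not accessible.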