Summary of A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text, by Jeremie Pantin and Christophe Marsala


A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text

by Jeremie Pantin, Christophe Marsala

First submitted to arXiv on: 16 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary
Written by: Paper authors
Read the original abstract here

Medium Difficulty Summary
Written by: GrooveSquid.com (original content)
The paper tackles anomaly detection in text data, an emerging domain with significant potential applications. Building on self-supervised methods with self-attention mechanisms, the authors make two primary contributions: contextual anomaly contamination and a novel ensemble-based approach. The first, Textual Anomaly Contamination (TAC), contaminates inlier classes with either independent or contextual anomalies, filling a gap in the existing literature. The second is RoSAE, a Robust Subspace Local Recovery Autoencoder Ensemble, which presents different latent representations through local manifold learning. Experimental results show that the proposed approach outperforms recent work on both types of anomalies and is more robust. The authors also provide a comparison across eight datasets, extending beyond the traditional Reuters and 20 Newsgroups corpora.

Low Difficulty Summary
Written by: GrooveSquid.com (original content)
Anomaly detection in text data is a growing field with many potential applications. Researchers have been using self-supervised methods to find unusual patterns in text data. In this paper, the authors introduce two new ideas: contamination and an ensemble-based approach. Contamination means mixing unusual documents into a set of normal text to see how well algorithms can detect them. The authors propose a method called Textual Anomaly Contamination (TAC) that does this. They also suggest a new type of algorithm called RoSAE, which combines multiple small models to find unusual patterns in text. The results show that their approach is better than previous methods at finding unusual patterns and is more robust. The authors also evaluate their method on eight different datasets.
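
To make the contamination idea more concrete, here is a minimal Python sketch of mixing anomalies into an inlier class at a chosen rate. It is not the authors' TAC implementation. The function names (`split_candidates`, `contaminate`), the toy corpus, and the dotted topic labels are all hypothetical, and reading "contextual" as "a different subtopic under the same parent topic" and "independent" as "a different parent topic" is an assumption that the summaries above do not spell out.

```python
import random


def split_candidates(labels, inlier_label):
    """Split the other labels into independent and contextual candidates.

    Assumption: labels are dotted topic paths such as "rec.autos", and
    "contextual" means sharing the first path component with the inlier.
    """
    parent = inlier_label.split(".")[0]
    contextual = sorted(
        l for l in labels if l != inlier_label and l.split(".")[0] == parent
    )
    independent = sorted(l for l in labels if l.split(".")[0] != parent)
    return independent, contextual


def contaminate(corpus, inlier_label, anomaly_labels, rate, seed=0):
    """Return (document, is_anomaly) pairs with roughly `rate` anomalies.

    `corpus` is a list of (document, label) pairs. All inlier documents are
    kept, and anomalies are sampled from the classes in `anomaly_labels`.
    """
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)
    inliers = [doc for doc, label in corpus if label == inlier_label]
    pool = [doc for doc, label in corpus if label in anomaly_labels]
    # Solve n_anom / (n_inliers + n_anom) = rate for n_anom.
    n_anom = round(rate * len(inliers) / (1 - rate))
    n_anom = min(n_anom, len(pool))  # cannot sample more than the pool holds
    anomalies = rng.sample(pool, n_anom)
    data = [(doc, 0) for doc in inliers] + [(doc, 1) for doc in anomalies]
    rng.shuffle(data)
    return data


# Toy corpus, invented for illustration only.
corpus = (
    [(f"car review {i}", "rec.autos") for i in range(6)]
    + [(f"bike review {i}", "rec.motorcycles") for i in range(4)]
    + [(f"orbit report {i}", "sci.space") for i in range(4)]
)

labels = {label for _, label in corpus}
independent, contextual = split_candidates(labels, "rec.autos")

contextual_set = contaminate(corpus, "rec.autos", contextual, rate=0.25)
independent_set = contaminate(corpus, "rec.autos", independent, rate=0.25)
```

With six inlier documents and a rate of 0.25, each call keeps all six inliers and adds two anomalies. A detector would then be scored on how well it ranks the documents flagged `1` above those flagged `0`, once for each anomaly type.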

Keywords

» Artificial intelligence  » Anomaly detection  » Autoencoder  » Manifold learning  » Self-attention  » Self-supervised