Summary of FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization, by Zhaopeng Gu et al.
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization
by Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang
First submitted to arXiv on: 21 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary FiLo is a zero-shot anomaly detection (ZSAD) method that addresses limitations of existing approaches, such as generic anomaly descriptions that fail to match the specific anomalies found in each object category. It consists of two components: Fine-Grained Description (FG-Des) and High-Quality Localization (HQ-Loc). FG-Des uses Large Language Models (LLMs) to generate fine-grained anomaly descriptions for each category, improving both accuracy and interpretability. HQ-Loc combines Grounding DINO for preliminary localization, position-enhanced text prompts, and a Multi-scale Multi-shape Cross-modal Interaction (MMCI) module to localize anomalies of different sizes and shapes accurately. The method achieves state-of-the-art performance on datasets such as MVTec and VisA, with an image-level AUC of 83.9% and a pixel-level AUC of 95.9% on the VisA dataset. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary FiLo is a new way to find unusual things in images without needing any normal or abnormal pictures as examples. Right now, most methods rely on powerful machine learning models that understand both text and images. These models find unusual things by comparing an image with written descriptions of what is normal and what is not. But this doesn’t work well when the description of “unusual” is very general and doesn’t match the kinds of unusual things we’re actually looking for. FiLo fixes this problem by using large language models to write more detailed descriptions of what is usual or unusual for each type of object. It also uses a special way to find where the unusual things are in an image, which works for unusual things of different sizes and shapes. |
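
The comparison between an image and "normal" versus "abnormal" descriptions that both summaries describe can be illustrated with a minimal Python sketch. This is not FiLo's implementation: the feature vectors are toy placeholders standing in for the outputs of a vision-language model's image and text encoders, the function names `cosine` and `anomaly_score` are hypothetical, and both the use of the best-matching description per group and the temperature of 0.07 are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_score(image_feat, normal_text_feats, abnormal_text_feats,
                  temperature=0.07):
    """Return a score in (0, 1); higher means the image looks more abnormal.

    Assumption: each group of descriptions is represented by its
    best-matching description, and the two similarities are turned into
    a probability with a two-way softmax.
    """
    normal_sim = max(cosine(image_feat, t) for t in normal_text_feats)
    abnormal_sim = max(cosine(image_feat, t) for t in abnormal_text_feats)
    e_normal = math.exp(normal_sim / temperature)
    e_abnormal = math.exp(abnormal_sim / temperature)
    return e_abnormal / (e_normal + e_abnormal)


# Toy 2-D features (placeholders, not real encoder outputs).
normal_descriptions = [[1.0, 0.0]]                  # e.g. "a flawless object"
abnormal_descriptions = [[0.0, 1.0], [0.6, 0.8]]    # e.g. "a scratch", "a crack"

good_image = [0.9, 0.1]
defective_image = [0.2, 0.9]

print(anomaly_score(good_image, normal_descriptions, abnormal_descriptions))
print(anomaly_score(defective_image, normal_descriptions, abnormal_descriptions))
```

In this sketch, adding more specific abnormal descriptions gives an image more chances to match one of them closely, which is the intuition behind replacing a single generic "abnormal" prompt with fine-grained ones.
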
Keywords
» Artificial intelligence » Anomaly detection » AUC » Grounding » Machine learning » Zero-shot