Summary of GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data, by Daniel Platnick et al.
GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data
by Daniel Platnick, Sourena Khanzadeh, Alireza Sadeghian, Richard Anthony Valenzano
First submitted to arXiv on: 10 Apr 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on its arXiv page |
Medium | GrooveSquid.com (original content) | The paper proposes GANsemble, a deep learning framework for generating class-conditioned synthetic data when the available data are scarce and imbalanced, a problem that hampers research into the potential harms of ingesting or inhaling microplastic particles. GANsemble connects data augmentation with conditional generative adversarial networks (cGANs) in two modules: a data chooser module that automatically selects the best data augmentation strategy, and a cGAN module that uses the selected strategy to train a cGAN that generates enhanced synthetic data. The authors evaluate the framework on a small, imbalanced microplastics data set, introduce a Microplastic-cGAN (MPcGAN) algorithm, and establish baselines in terms of Fréchet Inception Distance (FID) and Inception Score (IS). They also present a synthetic microplastics filter (SYMP-Filter) algorithm that increases the quality of the generated synthetic microplastics data. |
Low | GrooveSquid.com (original content) | Microplastics are tiny pieces of plastic that people can swallow or breathe in. Scientists need more data to understand how harmful they might be, but machine learning methods struggle when data are scarce and unevenly spread across classes. This paper proposes a way to generate synthetic data with deep learning. The method has two steps: choosing the best data augmentation strategy, then using it to train a model that generates synthetic data. The researchers tested the method on a small microplastics data set, introduced a new algorithm called MPcGAN, and set baselines for measuring the quality of generated synthetic data. |
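
The two-module structure described in the summaries (a data chooser that picks an augmentation strategy, followed by a class-conditioned generator trained on the augmented data) can be illustrated with a minimal control-flow sketch. This is not the authors' implementation. The strategy functions, the `balance_score` criterion, the `PE`/`PP` class labels, and the toy data are all hypothetical, and the second module is a simple jittering sampler standing in for a cGAN, since the summary does not describe how GANsemble scores strategies or how MPcGAN is built.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], str]  # (feature vector, class label)
Strategy = Callable[[List[Sample]], List[Sample]]


def identity(data: List[Sample]) -> List[Sample]:
    """Baseline strategy: no augmentation."""
    return list(data)


def oversample_minority(data: List[Sample]) -> List[Sample]:
    """Repeat samples of smaller classes until each class matches the largest."""
    counts = Counter(label for _, label in data)
    target = max(counts.values())
    out = list(data)
    for label, n in counts.items():
        pool = [s for s in data if s[1] == label]
        out.extend(pool[i % len(pool)] for i in range(target - n))
    return out


def balance_score(data: List[Sample]) -> float:
    """Hypothetical criterion: smallest class size / largest class size."""
    counts = Counter(label for _, label in data)
    return min(counts.values()) / max(counts.values())


def choose_strategy(
    data: List[Sample],
    strategies: Dict[str, Strategy],
    score: Callable[[List[Sample]], float],
) -> str:
    """Module 1 (data chooser): name of the strategy with the highest score."""
    return max(strategies, key=lambda name: score(strategies[name](data)))


def fit_stand_in_generator(data: List[Sample], seed: int = 0):
    """Module 2 placeholder, NOT a cGAN: a class-conditioned jittering sampler."""
    rng = random.Random(seed)
    by_class: Dict[str, List[List[float]]] = {}
    for features, label in data:
        by_class.setdefault(label, []).append(features)

    def generate(label: str, noise: float = 0.01) -> List[float]:
        base = rng.choice(by_class[label])
        return [x + rng.gauss(0.0, noise) for x in base]

    return generate


# Toy imbalanced data with made-up class labels (6 "PE" samples, 2 "PP").
raw = [([0.1, 0.2], "PE")] * 6 + [([0.8, 0.9], "PP")] * 2
strategies = {"identity": identity, "oversample": oversample_minority}

best = choose_strategy(raw, strategies, balance_score)
generator = fit_stand_in_generator(strategies[best](raw))
synthetic = generator("PP")  # one synthetic feature vector for class "PP"
```

Passing the scoring function and the strategies in as arguments keeps the chooser independent of any particular criterion, so a different selection rule or a real cGAN training routine could be substituted without changing the overall flow.
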
Keywords
- Artificial intelligence
- Data augmentation
- Deep learning
- Machine learning
- Synthetic data