Summary of Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods, by Jiamian Hu et al.


Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods

by Jiamian Hu, Yuanyuan Hong, Yihua Chen, He Wang, Moriaki Yasuhara

First submitted to arXiv on: 3 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty | Written by | Summary
High | Paper authors | High Difficulty Summary: Read the original abstract here
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: Noisy Ostracods is a dataset for genus- and species-level classification of crustacean ostracods, annotated by specialists. It contains 71466 specimens, an estimated 5.58% of which are noisy at the genus level. The noise comes from multiple sources, including open-set noise and pseudo-classes created during curation. The dataset is also highly imbalanced, with an imbalance factor of 22429, which presents a unique challenge for robust machine learning methods. In initial experiments, current robust learning techniques did not yield significant performance improvements over plain cross-entropy training on the raw, noisy data.
Low | GrooveSquid.com (original content) | Low Difficulty Summary: The researchers created a new dataset called Noisy Ostracods that can help scientists learn how to better identify different types of crustaceans. This dataset has some tricky features, like pictures of animals that don't belong in the usual categories. It also includes many more examples of some species than others, which makes it harder for computers to learn from. So far, methods designed to cope with mislabeled examples have not done noticeably better than ordinary training on the raw, noisy data.
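The two headline numbers in the medium summary can be made concrete with a small sketch. It assumes the common long-tailed-learning convention that the imbalance factor is the size of the largest class divided by the size of the smallest class; the summary itself does not define the term. The function names and the toy label list are hypothetical and are not taken from the paper or its code.

```python
from collections import Counter


def imbalance_factor(labels):
    """Largest class size divided by smallest class size.

    Assumption: this is the usual long-tailed-learning definition;
    the summary does not state which definition the paper uses.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def estimated_noisy_count(total, noise_rate):
    """Approximate number of noisy specimens for a given noise rate."""
    return round(total * noise_rate)


# Hypothetical toy labels, not real ostracod data.
toy_labels = ["A"] * 90 + ["B"] * 9 + ["C"] * 1
print(imbalance_factor(toy_labels))          # 90.0

# Figures quoted in the summary: 71466 specimens, 5.58% noisy.
print(estimated_noisy_count(71466, 0.0558))  # 3988
```

Under that assumed definition, an imbalance factor of 22429 would mean the most common class has roughly 22429 times as many specimens as the rarest one, and a 5.58% noise rate corresponds to roughly 3988 of the 71466 specimens.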

Keywords

  • Artificial intelligence
  • Classification
  • Cross entropy
  • Machine learning