
Summary of Prioritizing Informative Features and Examples For Deep Learning From Noisy Data, by Dongmin Park


Prioritizing Informative Features and Examples for Deep Learning from Noisy Data

by Dongmin Park

First submitted to arXiv on: 27 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)

This dissertation proposes a systemic framework that improves each stage of the deep learning development process (feature learning, data labeling, and data selection) by prioritizing informative features and examples. It extracts informative features using auxiliary out-of-distribution data and deactivates noise features in the target distribution. To resolve the purity-information dilemma, a meta-model finds the best balance between purity and informativeness. The framework also includes approaches that prioritize informative examples from unlabeled noisy data and from labeled noisy data, with applications to noisy labeled image data and noisy labeled text data. Notably, the work improves the performance of state-of-the-art Re-labeling models and aligned large language models.

Low Difficulty Summary (written by GrooveSquid.com, original content)

This research aims to help deep learning models learn from noisy data by focusing on the most helpful features and examples. The approach helps models ignore misleading signals and avoid being held back by bad data. It also finds the right balance between choosing clean examples and choosing examples that teach the model the most. The work has practical applications in image recognition and text analysis, including improving large language models.
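
To make the purity-informativeness trade-off mentioned in the medium summary more concrete, here is a minimal Python sketch. It is not the dissertation's method: the dissertation uses a meta-model to find the balance, whereas this sketch substitutes a fixed, hand-set `weight`. The function names, the `purity` scores, and the example pool are all hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select(examples, budget, weight):
    """Return the names of the `budget` examples with the highest combined score.

    Each example is (name, class_probs, purity). `purity` in [0, 1] is a
    hypothetical estimate that the example belongs to the target distribution.
    `weight` in [0, 1] is an assumed hand-set knob: 0 ranks by informativeness
    only, 1 ranks by purity only. (The dissertation learns this balance with a
    meta-model; a fixed weight is a simplification.)
    """
    def score(example):
        _, probs, purity = example
        # Normalise entropy to [0, 1] so it is comparable with purity.
        informativeness = entropy(probs) / math.log(len(probs))
        return (1 - weight) * informativeness + weight * purity

    ranked = sorted(examples, key=score, reverse=True)
    return [name for name, _, _ in ranked[:budget]]


# Hypothetical unlabeled pool: (name, predicted class probabilities, purity).
pool = [
    ("clean_easy", [0.98, 0.02], 0.95),
    ("clean_hard", [0.55, 0.45], 0.90),
    ("noisy_hard", [0.50, 0.50], 0.10),
    ("noisy_easy", [0.97, 0.03], 0.05),
]

print(select(pool, budget=1, weight=0.0))  # informativeness only
print(select(pool, budget=1, weight=0.5))  # balanced
```

With `weight=0.0` the most uncertain example wins even though it is probably noise, which is the dilemma in miniature; raising the weight shifts the choice toward an example that is both uncertain and likely clean.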

Keywords

  • Artificial intelligence
  • Data labeling