Summary of Contributing Dimension Structure of Deep Feature for Coreset Selection, by Zhijing Wan et al.


Contributing Dimension Structure of Deep Feature for Coreset Selection

by Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin’ichi Satoh

First submitted to arXiv on: 29 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Databases (cs.DB)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
In this paper, researchers explore a crucial aspect of deep learning: coreset selection. They investigate how to efficiently choose a subset of training samples that accurately represents the original dataset while preventing overfitting. The team focuses on two key properties of a good coreset: representation and diversity. Existing methods measure these properties with similarity metrics such as the L2 norm, but such metrics fall short of capturing true diversity. To address this limitation, the researchers propose a feature-based diversity constraint built on a novel Contributing Dimension Structure (CDS) metric, which records which feature dimensions contribute significantly to a sample rather than collapsing all dimensions into a single distance, thereby reducing redundancy while exposing meaningful differences between samples. They show that existing methods tend to favor samples with similar CDS, which reduces the variety of the selected subset and hinders model performance. By integrating the proposed constraint into five classical selection methods, they demonstrate improved results on three datasets.
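
To make the idea concrete, here is a minimal sketch in Python of how a CDS-style diversity constraint might wrap the score produced by a classical selection method. The thresholding rule, the greedy loop, and all function and variable names below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def contributing_dimension_structure(features, threshold=0.5):
    """Binary mask of the dimensions that contribute significantly to
    each feature vector. The magnitude rule and the threshold value are
    illustrative assumptions, not the paper's exact definition."""
    mags = np.abs(features)
    # Flag dimensions whose magnitude exceeds a fraction of the
    # vector's mean magnitude.
    cutoff = threshold * mags.mean(axis=1, keepdims=True)
    return (mags > cutoff).astype(np.int8)  # shape: (n_samples, n_dims)

def select_coreset(features, base_scores, budget, threshold=0.5):
    """Greedy selection that augments per-sample scores from a
    classical method with a CDS-diversity preference."""
    cds = contributing_dimension_structure(features, threshold)
    selected, seen_structures = [], set()
    # Visit samples from highest to lowest base score, keeping a sample
    # only if its contribution structure has not been seen yet.
    for idx in np.argsort(-base_scores):
        structure = tuple(cds[idx])
        if structure not in seen_structures:
            selected.append(idx)
            seen_structures.add(structure)
        if len(selected) == budget:
            return np.array(selected)
    # If distinct structures run out, fall back to top-scoring samples.
    for idx in np.argsort(-base_scores):
        if idx not in selected:
            selected.append(idx)
        if len(selected) == budget:
            break
    return np.array(selected)

# Purely illustrative usage with random data.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 32))   # deep features from some backbone
scores = rng.random(1000)             # scores from a classical method
coreset = select_coreset(feats, scores, budget=100)
print(coreset.shape)  # (100,)
```

The design choice mirrors the paper's intuition: two samples count as different not by their overall feature distance but by which dimensions contribute to them, so the selector keeps the set of contribution structures varied.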
Low Difficulty Summary (written by GrooveSquid.com, original content)
In this paper, scientists work on making deep learning more efficient. They want to find the most important training samples for a computer to learn from. To do this, they look at two things: how well a sample represents the whole dataset and how different it is from the other samples. Right now, people use math formulas that compare samples by overall distance, but these formulas are not very good at finding truly diverse samples. The scientists' idea is to look instead at the features (or characteristics) of the data that actually matter for each sample. From this, they come up with a new, more accurate way to measure diversity. By combining this new method with some older ones, they show that it can help computers learn better.

Keywords

  • Artificial intelligence
  • Deep learning
  • Overfitting