Summary of Exploring the Impact of Dataset Bias on Dataset Distillation, by Yao Lu et al.
Exploring the Impact of Dataset Bias on Dataset Distillation
by Yao Lu, Jianyang Gu, Xuguang Chen, Saeed Vahidian, Qi Xuan
First submitted to arXiv on: 24 Mar 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a new direction in dataset distillation (DD), which synthesizes smaller datasets that preserve the essential information of the original large-scale ones. The authors investigate how biases in the original datasets affect DD, since current methods assume unbiased data. They construct two biased datasets and apply existing DD methods to generate synthetic datasets from them. The results show that bias significantly degrades the performance of the synthetic datasets, highlighting the need to identify and mitigate biases during DD. The paper also reformulates DD for the biased-dataset setting. |
Low | GrooveSquid.com (original content) | This research focuses on making large-scale datasets more manageable by creating smaller versions that capture the important information. The scientists studied how problems in the original datasets can affect this process. They created two datasets with intentional biases and used existing methods to build new, smaller datasets from these flawed ones. The results show that issues in the original data greatly reduce the quality of the synthetic datasets, which highlights the importance of identifying and fixing flaws in the original data before creating smaller versions. |
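The core idea of dataset distillation, compressing a large dataset into a few synthetic points that preserve its essential statistics, can be sketched in a few lines. The toy example below is an illustrative assumption, not any method from the paper: it distills each class of a made-up 2-D Gaussian dataset down to three synthetic points by gradient descent on a mean-matching objective. The function name `distill` and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def distill(X, n_syn=3, lr=0.1, steps=200):
    """Toy dataset distillation sketch (hypothetical, not the paper's method):
    optimize n_syn synthetic points so their mean matches the mean of X."""
    target = X.mean(axis=0)
    S = rng.normal(size=(n_syn, X.shape[1]))  # random init of synthetic points
    for _ in range(steps):
        # Gradient of ||mean(S) - target||^2 with respect to each row of S
        grad = 2.0 * (S.mean(axis=0) - target) / n_syn
        S -= lr * grad
    return S

# Two toy "classes"; the second is shifted, loosely mimicking dataset bias:
# any bias in the class statistics is inherited by the distilled points.
X0 = rng.normal(loc=0.0, size=(500, 2))
X1 = rng.normal(loc=3.0, size=(500, 2))

S0, S1 = distill(X0), distill(X1)
```

The point of the sketch is that the synthetic points faithfully reproduce whatever statistics the original data has, biased or not, which is exactly why bias in the source dataset propagates into the distilled one.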
Keywords
* Artificial intelligence
* Distillation