Summary of PUMA: Margin-based Data Pruning, by Javier Maroto and Pascal Frossard
PUMA: margin-based data pruning
by Javier Maroto, Pascal Frossard
First submitted to arXiv on: 10 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The authors investigate ways to reduce the amount of training data required for deep learning models while maintaining their ability to withstand adversarial perturbations. Specifically, they focus on data pruning techniques that eliminate training samples based on their distance to the classification boundary (margin). The authors find that existing approaches can actually decrease robustness when combined with synthetic data, and propose a new strategy called PUMA that computes margins using DeepFool, prunes the highest-margin samples, and adjusts the attack norm for the remaining low-margin samples. This approach achieves robustness similar to state-of-the-art methods but with improved accuracy, enhancing the performance trade-off. |
| Low | GrooveSquid.com (original content) | Deep learning models can do many things better than humans, like classifying images or recognizing speech. But they’re not perfect: they can be tricked by special kinds of fake data called adversarial perturbations. Making models robust to these attacks usually requires training on large amounts of artificial data generated with techniques like diffusion models. The researchers in this paper want to reduce the amount of data needed for this training while keeping the model as accurate and robust as possible. They do this by removing some training samples based on how close they are to the model’s decision boundary, a technique called data pruning. The team finds that existing methods can actually make things worse when combined with artificial data, so they propose a new approach called PUMA that adjusts the way it removes data, making the model more accurate while keeping it robust. |
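The medium-difficulty summary describes the core of margin-based pruning: score each training sample by its distance to the decision boundary and drop the highest-margin (easiest) samples. The sketch below illustrates that idea only in broad strokes, not the paper's actual method: it replaces the DeepFool margin with a simple logit-gap proxy, omits the attack-norm adjustment entirely, and all names (`prune_high_margin`, `prune_frac`) are illustrative, not from the paper.

```python
import numpy as np

def prune_high_margin(logits, labels, prune_frac=0.2):
    """Margin-based pruning sketch: keep the samples closest to the
    decision boundary, drop the prune_frac with the largest margins.

    Note: this uses a logit-gap proxy for the margin; the paper
    computes margins with DeepFool instead.
    """
    n = len(labels)
    # Margin proxy: true-class logit minus the best competing logit.
    true_logit = logits[np.arange(n), labels]
    others = logits.copy()
    others[np.arange(n), labels] = -np.inf
    margins = true_logit - others.max(axis=1)
    # Keep the (1 - prune_frac) fraction with the smallest margins.
    k = int(n * (1 - prune_frac))
    keep = np.argsort(margins)[:k]
    return keep

# Toy usage on random logits (10 samples, 3 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))
labels = rng.integers(0, 3, size=10)
keep = prune_high_margin(logits, labels, prune_frac=0.3)  # 7 indices kept
```

In a real pipeline the kept indices would then select the subset of the (partly synthetic) training set used for adversarial training.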
Keywords
» Artificial intelligence » Classification » Deep learning » Pruning » Synthetic data