Summary of PUMA: Margin-based Data Pruning, by Javier Maroto and Pascal Frossard
PUMA: margin-based data pruning
by Javier Maroto, Pascal Frossard
First submitted to arXiv on: 10 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The authors investigate ways to reduce the amount of training data required for deep learning models while maintaining their ability to withstand adversarial perturbations. Specifically, they focus on data pruning techniques that eliminate training samples based on their distance to the classification boundary (margin). The authors find that existing approaches can actually decrease robustness when combined with synthetic data, and propose a new strategy called PUMA that computes margins using DeepFool, prunes the highest-margin samples, and adjusts the attack norm for the remaining low-margin samples. This approach achieves robustness similar to state-of-the-art methods but with improved accuracy, enhancing the performance trade-off. |
| Low | GrooveSquid.com (original content) | Deep learning models can do many things better than humans, like classifying images or recognizing speech. But they’re not perfect: they can be tricked by special kinds of fake data called adversarial perturbations. Making models robust to these attacks usually requires training on large amounts of artificial data generated with techniques like diffusion models. The researchers in this paper want to reduce the amount of data needed for this training while keeping the model as accurate and robust as possible. They do this by removing some training samples based on how close they are to the model’s decision boundary, a technique called data pruning. The team finds that existing methods can actually make things worse when combined with artificial data, so they propose a new approach called PUMA that adjusts the way it removes data, making the model more accurate while keeping it robust. |
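The medium-difficulty summary describes the core of margin-based pruning: score each training sample by its distance to the decision boundary and drop the highest-margin (easiest) samples. The sketch below illustrates that idea only in broad strokes, not the paper's actual method: it replaces the DeepFool margin with a simple logit-gap proxy, omits the attack-norm adjustment entirely, and all names (`prune_high_margin`, `prune_frac`) are illustrative, not from the paper.

```python
import numpy as np

def prune_high_margin(logits, labels, prune_frac=0.2):
    """Margin-based pruning sketch: keep the samples closest to the
    decision boundary, drop the prune_frac with the largest margins.

    Note: this uses a logit-gap proxy for the margin; the paper
    computes margins with DeepFool instead.
    """
    n = len(labels)
    # Margin proxy: true-class logit minus the best competing logit.
    true_logit = logits[np.arange(n), labels]
    others = logits.copy()
    others[np.arange(n), labels] = -np.inf
    margins = true_logit - others.max(axis=1)
    # Keep the (1 - prune_frac) fraction with the smallest margins.
    k = int(n * (1 - prune_frac))
    keep = np.argsort(margins)[:k]
    return keep

# Toy usage on random logits (10 samples, 3 classes).
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))
labels = rng.integers(0, 3, size=10)
keep = prune_high_margin(logits, labels, prune_frac=0.3)  # 7 indices kept
```

In a real pipeline the kept indices would then select the subset of the (partly synthetic) training set used for adversarial training.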
Keywords
» Artificial intelligence » Classification » Deep learning » Pruning » Synthetic data