Summary of Understanding the Effectiveness Of Lossy Compression in Machine Learning Training Sets, by Robert Underwood et al.
Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets
by Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello
First submitted to arXiv on: 23 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (see the arXiv page) |
| Medium | GrooveSquid.com (original content) | Machine Learning and Artificial Intelligence (ML/AI) techniques are increasingly used in high-performance computing (HPC), but they require vast amounts of floating-point data for training and validation. This data must be shared over a wide-area network (WAN) or transferred from edge devices to data centers, which poses significant challenges. One potential solution is data compression, which reduces the volume of data needed to train ML/AI models. However, it is essential to understand how lossy compression affects model quality. Previous studies have typically focused on a single application or compression method. In this paper, the authors develop a systematic methodology for evaluating data reduction techniques for ML/AI and apply it to 17 methods across 7 applications. The results show that modern lossy compression methods can achieve a 50-100x improvement in compression ratio with a 1% or less loss in quality. The authors also identify critical insights that guide the future use and design of lossy compressors for ML/AI. |
| Low | GrooveSquid.com (original content) | Artificial Intelligence (AI) and machine learning (ML) are becoming more important in supercomputers, but training AI models requires huge amounts of data. This data needs to be shared or transferred quickly, which can be a problem. One way to solve this is to compress the data, but we need to know how compression affects the quality of the AI model. Previous studies looked at one type of compression method for one specific use. In this study, the authors developed a new way to test different compression methods and applied it to many different types of AI models. They found that modern compression methods can shrink the data by 50-100 times with only a small loss in quality. This helps us understand how to build better compression methods for ML/AI. |
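To make the idea of lossy floating-point compression concrete, the sketch below truncates low-order mantissa bits of float32 values (a simple error-bounded lossy transform) and then applies a standard lossless coder. This is only an illustration of the general principle, not the methodology or the dedicated scientific compressors evaluated in the paper; the function names and parameters here are invented for the example.

```python
import zlib
import numpy as np

def truncate_mantissa(data: np.ndarray, keep_bits: int) -> np.ndarray:
    """Lossy step: zero out the low-order (23 - keep_bits) mantissa bits
    of each float32, bounding the relative error by roughly 2**-keep_bits."""
    shift = np.uint32(23 - keep_bits)
    mask = np.uint32(0xFFFFFFFF) << shift          # keep sign, exponent, top mantissa bits
    bits = data.astype(np.float32).view(np.uint32) & mask
    return bits.view(np.float32)

# Synthetic stand-in for a floating-point training set.
rng = np.random.default_rng(0)
original = rng.standard_normal(100_000).astype(np.float32)

# Keep 8 of 23 mantissa bits -> relative error below ~2**-8 (about 0.4%).
lossy = truncate_mantissa(original, keep_bits=8)

# The zeroed bits make the byte stream far more compressible.
ratio = len(original.tobytes()) / len(zlib.compress(lossy.tobytes(), 9))
max_rel_err = float(np.max(np.abs(original - lossy) / np.maximum(np.abs(original), 1e-8)))
```

Here `ratio` exceeds what lossless compression alone achieves on random floats, while `max_rel_err` stays well under 1%; purpose-built error-bounded compressors reach far higher ratios than this toy pipeline.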
Keywords
* Artificial intelligence
* Machine learning




