Summary of Understanding the Effectiveness Of Lossy Compression in Machine Learning Training Sets, by Robert Underwood et al.
Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets
by Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello
First submitted to arXiv on: 23 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (see the arXiv page) |
| Medium | GrooveSquid.com (original content) | Machine Learning and Artificial Intelligence (ML/AI) techniques are increasingly used in high-performance computing (HPC), but they require vast amounts of floating-point data for training and validation. This data must be shared over a wide-area network (WAN) or transferred from edge devices to data centers, which poses significant challenges. One potential solution is data compression, which reduces the volume of data needed to train ML/AI models. However, it is essential to understand how lossy compression affects model quality. Previous studies have typically focused on a single application or compression method. In this paper, the authors develop a systematic methodology for evaluating data reduction techniques for ML/AI and apply it to 17 methods across 7 applications. The results show that modern lossy compression methods can achieve a 50-100x improvement in compression ratio with a 1% or less loss in quality. The authors also identify critical insights that guide the future use and design of lossy compressors for ML/AI. |
| Low | GrooveSquid.com (original content) | Artificial Intelligence (AI) and machine learning (ML) are becoming more important in supercomputers, but training AI models requires huge amounts of data. This data needs to be shared or transferred quickly, which can be a problem. One way to solve this is to compress the data, but we need to know how compression affects the quality of the AI model. Previous studies looked at one type of compression method for one specific use. In this study, the authors developed a new way to test different compression methods and applied it to many different types of AI models. They found that modern compression methods can shrink the data by 50-100 times with only a small loss in quality. This helps us understand how to build better compression methods for ML/AI. |
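To make the idea of lossy floating-point compression concrete, the sketch below truncates low-order mantissa bits of float32 values (a simple error-bounded lossy transform) and then applies a standard lossless coder. This is only an illustration of the general principle, not the methodology or the dedicated scientific compressors evaluated in the paper; the function names and parameters here are invented for the example.

```python
import zlib
import numpy as np

def truncate_mantissa(data: np.ndarray, keep_bits: int) -> np.ndarray:
    """Lossy step: zero out the low-order (23 - keep_bits) mantissa bits
    of each float32, bounding the relative error by roughly 2**-keep_bits."""
    shift = np.uint32(23 - keep_bits)
    mask = np.uint32(0xFFFFFFFF) << shift          # keep sign, exponent, top mantissa bits
    bits = data.astype(np.float32).view(np.uint32) & mask
    return bits.view(np.float32)

# Synthetic stand-in for a floating-point training set.
rng = np.random.default_rng(0)
original = rng.standard_normal(100_000).astype(np.float32)

# Keep 8 of 23 mantissa bits -> relative error below ~2**-8 (about 0.4%).
lossy = truncate_mantissa(original, keep_bits=8)

# The zeroed bits make the byte stream far more compressible.
ratio = len(original.tobytes()) / len(zlib.compress(lossy.tobytes(), 9))
max_rel_err = float(np.max(np.abs(original - lossy) / np.maximum(np.abs(original), 1e-8)))
```

Here `ratio` exceeds what lossless compression alone achieves on random floats, while `max_rel_err` stays well under 1%; purpose-built error-bounded compressors reach far higher ratios than this toy pipeline.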
Keywords
* Artificial intelligence
* Machine learning




