Summary of Forecasting GPU Performance for Deep Learning Training and Inference, by Seonho Lee et al.
Forecasting GPU Performance for Deep Learning Training and Inference
by Seonho Lee, Amar Phanishayee, Divya Mahajan
First submitted to arXiv on: 18 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Performance (cs.PF)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Deep learning kernels are designed to exploit the parallel architecture of graphics processing units (GPUs). As deep learning models and GPUs evolve, it is crucial to predict how new model architectures will perform on existing or new GPUs without requiring actual execution. To address this challenge, the authors introduce NeuSight, a framework that leverages GPU hardware behavior and software library optimizations to estimate end-to-end performance for various deep learning models during both training and inference. Unlike previous approaches that use regression models or multilayer perceptrons, NeuSight decomposes the prediction problem into smaller working sets called tiles, which are executed independently on the GPU (see the sketch after this table). This approach reduces the prediction error from 121.4% to 2.3% when forecasting the latency of GPT-3 training and inference on H100 GPUs. |
| Low | GrooveSquid.com (original content) | NeuSight is a new framework that predicts how deep learning models will perform on different graphics processing units (GPUs) without actually running them. This matters because new GPUs are released all the time, and it is not always possible to test every model on every GPU. The framework uses information about how GPUs work and how software is optimized for them to make its predictions. It breaks the prediction problem down into smaller parts that can be solved more easily. NeuSight predicts how long deep learning models will take to run on new GPUs more accurately than previous approaches. |
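
To make the tile-wise idea concrete, below is a minimal Python sketch of how per-tile latency estimates might be aggregated into a kernel-level prediction. This is an illustration only, not NeuSight's implementation: the simple roofline-style per-tile bound stands in for whatever per-tile predictor the paper actually trains, and the H100-like hardware numbers (132 SMs, ~989 TFLOP/s FP16 tensor throughput, ~3.35 TB/s memory bandwidth) are illustrative assumptions.

```python
import math

def predict_kernel_latency(
    flops_per_tile: float,  # compute work in one tile (FLOPs)
    bytes_per_tile: float,  # memory traffic of one tile (bytes)
    num_tiles: int,         # how many tiles the kernel is split into
    num_sms: int,           # streaming multiprocessors on the target GPU
    peak_flops: float,      # GPU peak compute throughput (FLOP/s)
    peak_bw: float,         # GPU peak memory bandwidth (B/s)
) -> float:
    """Roofline-style per-tile latency, aggregated over waves of tiles.

    Illustrative stand-in for a learned per-tile predictor: each tile's
    latency is lower-bounded by both the compute and memory roofs of the
    target GPU, and tiles run concurrently across SMs in waves.
    """
    t_compute = flops_per_tile / peak_flops
    t_memory = bytes_per_tile / peak_bw
    tile_latency = max(t_compute, t_memory)

    # A kernel with more tiles than SMs takes ceil(tiles / SMs) waves.
    num_waves = math.ceil(num_tiles / num_sms)
    return num_waves * tile_latency

# Hypothetical example: a 4096x4096x4096 GEMM split into 128x128 output
# tiles on an H100-like GPU (specs below are illustrative assumptions).
if __name__ == "__main__":
    m = n = k = 4096
    tile = 128
    num_tiles = (m // tile) * (n // tile)
    flops_per_tile = 2 * tile * tile * k  # multiply-accumulates per tile
    bytes_per_tile = 2 * (tile * k + k * tile + tile * tile)  # fp16 operands + output
    latency = predict_kernel_latency(
        flops_per_tile, bytes_per_tile, num_tiles,
        num_sms=132, peak_flops=989e12, peak_bw=3.35e12,
    )
    print(f"predicted GEMM latency: {latency * 1e6:.1f} us")
```

The appeal of this decomposition, as the summary describes it, is that per-tile estimates are tied to hardware limits that carry over to GPUs the model has never executed on, rather than fitting end-to-end latency directly; an end-to-end prediction would then sum such kernel-level estimates across the model's operators.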
Keywords
» Artificial intelligence » Deep learning » Inference » Regression