Summary of Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision, by Dinithi Jayasuriya et al.
Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision
by Dinithi Jayasuriya, Nastaran Darabi, Maeesha Binte Hashem, Amit Ranjan Trivedi
First submitted to arXiv on: 6 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | We present a novel approach to deep neural network (DNN) inference that optimizes precision levels for improved performance on edge devices. The proposed scheme, called Neural Precision Polarization (NPP), separates weights and activations into a low-precision majority path and high-precision, targeted error-compensation paths. This allows each precision level to be optimized separately, reducing memory and computation demands without compromising model accuracy. NPP enables training a floating-point model in the cloud and then downloading it to an edge device, where weights and activations are quantized to the desired precision level (e.g., NF4 or INT8). To mitigate the resulting accuracy loss, surrogate paths based on low-rank approximations are introduced on a layer-by-layer basis (see the illustrative sketch after this table). These paths are trained with a sensitivity-based metric on minimal training data to recover accuracy lost to quantization and process variability. Our results show that NPP achieves approximately 464 TOPS per Watt MAC efficiency and reliability by pairing rank-8 error recovery paths with highly efficient, though potentially unreliable, bit plane-wise compute-in-memory processing. |
Low | GrooveSquid.com (original content) | We’ve developed a new way to make deep learning models work better on devices like smartphones or smart home gadgets. The approach is called Neural Precision Polarization (NPP). It’s like using different gears in a car: most parts of the model get by with simple, low-power calculations, while a few parts use more precise, higher-powered calculations. By doing this, we can make the model run faster and more efficiently on edge devices without sacrificing its accuracy. We’ve also found a way to fix the errors that appear when we simplify the model’s calculations. Our results show that NPP can perform roughly 464 trillion operations per second for every watt of power while still being reliable. |
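To make the dual-path idea concrete, here is a minimal PyTorch sketch of a single linear layer whose weights are quantized for the cheap main path and paired with a rank-8 low-rank correction path. The names (`DualPrecisionLinear`, `quantize_int8`), the symmetric INT8 scheme, and the SVD-based initialization are illustrative assumptions made for this summary, not the authors’ implementation; the paper additionally considers formats such as NF4, sensitivity-based training of the surrogate paths, and bit plane-wise compute-in-memory execution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_int8(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor INT8 quantization, returned in de-quantized form
    so the quantization error can be simulated in floating point."""
    scale = w.abs().max() / 127.0
    return torch.clamp((w / scale).round(), -127, 127) * scale


class DualPrecisionLinear(nn.Module):
    """A linear layer with a frozen low-precision main path plus a small
    high-precision rank-r surrogate path that compensates quantization error
    (a sketch of the NPP idea, not the authors' exact method)."""

    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        w_q = quantize_int8(linear.weight.data)
        self.register_buffer("w_q", w_q)           # low-precision majority path
        self.bias = linear.bias
        # Initialize the surrogate path from a truncated SVD of the
        # quantization residual; in the paper such paths would then be
        # fine-tuned on minimal calibration data with a sensitivity metric.
        residual = linear.weight.data - w_q
        U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
        self.A = nn.Parameter(U[:, :rank] * S[:rank])   # (out_features, rank)
        self.B = nn.Parameter(Vh[:rank, :])             # (rank, in_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        main = F.linear(x, self.w_q, self.bias)     # cheap, quantized compute
        correction = (x @ self.B.T) @ self.A.T      # rank-8 error recovery path
        return main + correction


# Usage: wrap a pretrained layer and run inference.
layer = nn.Linear(256, 128)
npp_layer = DualPrecisionLinear(layer, rank=8)
out = npp_layer(torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 128])
```

In a full NPP deployment, the quantized main path would run on the highly efficient (and potentially unreliable) compute-in-memory hardware, while the tiny rank-8 correction path runs at higher precision to recover accuracy.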
Keywords
» Artificial intelligence » Deep learning » Inference » Neural network » Optimization » Precision » Quantization