
Summary of FlatENN: Train Flat for Enhanced Fault Tolerance of Quantized Deep Neural Networks, by Akul Malhotra and Sumeet Kumar Gupta


FlatENN: Train Flat for Enhanced Fault Tolerance of Quantized Deep Neural Networks

by Akul Malhotra, Sumeet Kumar Gupta

First submitted to arXiv on: 29 Dec 2022

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Hardware Architecture (cs.AR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
Model compression via quantization and sparsity enhancement has gained significant interest to enable deployment of deep neural networks (DNNs) in resource-constrained edge environments. Although these techniques have shown promising results in reducing energy, latency, and memory requirements, their performance in non-ideal real-world settings (e.g., hardware faults) is yet to be fully understood. This paper investigates the impact of bit-flip and stuck-at faults on activation-sparse quantized DNNs (QDNNs). Results show that high levels of activation sparsity come at the cost of larger vulnerability to faults, with accuracy drops up to 17.32%. The study also identifies sharper minima in loss landscapes for activation-sparse QDNNs as a major cause of degraded accuracy due to fault perturbations. To mitigate this impact, the paper proposes a sharpness-aware quantization (SAQ) training scheme. Activation-sparse and standard QDNNs trained with SAQ show up to 36.71% and 24.76% higher inference accuracy, respectively, compared to conventionally trained equivalents. Moreover, SAQ-trained activation-sparse QDNNs exhibit better accuracy in faulty settings than standard QDNNs. The proposed technique can achieve sparsity-related energy/latency benefits without compromising on fault tolerance.
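
To make the fault model concrete, here is a minimal sketch of the kind of bit-flip fault injection described above, applied to the stored 8-bit weights of a quantized layer. It is an illustration rather than the authors' evaluation code: the helper name inject_bit_flips, the 8-bit signed weight format, and the flip_rate parameter are assumptions made for this example.

```python
import numpy as np

def inject_bit_flips(weights_int8: np.ndarray, flip_rate: float, seed: int = 0) -> np.ndarray:
    """Flip one random bit in a fraction `flip_rate` of the 8-bit weights,
    emulating random bit-flip faults in the on-chip weight memory.
    (Hypothetical helper; not taken from the paper's code.)"""
    rng = np.random.default_rng(seed)
    faulty = weights_int8.copy().view(np.uint8).reshape(-1)   # work on the raw bit patterns
    n_faults = int(flip_rate * faulty.size)
    positions = rng.choice(faulty.size, size=n_faults, replace=False)
    bits = rng.integers(0, 8, size=n_faults).astype(np.uint8)
    faulty[positions] ^= np.uint8(1) << bits                  # XOR flips the chosen bit
    return faulty.view(np.int8).reshape(weights_int8.shape)
```

Running a QDNN with such perturbed weights and comparing its test accuracy against the fault-free baseline yields the kind of accuracy-drop measurement quoted above (up to 17.32% for activation-sparse QDNNs); stuck-at faults can be emulated analogously by forcing selected bits to 0 or 1 instead of flipping them.
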
Low Difficulty Summary (original content by GrooveSquid.com)
Deep neural networks (DNNs) are used for many important tasks like image recognition and speech understanding. To make them run faster and use less power, researchers have been trying to compress these DNNs. One way to do this is by making the network's activations sparse, meaning many of the intermediate values become zero and the corresponding computations can be skipped. Another technique is quantization, which represents the network's numbers with fewer bits so that each calculation is cheaper. Used together, these techniques can greatly reduce the energy and latency required to run DNNs. However, when there are faults in the hardware (like bit flips in faulty memory), the compressed networks might not work as well as expected. In this paper, the researchers studied how these compressed networks behave when the hardware is faulty. They found that making the activations sparser actually makes the network more vulnerable to faults and reduces its accuracy. To fix this problem, they propose a new training method called sharpness-aware quantization (SAQ), which steers training toward flatter regions of the loss landscape. This method helps maintain high accuracy even when the hardware is faulty.
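
Because SAQ builds on the idea of steering training toward flatter minima, a SAM-style (sharpness-aware minimization) two-step update applied to a quantization-aware-trained model conveys the flavor of the method. The sketch below is a generic illustration under that assumption, not the authors' exact SAQ procedure; the function name sharpness_aware_step and the neighborhood size rho are hypothetical choices.

```python
import torch

def sharpness_aware_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One SAM-style update: ascend to a nearby worst-case point in weight
    space, then descend using the gradient measured there, which biases
    training toward flat minima. For a QDNN, `model` is assumed to apply
    (fake) quantization in its forward pass."""
    # First forward/backward pass at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Compute the ascent direction e(w) = rho * grad / ||grad|| and climb.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)
    optimizer.zero_grad()

    # Second pass: gradient of the loss at the perturbed weights w + e(w).
    loss_fn(model(x), y).backward()

    # Restore the original weights, then update with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a training loop, this call replaces the usual single backward/step pair; training both standard and activation-sparse QDNNs this way corresponds to the SAQ-trained variants whose fault tolerance is compared above.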

Keywords

  • Artificial intelligence
  • Inference
  • Model compression
  • Quantization