Summary of Concurrent Training and Layer Pruning of Deep Neural Networks, by Valentin Frank Ingmar Guenter and Athanasios Sideris
Concurrent Training and Layer Pruning of Deep Neural Networks
by Valentin Frank Ingmar Guenter, Athanasios Sideris
First submitted to arXiv on: 6 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract |
Medium | GrooveSquid.com (original content) | The proposed algorithm reduces the computational complexity of neural networks by identifying and eliminating irrelevant layers during the early stages of training. Unlike traditional weight- or filter-level pruning, it prunes entire layers, which cuts sequential computation and allows more efficient parallelization. Residual connections around nonlinear network sections keep information flowing after a layer is pruned. Built on variational inference with Gaussian scale mixture priors on the network weights, the approach learns the variational posterior distribution of scalar Bernoulli random variables that multiply the layer weight matrices of the nonlinear sections, similar to adaptive layer-wise dropout, and yields significant cost savings during both training and inference. To address premature pruning and a lack of robustness, a “flattening” hyper-prior is placed on the prior parameters. The resulting optimization problem is solved with projected SGD and proven to converge to deterministic networks whose posterior parameters reach 0 or 1, and practical pruning conditions are derived from these theoretical results. Evaluated on the MNIST, CIFAR-10, and ImageNet datasets with LeNet, VGG16, and ResNet architectures, the method achieves state-of-the-art layer-pruning performance at reduced computational cost. (A minimal illustrative sketch of the layer-gating idea appears after this table.) |
Low | GrooveSquid.com (original content) | The researchers developed a new way to make neural networks more efficient by removing unnecessary layers during training. They focused on whole layers rather than individual weights or filters, which speeds up the sequential computations that are hardest to parallelize. To achieve this, they used special skip connections and mathematical techniques inspired by variational inference. The approach not only saves time but also performs as well as or better than existing methods. |
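To make the medium-difficulty summary more concrete, here is a minimal, hypothetical PyTorch-style sketch of the layer-gating idea it describes: each nonlinear section sits inside a residual connection and is multiplied by a Bernoulli-like gate whose probability theta is trained and then projected back onto [0, 1] after every SGD step. The names (GatedResidualSection, projected_sgd_step), the straight-through gradient estimator, and the 0.5 / 1e-3 thresholds are illustrative assumptions, not the authors' implementation; the Gaussian scale mixture priors and flattening hyper-prior from the paper are omitted here.

```python
# Hypothetical sketch (not the authors' code): a residual "nonlinear section"
# whose output is scaled by a Bernoulli-like gate xi ~ Bern(theta). The gate
# probability theta is a trainable parameter kept in [0, 1] by a projection
# step after each SGD update. If theta collapses to 0, the section can be
# pruned and only the skip connection remains; theta = 1 recovers an
# ordinary residual block.
import torch
import torch.nn as nn


class GatedResidualSection(nn.Module):
    """Residual block y = x + xi * f(x), with xi sampled from Bern(theta)."""

    def __init__(self, dim: int, theta_init: float = 0.5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.theta = nn.Parameter(torch.tensor(theta_init))  # variational gate parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Straight-through Bernoulli sample: the forward pass uses a hard
            # 0/1 draw, the backward pass sends gradients to theta (a common
            # relaxation; the paper's exact estimator may differ).
            xi_hard = torch.bernoulli(self.theta.detach())
            xi = xi_hard + self.theta - self.theta.detach()
        else:
            xi = (self.theta > 0.5).float()  # deterministic network at test time
        return x + xi * self.body(x)

    def prunable(self, tol: float = 1e-3) -> bool:
        # Placeholder pruning condition: drop the section once theta has
        # effectively collapsed to 0.
        return self.theta.item() < tol


def projected_sgd_step(model: nn.Module, optimizer: torch.optim.Optimizer) -> None:
    """One projected-SGD step: ordinary step, then project each theta onto [0, 1]."""
    optimizer.step()
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, GatedResidualSection):
                module.theta.clamp_(0.0, 1.0)


if __name__ == "__main__":
    torch.manual_seed(0)
    block = GatedResidualSection(dim=16)
    opt = torch.optim.SGD(block.parameters(), lr=0.1)
    x = torch.randn(8, 16)
    loss = block(x).pow(2).mean()  # stand-in for the task loss plus regularization
    opt.zero_grad()
    loss.backward()
    projected_sgd_step(block, opt)
    print("theta after one step:", block.theta.item(), "prunable:", block.prunable())
```

In this sketch, a section whose gate parameter reaches 0 contributes nothing beyond the skip connection and can be removed from the network, mirroring the summary's point that the posterior parameters converge to 0 or 1 and yield deterministic, smaller networks.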
Keywords
* Artificial intelligence * Dropout * Inference * Neural network * Optimization * Pruning * ResNet