Summary of Investigating Sparsity in Recurrent Neural Networks, by Harshil Darji
First submitted to arXiv on: 30 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Recent advances in neural networks have led to more complex architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). While CNNs excel at tasks where sequence order is not important, RNNs are better suited to tasks that rely on order, such as machine translation. Adding layers improves performance but also increases complexity, making training more time-consuming. To mitigate this, researchers have introduced sparsity into neural network architectures. Pruning is one way to achieve this: weights whose magnitude falls below a chosen threshold are clipped to zero while performance is maintained. Another approach generates arbitrary structures from random graphs and embeds them between the input and output layers (illustrative sketches of both techniques appear after this table). While pruning is well studied for CNNs, there is a lack of comparable research on RNNs. This thesis investigates the effects of pruning and of creating sparse architectures in RNNs. It describes pruning's impact on RNN performance and the number of training epochs required to regain accuracy, and it explores the creation and training of Sparse Recurrent Neural Networks, examining the relationship between performance and the properties of the underlying graph. Experiments are conducted on several RNN variants: RNNs with Tanh nonlinearity (RNN-Tanh), RNNs with ReLU nonlinearity (RNN-ReLU), GRUs, and LSTMs. |
| Low | GrooveSquid.com (original content) | The paper explores how to make recurrent neural networks (RNNs) more efficient by introducing sparsity into their architecture. This is done using two techniques: pruning and creating sparse architectures. Pruning removes unimportant weights from the network while maintaining its performance. Creating sparse architectures involves generating random graphs and embedding them between the input and output layers. The study focuses on RNNs, which are important for tasks like machine translation, where the order of the sequence matters. While some research has been done on pruning CNNs, little work covers pruning RNNs or creating sparse RNN architectures. This thesis investigates how these techniques affect the performance of RNNs and explores their relationship with the graph properties of the underlying architecture. |
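The two techniques summarized above, magnitude-based pruning and sparse architectures built from random graphs, can be illustrated with short code sketches. The first is a minimal magnitude-pruning sketch in PyTorch; the threshold value, layer sizes, and the choice of an LSTM are illustrative assumptions, not the thesis's actual settings.

```python
import torch
import torch.nn as nn

def prune_by_threshold(module: nn.Module, threshold: float = 0.05) -> None:
    """Zero out (in place) every weight whose magnitude is below `threshold`."""
    with torch.no_grad():
        for name, param in module.named_parameters():
            if "weight" in name:                 # leave biases untouched
                mask = param.abs() >= threshold  # keep only the larger weights
                param.mul_(mask)                 # small weights become exactly 0

# Example: prune a small LSTM and report the resulting weight sparsity.
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2)
prune_by_threshold(lstm, threshold=0.05)
total = sum(p.numel() for n, p in lstm.named_parameters() if "weight" in n)
zeros = sum(int((p == 0).sum()) for n, p in lstm.named_parameters() if "weight" in n)
print(f"weight sparsity after pruning: {zeros / total:.2%}")
```

The second sketch shows one way a random graph could be embedded between an input and an output layer as a fixed connectivity mask. The Watts-Strogatz graph model, the layer sizes, and the edge-to-connection mapping are assumptions made for illustration; the thesis may use different graph generators and embedding rules.

```python
import networkx as nx
import torch
import torch.nn as nn

class GraphMaskedLinear(nn.Linear):
    """A linear layer whose weights are masked by connectivity from a random graph."""

    def __init__(self, in_features: int, out_features: int, graph: nx.Graph):
        super().__init__(in_features, out_features)
        mask = torch.zeros(out_features, in_features)
        for u, v in graph.edges():
            # Interpret each edge (u, v) as a connection from input unit u to output unit v.
            mask[v % out_features, u % in_features] = 1.0
        self.register_buffer("mask", mask)  # fixed connectivity, not trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

# Example: a small-world graph embedded between a 64-unit input and a 32-unit output.
g = nx.watts_strogatz_graph(n=64, k=4, p=0.3, seed=0)
layer = GraphMaskedLinear(64, 32, g)
print(f"active connections: {int(layer.mask.sum())} of {layer.mask.numel()}")
```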
Keywords
» Artificial intelligence » Embedding » Lstm » Neural network » Pruning » Relu » Rnn » Tanh » Translation