Summary of On the Numerical Reliability of Nonsmooth Autodiff: A MaxPool Case Study, by Ryan Boustany (TSE-R)
On the numerical reliability of nonsmooth autodiff: a MaxPool case study
by Ryan Boustany
First submitted to arXiv on: 5 Jan 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper investigates the reliability of automatic differentiation (AD) for neural networks involving nonsmooth operations like MaxPool. The authors explore how different precision levels and convolutional architectures affect AD's behavior on various datasets. While AD is generally accurate, it can be numerically incorrect on certain subsets of inputs, including a bifurcation zone where AD is wrong over the real numbers and a compensation zone where AD is correct over the reals but wrong over floating-point numbers. The study finds that using a nonsmooth Jacobian for MaxPool with a lower norm helps maintain stable test accuracy, whereas higher-norm Jacobians can lead to instability and decreased performance (a short code sketch after the table illustrates the tie-breaking and precision effects). Batch normalization, Adam-like optimizers, or increased precision can also mitigate the impact of MaxPool's nonsmooth Jacobians on learning. |
Low | GrooveSquid.com (original content) | This paper looks at how well a math technique called automatic differentiation works for computer programs called neural networks. Neural networks are used to recognize things in pictures and understand what people say. Automatic differentiation is like a shortcut that makes it easier to calculate something important, but sometimes it doesn't work perfectly. The researchers wanted to know why this happens and how it affects the way the program learns from data. They found that using certain techniques or increasing the precision of their calculations can make automatic differentiation work better. This could help make neural networks even more powerful. |
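The sketch below is a minimal illustration, not the paper's own code: it assumes PyTorch and shows the two ingredients the medium summary mentions. First, at a tie MaxPool is nondifferentiable, yet autodiff still returns a single answer by routing the gradient to one tied entry. Second, changing floating-point precision can create or destroy such ties, which is the kind of numerical effect behind the paper's bifurcation/compensation analysis.

```python
# Minimal PyTorch sketch (PyTorch assumed; not the paper's experimental code).
import torch
import torch.nn.functional as F

# A 1x1x2x2 input whose four entries are all equal: every entry attains the
# maximum, so the true derivative is set-valued (a whole subdifferential).
x = torch.ones(1, 1, 2, 2, requires_grad=True)
y = F.max_pool2d(x, kernel_size=2)   # a single pooled value
y.sum().backward()

# Autodiff puts a gradient of 1 on exactly one tied entry and 0 elsewhere:
# a backend-dependent selection of one subgradient, not "the" gradient.
print(x.grad.view(-1))               # e.g. tensor([1., 0., 0., 0.])

# Precision changes which inputs tie at all: two values that differ in
# float32 can round to the same float16 number, moving the input into or
# out of MaxPool's nondifferentiable set.
print(torch.tensor(1.0 + 1e-4).item())                       # ~1.0001 in float32
print(torch.tensor(1.0 + 1e-4, dtype=torch.float16).item())  # 1.0 in float16
```

Run as an ordinary Python script; the exact tied entry that receives the gradient may differ across backends, which is precisely the arbitrariness the paper studies.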
Keywords
* Artificial intelligence
* Batch normalization
* Precision