Summary of An Improved Empirical Fisher Approximation for Natural Gradient Descent, by Xiaodong Wu et al.
An Improved Empirical Fisher Approximation for Natural Gradient Descent
by Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland
First submitted to arXiv on: 10 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Approximate Natural Gradient Descent (NGD) methods are important for training deep learning models: they pre-condition gradients with an approximation to the Fisher information matrix. The empirical Fisher (EF) method builds this approximation from the per-example gradients already computed during back-propagation. Despite its ease of implementation, EF has theoretical and practical limitations. This paper investigates the inversely-scaled projection issue of EF, a major cause of its poor approximation quality. An improved empirical Fisher (iEF) method is proposed to address this issue; it is motivated from a loss-reduction perspective while retaining EF’s practical convenience. iEF is evaluated experimentally in deep learning setups and shows strong convergence and generalisation when applied directly as an optimiser. Compared with EF and sampled Fisher methods, iEF demonstrates better approximation quality and greater robustness to the choice of damping. The proposed method both improves existing approximate NGD optimisers and serves as a better approximation to the Fisher information matrix itself. (A standard-notation sketch of these quantities is given below the table.) |
Low | GrooveSquid.com (original content) | This paper is about making a type of computer program (called an “optimiser”) work better for deep learning models. Deep learning models are used in things like image recognition and language processing. The optimiser uses approximate information to help it make good decisions during training. There’s a problem with this method, which makes it not very good at its job. The researchers in this paper found the cause of the problem and created a new way to do the same thing, but better. They tested their new method and it worked well. This means that people can use deep learning models more effectively now. |
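For readers who want the underlying quantities in standard notation, here is a generic sketch (textbook definitions, not the paper’s exact formulation): NGD pre-conditions the gradient with a damped Fisher matrix, and the empirical Fisher replaces the model-distribution expectation in the true Fisher with the observed training labels.

```latex
% Generic NGD update with a damped Fisher approximation \hat{F}
% (\theta: parameters, \eta: step size, \lambda: damping, L: training loss).
\theta_{t+1} = \theta_t - \eta \left( \hat{F} + \lambda I \right)^{-1} \nabla_\theta L(\theta_t)

% True Fisher information matrix: the label y is sampled from the model's
% own predictive distribution p_\theta(y \mid x_i).
F = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{y \sim p_\theta(\cdot \mid x_i)}
    \left[ \nabla_\theta \log p_\theta(y \mid x_i) \, \nabla_\theta \log p_\theta(y \mid x_i)^{\top} \right]

% Empirical Fisher (EF): the expectation is replaced by the observed training
% labels y_i, so the matrix can be formed from the per-example gradients that
% back-propagation already produces.
\hat{F}_{\mathrm{EF}} = \frac{1}{N} \sum_{i=1}^{N}
    \nabla_\theta \log p_\theta(y_i \mid x_i) \, \nabla_\theta \log p_\theta(y_i \mid x_i)^{\top}
```

The iEF method proposed in the paper modifies this EF construction to address the inversely-scaled projection issue; the exact formulation is given in the paper itself.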
Keywords
» Artificial intelligence » Deep learning » Gradient descent