
An Improved Empirical Fisher Approximation for Natural Gradient Descent

by Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland

First submitted to arXiv on: 10 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, which can be read on arXiv.

Medium Difficulty Summary (GrooveSquid.com original content)
Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models; they pre-condition gradients with approximate Fisher information matrices. The empirical Fisher (EF) method approximates the Fisher information matrix using the per-sample gradients already computed during back-propagation, which makes it easy to implement, but it has theoretical and practical limitations. This paper investigates the inversely-scaled projection issue of EF, identified as a major cause of its poor approximation quality. An improved empirical Fisher (iEF) method is proposed to address this issue; it is motivated from a loss reduction perspective while retaining EF’s practical convenience. iEF is evaluated experimentally on deep learning setups and shows strong convergence and generalisation when applied directly as an optimiser. Compared to EF and sampled Fisher methods, iEF demonstrates better approximation quality and greater robustness to the choice of damping. The proposed method can both improve existing approximate NGD optimisers and serve as a better approximation to the exact Fisher information matrix. A short code sketch of Fisher-preconditioned gradient updates is given after the summaries below.

Low Difficulty Summary (GrooveSquid.com original content)
This paper is about making a type of computer program (called an “optimiser”) work better for deep learning models. Deep learning models are used in things like image recognition and language processing. The optimiser uses approximate information about the model to help it make good decisions during training. There is a problem with the usual way of getting this approximate information, which makes the optimiser less effective than it could be. The researchers in this paper found the cause of the problem and created a new way of doing the same thing, but better. They tested their new method and it worked well, which means deep learning models can be trained more effectively.
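
To make the medium difficulty summary more concrete, below is a minimal NumPy sketch of what “pre-conditioning gradients with an approximate Fisher information matrix” looks like: the empirical Fisher is built as the average outer product of per-example loss gradients, and a damped inverse of it rescales the update direction. This is a generic illustration only, not the paper’s iEF method; the function names, toy sizes, and hyper-parameters are illustrative assumptions.

```python
import numpy as np

def empirical_fisher(per_example_grads):
    """Empirical Fisher: average outer product of per-example loss gradients.

    per_example_grads has shape (N, D): one flattened gradient per training example.
    """
    n = per_example_grads.shape[0]
    return per_example_grads.T @ per_example_grads / n  # shape (D, D)

def preconditioned_step(params, mean_grad, fisher, lr=0.1, damping=1e-3):
    """One damped natural-gradient-style update: params - lr * (F + damping*I)^(-1) g."""
    d = fisher.shape[0]
    natural_grad = np.linalg.solve(fisher + damping * np.eye(d), mean_grad)
    return params - lr * natural_grad

# Toy usage with random numbers standing in for real per-example gradients.
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 5))     # 32 examples, 5 parameters
params = rng.normal(size=5)

F_ef = empirical_fisher(grads)
params = preconditioned_step(params, grads.mean(axis=0), F_ef)
```

Real optimisers never form the full D-by-D matrix for large models and instead rely on structured approximations; the sketch is only meant to show where the quality of the Fisher approximation, which iEF aims to improve, enters the update.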

Keywords

» Artificial intelligence  » Deep learning  » Gradient descent