Summary of Measuring Sharpness in Grokking, by Jack Miller et al.


Measuring Sharpness in Grokking

by Jack Miller, Patrick Gleeson, Charles O’Neill, Thang Bui, Noam Levi

First submitted to arXiv on: 14 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This workshop paper introduces a robust technique for measuring neural network grokking, a phenomenon in which networks reach perfect or near-perfect performance on a validation set long after they already perform well on the training set. The authors use this method to investigate transitions in training and validation accuracy in two settings: a theoretical framework developed by Levi et al. (2023) and a two-layer MLP trained to predict the parity of bit strings, with grokking induced by the concealment strategy of Miller et al. (2023). They find that the trends relating the relative grokking gap to sharpness are similar in both settings, whether sharpness is measured in absolute or relative terms. (A rough sketch of how such a gap might be computed appears after the summaries.)

Low Difficulty Summary (original content by GrooveSquid.com)
Neural networks sometimes become good at a task only long after they have already memorized their training examples. This is called "grokking". Researchers want to understand how this happens. The authors came up with a way to measure grokking and used it to study two different kinds of setups: one built from simple mathematical formulas, and one where a small network was trained to tell whether a string of bits contained an even or odd number of ones. Surprisingly, both setups showed similar patterns as the networks "grokked" their tasks. This helps us understand what makes neural networks suddenly get so good at some problems.

Keywords

  • Artificial intelligence
  • Neural network