Summary of Slicing Mutual Information Generalization Bounds For Neural Networks, by Kimia Nadjahi et al.


Slicing Mutual Information Generalization Bounds for Neural Networks

by Kimia Nadjahi, Kristjan Greenewald, Rickard Brüel Gabrielsson, Justin Solomon

First submitted to arXiv on: 6 Jun 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes new information-theoretic generalization bounds tailored to machine learning algorithms that operate by slicing the parameter space, i.e., training on random lower-dimensional subspaces. These bounds are tighter than standard mutual information (MI) bounds and rely on scalable alternative measures of dependence, such as disintegrated mutual information and k-sliced mutual information; the authors show that slicing improves generalization and that these measures offer significant computational and statistical advantages. They then extend the analysis to algorithms whose parameters need not lie exactly on random subspaces, leveraging rate-distortion theory to introduce a distortion term that measures how compressible the model is under slicing. This yields tighter bounds without degrading performance or requiring explicit model compression. Finally, the authors propose a regularization scheme that lets practitioners control generalization through compressibility, and they validate the approach empirically by computing non-vacuous information-theoretic generalization bounds for neural networks. (An illustrative code sketch of the slicing idea follows these summaries.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper shows how machine learning algorithms can be improved by using a new way to measure how well they generalize. Generalization means an algorithm works well on new data it has never seen before. The authors use information theory to derive new, tighter limits on how well an algorithm will perform. They also show that their approach helps even when the parameters do not lie exactly on random subspaces, which means practitioners can control how well an algorithm generalizes by making the model more compressible. The authors tested their ideas and were able to compute non-vacuous (meaningful) information-theoretic generalization bounds for neural networks, which is hard to achieve with standard approaches.

Keywords

» Artificial intelligence  » Generalization  » Machine learning  » Model compression  » Regularization