Summary of Slicing Mutual Information Generalization Bounds for Neural Networks, by Kimia Nadjahi et al.
Slicing Mutual Information Generalization Bounds for Neural Networks
by Kimia Nadjahi, Kristjan Greenewald, Rickard Brüel Gabrielsson, Justin Solomon
First submitted to arXiv on: 6 Jun 2024
Categories
- Main: Machine Learning (stat.ML)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The paper proposes new information-theoretic generalization bounds tailored to machine learning algorithms that operate by slicing the parameter space, i.e., training within random low-dimensional subspaces (a rough sketch of this idea appears below the table). These bounds are tighter than standard mutual information (MI) bounds and rely on scalable alternative measures of dependence, such as disintegrated mutual information and k-sliced mutual information. The authors demonstrate that this approach improves generalization guarantees and offers significant computational and statistical advantages. They also extend the analysis to algorithms whose parameters need not lie exactly on random subspaces, leveraging rate-distortion theory to incorporate a distortion term that measures model compressibility under slicing; this yields tighter bounds without degrading performance or requiring explicit model compression. Finally, they propose a regularization scheme that lets practitioners control generalization through compressibility, and they validate the results empirically, computing non-vacuous information-theoretic generalization bounds for neural networks.
Low | GrooveSquid.com (original content) | The paper shows how machine learning algorithms can be improved by using a new way to measure how well they generalize. Generalization is when an algorithm works well on new data it has never seen before. The authors use ideas from information theory to create new, tighter limits on how well an algorithm will do. They also show that their approach works even when the parameters don't lie exactly on random subspaces, which means practitioners can control how well an algorithm generalizes by making it more compressible. When the authors tested their ideas, they were able to compute non-vacuous information-theoretic generalization bounds for neural networks, something that standard approaches typically fail to achieve.
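
As a rough illustration of the "slicing" idea these bounds revolve around (optimizing a network only within a random low-dimensional subspace of its full parameter space), here is a minimal sketch in PyTorch. It is not the authors' code: the helper names (`make_sliced_params`, `unflatten`), the toy model, the subspace dimension, and the training loop are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): train a network whose effective
# parameters are w0 + A @ theta, i.e. they stay on a random k-dimensional
# slice of the full parameter space. Names and sizes are illustrative.
import math
import torch

def make_sliced_params(full_dim: int, k: int, seed: int = 0):
    """Fixed random projection A (full_dim x k) plus a trainable k-dim theta."""
    g = torch.Generator().manual_seed(seed)
    A = torch.randn(full_dim, k, generator=g) / math.sqrt(k)  # frozen, not trained
    theta = torch.zeros(k, requires_grad=True)                # trained
    return A, theta

def unflatten(flat: torch.Tensor, shapes):
    """Split a flat parameter vector into tensors matching the model's shapes."""
    chunks, i = [], 0
    for s in shapes:
        n = math.prod(s)
        chunks.append(flat[i:i + n].view(s))
        i += n
    return chunks

# Toy model; only theta (k numbers) is optimized, never the full parameter vector.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
names = [n for n, _ in model.named_parameters()]
shapes = [p.shape for p in model.parameters()]
w0 = torch.cat([p.detach().flatten() for p in model.parameters()])  # random init, frozen

A, theta = make_sliced_params(full_dim=w0.numel(), k=64)
opt = torch.optim.SGD([theta], lr=1e-2)
x, y = torch.randn(128, 10), torch.randn(128, 1)  # dummy data

for _ in range(200):
    w = w0 + A @ theta                                   # parameters stay on the slice
    params = dict(zip(names, unflatten(w, shapes)))
    pred = torch.func.functional_call(model, params, (x,))
    loss = torch.nn.functional.mse_loss(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the paper's framing, the bounds then depend on how much information the low-dimensional `theta` carries about the training data (via quantities such as disintegrated or k-sliced mutual information), and the rate-distortion extension relaxes the requirement that the parameters lie exactly on the slice by charging a distortion term for how compressible the model is under slicing.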
Keywords
» Artificial intelligence » Generalization » Machine learning » Model compression » Regularization