Summary of Wasserstein Distances, Neuronal Entanglement, and Sparsity, by Shashata Sawmya et al.


Wasserstein Distances, Neuronal Entanglement, and Sparsity

by Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

First submitted to arXiv on: 24 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper uses the idea of disentanglement to explain how large language models (LLMs) behave under weight sparsity, a post-training optimization technique. It introduces a measure of neuronal entanglement: the Wasserstein distance between a neuron's output distribution and a Gaussian. Using this measure, the authors identify "Wasserstein neurons" in each linear layer of an LLM, neurons with highly non-Gaussian output distributions and a disproportionately large impact on model accuracy when sparsified (a code sketch of this score follows). To disentangle these polysemantic neurons, the paper proposes an experimental framework that separates each layer's inputs to create a mixture of experts (sketched after the low difficulty summary below). The experiments provide strong evidence that this separation disentangles the input-output relationship of individual neurons, particularly the Wasserstein neurons.
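The entanglement measure has a direct empirical form. Below is a minimal sketch, assuming the score is the 1-D Wasserstein distance between a neuron's standardized empirical outputs and a sample from a standard Gaussian; the helper names (wasserstein_score, rank_wasserstein_neurons) and the standardization step are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_score(outputs, n_ref=10_000, seed=0):
    """Non-Gaussianity of one neuron: the 1-D Wasserstein distance between
    the neuron's standardized empirical output distribution and a standard
    normal reference sample. (The normalization here is an assumption; the
    paper's exact preprocessing may differ.)"""
    rng = np.random.default_rng(seed)
    z = (outputs - outputs.mean()) / (outputs.std() + 1e-12)  # moment-match
    return wasserstein_distance(z, rng.standard_normal(n_ref))

def rank_wasserstein_neurons(X, W, b):
    """Score every output neuron of a linear layer y = x @ W.T + b and
    return neuron indices sorted from most to least entangled."""
    Y = X @ W.T + b                                  # (n_samples, d_out)
    scores = [wasserstein_score(Y[:, j]) for j in range(Y.shape[1])]
    return np.argsort(scores)[::-1]
```

As a sanity check, a random linear layer fed Gaussian inputs produces Gaussian outputs, so its scores should sit near zero; neurons with multimodal or heavy-tailed outputs score high and would be flagged as Wasserstein neurons.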
Low Difficulty Summary (original content by GrooveSquid.com)
Large language models are getting better at understanding human language, but they're still not very good at explaining why they make certain predictions. To fix this, scientists have been trying to figure out how to "disentangle" the different types of information these models process. One approach is to look at individual neurons. Neurons are like tiny computers that process information, and when a single neuron mixes together many unrelated signals, it is called entangled. The researchers in this paper propose a new way to measure how entangled each neuron is, and they find that the most entangled neurons matter a lot: the model's accuracy suffers most when their weights are reduced, or "sparsified". They also show that splitting up the inputs these neurons see can untangle what each one is doing.
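As noted in the medium difficulty summary, the paper's framework separates each layer's inputs to create a mixture of experts. The sketch below shows one plausible realization, assuming experts are assigned by k-means clustering of the inputs; the clustering-based routing and the helper name split_layer_inputs are assumptions for illustration, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_layer_inputs(X, n_experts=4, seed=0):
    """Partition a layer's inputs into n_experts groups; each group would be
    handled by its own copy ("expert") of the layer, so every neuron sees a
    narrower input distribution. k-means routing is an illustrative choice."""
    router = KMeans(n_clusters=n_experts, random_state=seed, n_init=10)
    labels = router.fit_predict(X)
    return [X[labels == i] for i in range(n_experts)], router

# Example: inputs drawn from two well-separated modes get routed to
# different experts, giving each expert a simpler input distribution.
X = np.vstack([np.random.randn(500, 64) - 3, np.random.randn(500, 64) + 3])
groups, router = split_layer_inputs(X, n_experts=2)
```

The intuition, per the summaries above, is that once a neuron's inputs are split this way, its input-output relationship becomes easier to disentangle, especially for the highly non-Gaussian Wasserstein neurons.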

Keywords

  • Artificial intelligence
  • Mixture of experts
  • Optimization