Summary of Wasserstein Distances, Neuronal Entanglement, and Sparsity, by Shashata Sawmya et al.


Wasserstein Distances, Neuronal Entanglement, and Sparsity

by Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

First submitted to arXiv on: 24 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper uses the idea of disentanglement to explain how large language models (LLMs) behave under weight sparsity, a post-training optimization technique. It introduces a measure of neuronal entanglement: the Wasserstein distance between a neuron's output distribution and a Gaussian. Using this measure, the authors identify "Wasserstein neurons" in each linear layer of an LLM, neurons with highly non-Gaussian output distributions and a disproportionately large impact on model accuracy when sparsified (a code sketch of this score follows). To disentangle these polysemantic neurons, the paper proposes an experimental framework that separates each layer's inputs to create a mixture of experts (sketched after the low difficulty summary below). The experiments provide strong evidence that this separation disentangles the input-output relationship of individual neurons, particularly the Wasserstein neurons.
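The entanglement measure has a direct empirical form. Below is a minimal sketch, assuming the score is the 1-D Wasserstein distance between a neuron's standardized empirical outputs and a sample from a standard Gaussian; the helper names (wasserstein_score, rank_wasserstein_neurons) and the standardization step are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_score(outputs, n_ref=10_000, seed=0):
    """Non-Gaussianity of one neuron: the 1-D Wasserstein distance between
    the neuron's standardized empirical output distribution and a standard
    normal reference sample. (The normalization here is an assumption; the
    paper's exact preprocessing may differ.)"""
    rng = np.random.default_rng(seed)
    z = (outputs - outputs.mean()) / (outputs.std() + 1e-12)  # moment-match
    return wasserstein_distance(z, rng.standard_normal(n_ref))

def rank_wasserstein_neurons(X, W, b):
    """Score every output neuron of a linear layer y = x @ W.T + b and
    return neuron indices sorted from most to least entangled."""
    Y = X @ W.T + b                                  # (n_samples, d_out)
    scores = [wasserstein_score(Y[:, j]) for j in range(Y.shape[1])]
    return np.argsort(scores)[::-1]
```

As a sanity check, a random linear layer fed Gaussian inputs produces Gaussian outputs, so its scores should sit near zero; neurons with multimodal or heavy-tailed outputs score high and would be flagged as Wasserstein neurons.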
Low Difficulty Summary (original content by GrooveSquid.com)
Large language models are getting better at understanding human language, but they're still not very good at explaining why they make certain predictions. To fix this, scientists have been trying to figure out how to "disentangle" the different types of information these models process. One approach is to look at individual neurons. Neurons are like tiny computers that process information, and when a single neuron mixes together many unrelated signals, it is called entangled. The researchers in this paper propose a new way to measure how entangled each neuron is, and they find that the most entangled neurons matter a lot: the model's accuracy suffers most when their weights are reduced, or "sparsified". They also show that splitting up the inputs these neurons see can untangle what each one is doing.
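As noted in the medium difficulty summary, the paper's framework separates each layer's inputs to create a mixture of experts. The sketch below shows one plausible realization, assuming experts are assigned by k-means clustering of the inputs; the clustering-based routing and the helper name split_layer_inputs are assumptions for illustration, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_layer_inputs(X, n_experts=4, seed=0):
    """Partition a layer's inputs into n_experts groups; each group would be
    handled by its own copy ("expert") of the layer, so every neuron sees a
    narrower input distribution. k-means routing is an illustrative choice."""
    router = KMeans(n_clusters=n_experts, random_state=seed, n_init=10)
    labels = router.fit_predict(X)
    return [X[labels == i] for i in range(n_experts)], router

# Example: inputs drawn from two well-separated modes get routed to
# different experts, giving each expert a simpler input distribution.
X = np.vstack([np.random.randn(500, 64) - 3, np.random.randn(500, 64) + 3])
groups, router = split_layer_inputs(X, n_experts=2)
```

The intuition, per the summaries above, is that once a neuron's inputs are split this way, its input-output relationship becomes easier to disentangle, especially for the highly non-Gaussian Wasserstein neurons.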

Keywords

  • Artificial intelligence
  • Mixture of experts
  • Optimization