
Summary of Representations as Language: An Information-Theoretic Framework for Interpretability, by Henry Conklin et al.


Representations as Language: An Information-Theoretic Framework for Interpretability

by Henry Conklin, Kenny Smith

First submitted to arXiv on: 4 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large-scale neural language models excel at a variety of linguistic tasks, but their internal workings remain largely opaque, making it difficult to understand what they learn and when. To address this, we propose a novel approach to interpretability that treats the mapping between a model's inputs and its representations as a language in its own right. This allows us to introduce information-theoretic measures that quantify the structure of representations with respect to the input. Our measures are fast to compute, grounded in linguistic theory, and can predict which models generalize best based on their representations. We apply these measures to describe two distinct phases of training: an initial phase of in-distribution learning that reduces task loss, followed by a second stage in which representations become robust to noise, leading to improved generalization performance. We also examine how model size affects the structure of representational space, showing that larger models compress their representations more than smaller ones. A rough sketch of how measures like these might be computed appears after the summaries below.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Large language models are great at understanding language, but they’re hard to understand themselves! They take in sentences and turn them into a special kind of code (called a representation), and it’s tough for us to know what that code captures or how well the model will work outside of its original training data. To solve this problem, we came up with a new way to “decode” these representations by treating them like a language in their own right. This lets us measure just how structured and organized these codes are, and even predict which models will generalize best based on what they learn. We found that these models go through two main phases of training: one where they get really good at the original task, and another where they become super robust to noise and distractions. This second phase is when generalization performance really takes off! Finally, we discovered that bigger models tend to compress their representations more than smaller ones.
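The summaries above only name the information-theoretic measures; as a concrete illustration, here is a minimal sketch, assuming a simple k-means discretization of hidden states, of how entropy-style measures over representations could be computed. The clustering step, the cluster count, and the `representation_structure` helper are hypothetical choices made for illustration, not the paper's actual definitions or implementation.

```python
# A minimal sketch (not the authors' implementation): discretize continuous
# representations with k-means, then measure entropy and conditional entropy
# of the resulting clusters with respect to the inputs.
import numpy as np
from sklearn.cluster import KMeans

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(labels, conditions):
    """H(labels | conditions): how uncertain the representation cluster is
    once we know which input category it came from. Lower = more systematic."""
    conditions = np.asarray(conditions)
    total = 0.0
    for c in np.unique(conditions):
        mask = conditions == c
        total += mask.mean() * entropy(labels[mask])
    return total

def representation_structure(reps, input_ids, n_clusters=32, seed=0):
    """Discretize representations, then report H(Z) and H(Z | X)."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed,
                      n_init=10).fit_predict(np.asarray(reps))
    return entropy(clusters), conditional_entropy(clusters, input_ids)

# Toy usage with random data standing in for real model activations.
rng = np.random.default_rng(0)
reps = rng.normal(size=(500, 64))          # stand-in for hidden states
input_ids = rng.integers(0, 10, size=500)  # stand-in for input categories
h_z, h_z_given_x = representation_structure(reps, input_ids)
print(f"H(Z) = {h_z:.2f} bits, H(Z|X) = {h_z_given_x:.2f} bits")
```

In a sketch like this, H(Z) gauges how spread out (versus compressed) the discretized representations are, while H(Z|X) gauges how systematically inputs map onto regions of representation space; comparing H(Z) across model sizes would be one rough way to probe the compression effect described in the summaries.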

Keywords

» Artificial intelligence  » Generalization