
Summary of Representations as Language: An Information-Theoretic Framework for Interpretability, by Henry Conklin et al.


Representations as Language: An Information-Theoretic Framework for Interpretability

by Henry Conklin, Kenny Smith

First submitted to arXiv on: 4 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large-scale neural language models excel at a variety of linguistic tasks, but their internal workings remain largely opaque, making it difficult to understand what they learn and when. To address this, we propose a novel approach to interpretability that treats the mapping between a model's inputs and its representations as a language in its own right. This allows us to introduce information-theoretic measures that quantify the structure of representations with respect to the input. Our measures are fast to compute, grounded in linguistic theory, and can predict which models generalize best based on their representations. We apply these measures to describe two distinct phases of training: an initial phase of in-distribution learning that reduces task loss, followed by a second stage in which representations become robust to noise, leading to improved generalization performance. We also examine how model size affects the structure of representational space, showing that larger models compress their representations more than smaller ones. A rough sketch of how measures like these might be computed appears after the summaries below.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Large language models are great at understanding language, but they’re hard to understand themselves! They take in sentences and turn them into a special kind of code (called a representation), and it’s tough for us to know what that code captures or how well the model will work outside of its original training data. To solve this problem, we came up with a new way to “decode” these representations by treating them like a language in their own right. This lets us measure just how structured and organized these codes are, and even predict which models will generalize best based on what they learn. We found that these models go through two main phases of training: one where they get really good at the original task, and another where they become super robust to noise and distractions. This second phase is when generalization performance really takes off! Finally, we discovered that bigger models tend to compress their representations more than smaller ones.
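The summaries above only name the information-theoretic measures; as a concrete illustration, here is a minimal sketch, assuming a simple k-means discretization of hidden states, of how entropy-style measures over representations could be computed. The clustering step, the cluster count, and the `representation_structure` helper are hypothetical choices made for illustration, not the paper's actual definitions or implementation.

```python
# A minimal sketch (not the authors' implementation): discretize continuous
# representations with k-means, then measure entropy and conditional entropy
# of the resulting clusters with respect to the inputs.
import numpy as np
from sklearn.cluster import KMeans

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(labels, conditions):
    """H(labels | conditions): how uncertain the representation cluster is
    once we know which input category it came from. Lower = more systematic."""
    conditions = np.asarray(conditions)
    total = 0.0
    for c in np.unique(conditions):
        mask = conditions == c
        total += mask.mean() * entropy(labels[mask])
    return total

def representation_structure(reps, input_ids, n_clusters=32, seed=0):
    """Discretize representations, then report H(Z) and H(Z | X)."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed,
                      n_init=10).fit_predict(np.asarray(reps))
    return entropy(clusters), conditional_entropy(clusters, input_ids)

# Toy usage with random data standing in for real model activations.
rng = np.random.default_rng(0)
reps = rng.normal(size=(500, 64))          # stand-in for hidden states
input_ids = rng.integers(0, 10, size=500)  # stand-in for input categories
h_z, h_z_given_x = representation_structure(reps, input_ids)
print(f"H(Z) = {h_z:.2f} bits, H(Z|X) = {h_z_given_x:.2f} bits")
```

In a sketch like this, H(Z) gauges how spread out (versus compressed) the discretized representations are, while H(Z|X) gauges how systematically inputs map onto regions of representation space; comparing H(Z) across model sizes would be one rough way to probe the compression effect described in the summaries.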

Keywords

» Artificial intelligence  » Generalization