Summary of Concept Bottleneck Large Language Models, by Chung-En Sun et al.


Concept Bottleneck Large Language Models

by Chung-En Sun, Tuomas Oikarinen, Berk Ustun, Tsui-Wei Weng

First submitted to arXiv on: 11 Dec 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (GrooveSquid.com original content)
A novel framework for building inherently interpretable Large Language Models (LLMs) is introduced, namely Concept Bottleneck Large Language Models (CB-LLMs). Unlike traditional black-box LLMs that rely on limited post-hoc interpretations, CB-LLMs integrate intrinsic interpretability directly into the models, enabling accurate explanations with scalability and transparency. This framework is applied to two essential NLP tasks: text classification and text generation. For text classification, CB-LLMs are competitive with, and at times outperform, traditional black-box models while providing explicit and interpretable reasoning. In text generation, interpretable neurons in CB-LLMs enable precise concept detection, controlled generation, and safer outputs. The embedded interpretability empowers users to transparently identify harmful content, steer model behavior, and unlearn undesired concepts, significantly enhancing the safety, reliability, and trustworthiness of LLMs.
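The key architectural idea in the abstract is the concept bottleneck: the classifier's output is computed only from scores on human-named concepts, so every prediction can be traced back to which concepts fired. The sketch below is an illustrative toy in NumPy, not the authors' implementation; the concept names, dimensions, and weight matrices are all made up for demonstration.

```python
import numpy as np

# Hypothetical concept names for a sentiment classifier (illustrative only)
CONCEPTS = ["positive wording", "negative wording", "sarcasm"]

def concept_bottleneck_forward(x, W_concept, W_class):
    """Map a text embedding x to concept scores, then to class logits.

    The concept scores are the interpretable bottleneck: the classifier
    sees only the named concepts, so each prediction can be explained
    (and steered) via the concepts that fired.
    """
    concept_scores = np.tanh(W_concept @ x)   # one bounded score per concept
    logits = W_class @ concept_scores         # classes depend only on concepts
    return concept_scores, logits

# Toy dimensions: 8-dim text embedding, 3 concepts, 2 classes (pos/neg)
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W_concept = rng.normal(size=(3, 8))
W_class = rng.normal(size=(2, 3))

scores, logits = concept_bottleneck_forward(x, W_concept, W_class)
for name, s in zip(CONCEPTS, scores):
    print(f"{name}: {s:+.2f}")
print("predicted class:", int(np.argmax(logits)))
```

Controlled generation and unlearning, as described in the abstract, work by intervening on these intermediate concept scores (e.g. zeroing an undesired concept) before the final layer; the sketch above shows only the classification path.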
Low Difficulty Summary (GrooveSquid.com original content)
We introduce a new way to build language models that can explain their decisions. This is called Concept Bottleneck Large Language Models (CB-LLMs). Instead of just doing math, CB-LLMs figure out what’s important and why. We tested this on two tasks: classifying text as positive or negative, and generating new text. For classification, our model was just as good as others, but could explain its decisions. For generation, it did a better job at making sense and not saying bad things.

Keywords

» Artificial intelligence  » Classification  » NLP  » Text classification  » Text generation