Concept Bottleneck Large Language Models
by Chung-En Sun, Tuomas Oikarinen, Berk Ustun, Tsui-Wei Weng
First submitted to arXiv on: 11 Dec 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper introduces Concept Bottleneck Large Language Models (CB-LLMs), a framework for building inherently interpretable Large Language Models (LLMs). Unlike traditional black-box LLMs that rely on limited post-hoc interpretations, CB-LLMs build intrinsic interpretability directly into the model, enabling accurate explanations with scalability and transparency. The framework is applied to two essential NLP tasks: text classification and text generation. For text classification, CB-LLMs are competitive with, and at times outperform, traditional black-box models while providing explicit and interpretable reasoning. In text generation, interpretable neurons in CB-LLMs enable precise concept detection, controlled generation, and safer outputs. This embedded interpretability lets users transparently identify harmful content, steer model behavior, and unlearn undesired concepts, significantly enhancing the safety, reliability, and trustworthiness of LLMs.
Low | GrooveSquid.com (original content) | We introduce a new way to build language models that can explain their decisions, called Concept Bottleneck Large Language Models (CB-LLMs). Instead of working as a black box, CB-LLMs show which ideas in the text mattered and why. We tested this on two tasks: classifying text as positive or negative, and generating new text. For classification, our model was just as good as standard models, but could explain its decisions. For generation, it could detect concepts, let users steer what it writes, and avoid saying harmful things.
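To make the "bottleneck" idea concrete, here is a minimal sketch of how a concept-bottleneck classification head could be wired up in PyTorch: an encoder feeds a layer with one neuron per named concept, and the class prediction is a linear function of those concept activations alone. This is an illustration under stated assumptions, not the authors' implementation; all names (`ConceptBottleneckClassifier`, `num_concepts`, the toy embedding encoder) are hypothetical.

```python
# Minimal sketch of a concept-bottleneck text classifier (illustrative only;
# not the paper's actual code). A generic encoder stands in for an LLM backbone.
import torch
import torch.nn as nn

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int,
                 num_concepts: int, num_classes: int):
        super().__init__()
        self.encoder = encoder  # stand-in for a pretrained language-model backbone
        # Bottleneck: one neuron per human-readable concept.
        self.concept_layer = nn.Linear(hidden_dim, num_concepts)
        # Class logits depend only on concept activations, so each
        # prediction decomposes into per-concept contributions.
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, input_ids: torch.Tensor):
        hidden = self.encoder(input_ids)           # (batch, seq_len, hidden_dim)
        pooled = hidden.mean(dim=1)                # simple mean pooling over tokens
        concepts = torch.sigmoid(self.concept_layer(pooled))  # scores in [0, 1]
        logits = self.classifier(concepts)
        return logits, concepts                    # concepts expose the "reasoning"

# Toy usage: an embedding layer stands in for a real pretrained encoder.
vocab_size, hidden_dim, n_concepts, n_classes = 1000, 64, 16, 2
encoder = nn.Embedding(vocab_size, hidden_dim)
model = ConceptBottleneckClassifier(encoder, hidden_dim, n_concepts, n_classes)
logits, concepts = model(torch.randint(0, vocab_size, (4, 12)))
```

In a sketch like this, the interpretability hooks described in the summary follow naturally: because each bottleneck neuron is tied to a named concept, steering or unlearning can be as simple as clamping or zeroing the corresponding activation before the final linear layer.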
Keywords
» Artificial intelligence » Classification » NLP » Text classification » Text generation