Are self-explanations from Large Language Models faithful?

by Andreas Madsen, Sarath Chandar, Siva Reddy

First submitted to arXiv on: 15 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (original GrooveSquid.com content)

The paper proposes a way to measure the faithfulness of self-explanations provided by Large Language Models (LLMs): whether an explanation actually reflects how the model reached its answer. The authors introduce self-consistency checks as a means of evaluating faithfulness, and show that faithfulness depends not only on the model and the task but also on the type of explanation being used. For instance, in sentiment classification, counterfactuals are more faithful for Llama2, feature attribution is more faithful for Mistral, and redaction is more faithful for Falcon 40B. (A minimal sketch of one such check appears after these summaries.)

Low Difficulty Summary (original GrooveSquid.com content)

Large Language Models (LLMs) can explain their answers, but it’s hard to know whether those explanations are telling the truth. This paper checks whether the explanations hold up under scrutiny: for example, if removing the words a model says were important does not change its answer, the explanation probably isn’t faithful. It turns out that LLM self-explanations are not always reliable, and their faithfulness depends on which type of explanation is used and which model provides it.

Keywords

* Artificial intelligence
* Classification