Are self-explanations from Large Language Models faithful?

by Andreas Madsen, Sarath Chandar, Siva Reddy

First submitted to arXiv on: 15 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (original GrooveSquid.com content)

The paper proposes a way to measure the faithfulness of self-explanations provided by Large Language Models (LLMs): whether an explanation actually reflects how the model reached its answer. The authors introduce self-consistency checks as a means of evaluating faithfulness, and show that faithfulness depends not only on the model and the task but also on the type of explanation being used. For instance, in sentiment classification, counterfactuals are more faithful for Llama2, feature attribution is more faithful for Mistral, and redaction is more faithful for Falcon 40B. (A minimal sketch of one such check appears after these summaries.)

Low Difficulty Summary (original GrooveSquid.com content)

Large Language Models (LLMs) can explain their answers, but it’s hard to know whether those explanations are telling the truth. This paper checks whether the explanations hold up under scrutiny: for example, if removing the words a model says were important does not change its answer, the explanation probably isn’t faithful. It turns out that LLM self-explanations are not always reliable, and their faithfulness depends on which type of explanation is used and which model provides it.

Keywords

* Artificial intelligence
* Classification