Dissociation of Faithful and Unfaithful Reasoning in LLMs

by Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

First submitted to arXiv on: 23 May 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper’s original abstract, written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
Large language models (LLMs) tend to perform better on downstream tasks when they generate Chain of Thought reasoning text before giving an answer. Our research explores how LLMs recover from errors in this Chain of Thought process. We analyzed error recovery behaviors and found evidence of unfaithfulness, where models reach the correct answer despite flawed reasoning. We identified factors that influence LLM recovery behavior: models recover more often from obvious errors and when there is more evidence supporting the correct answer. These factors affect faithful and unfaithful recoveries differently, suggesting that distinct mechanisms drive the two types of error recovery. Targeting these mechanisms could reduce unfaithful reasoning and improve model interpretability. (A brief illustrative sketch of the faithful/unfaithful distinction appears after the summaries below.)

Low Difficulty Summary (original content by GrooveSquid.com)
Large language models can do tasks better when they think through a problem before giving an answer. But sometimes they make mistakes in this reasoning process. We studied how the models correct those mistakes. We found that sometimes a model’s reasoning is wrong, but it still gets the right answer anyway! This happens more often when the mistake is obvious and when there is strong evidence for the correct answer. Our research shows that the models correct mistakes in different ways, and understanding these differences can help us make the models better.
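
As a rough illustration of the distinction discussed above, here is a minimal, hypothetical Python sketch (not the paper’s code) of how one might label an error-recovery episode after injecting a flawed step into a model’s Chain of Thought. The `RecoveryOutcome` fields and the `classify_recovery` function are illustrative assumptions about how such labeling could work, not the authors’ implementation.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOutcome:
    """Observed behavior after a flawed step is injected into a model's Chain of Thought."""
    final_answer_correct: bool   # did the model still reach the right final answer?
    error_acknowledged: bool     # did the continuation notice and correct the flawed step?


def classify_recovery(outcome: RecoveryOutcome) -> str:
    """Label one error-recovery episode.

    'faithful recovery'   : the flawed step is corrected in the text, so the stated
                            reasoning actually supports the final answer.
    'unfaithful recovery' : the final answer is correct even though the flawed
                            reasoning is never repaired, so answer and reasoning
                            are dissociated.
    'no recovery'         : the injected error propagates to a wrong final answer.
    """
    if not outcome.final_answer_correct:
        return "no recovery"
    return "faithful recovery" if outcome.error_acknowledged else "unfaithful recovery"


# Example usage with made-up outcomes:
print(classify_recovery(RecoveryOutcome(final_answer_correct=True, error_acknowledged=True)))    # faithful recovery
print(classify_recovery(RecoveryOutcome(final_answer_correct=True, error_acknowledged=False)))   # unfaithful recovery
print(classify_recovery(RecoveryOutcome(final_answer_correct=False, error_acknowledged=False)))  # no recovery
```

In the paper’s framing, it is the “unfaithful recovery” cases, where the answer is right but the stated reasoning never is, that motivate studying the two recovery types separately.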

Keywords

» Artificial intelligence