Summary of Dissociation Of Faithful and Unfaithful Reasoning in Llms, by Evelyn Yee and Alice Li and Chenyu Tang and Yeon Ho Jung and Ramamohan Paturi and Leon Bergen
Dissociation of Faithful and Unfaithful Reasoning in LLMs
by Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen
First submitted to arxiv on: 23 May 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Large language models (LLMs) tend to perform better in downstream tasks when they generate Chain of Thought reasoning text before providing an answer. Our research explores how LLMs recover from errors in this Chain of Thought process. We analyzed the error recovery behaviors and found evidence for unfaithfulness, where models reach the correct answer despite flawed reasoning. We identified factors influencing LLM recovery behavior: more frequent recovery from obvious errors and increased evidence supporting the correct answer. These factors have different effects on faithful and unfaithful recoveries. Our results suggest distinct mechanisms drive these types of error recoveries. Targeting these mechanisms could reduce unfaithful reasoning and improve model interpretability. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Large language models can be trained to do tasks better by thinking through a problem before giving an answer. But sometimes they make mistakes in this process. We studied how the models correct their mistakes. We found that sometimes the models are wrong, but they still get the right answer anyway! This happens more often when the mistake is obvious and there’s strong evidence for the correct answer. Our research shows that there are different ways that the models correct mistakes, and that understanding these differences can help us make the models better. |