Summary of "Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses", by Hung-Ting Su et al.
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
by Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu
First submitted to arXiv on: 22 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper probes the abstract reasoning abilities of large language models (LLMs) in narrative settings, building on their success in multi-step reasoning tasks such as mathematics and logic. Using movie synopses, the authors assess state-of-the-art LLMs and find that they struggle with abstraction. To address this, they introduce a trope-wise querying approach that raises the F1 score by 11.8 points. The study also shows that chain-of-thought (CoT) prompting can trigger hallucinations in narrative content, reducing GPT-4's performance. Finally, the authors propose an Adversarial Injection method that embeds trope-related text tokens into movie synopses containing no explicit tropes, demonstrating CoT's heightened sensitivity to such injections. |
| Low | GrooveSquid.com (original content) | Large language models are very smart computer programs that can understand and generate human-like text. Recently, they have been great at solving math problems and answering common-sense questions. But what about stories? Can they understand narratives the way we do? This study finds that these models are not as good at that type of thinking, which is called abstract reasoning. To help them improve, the authors came up with a new way to ask questions about movie plots using recurring storytelling patterns called tropes. This improved performance by a lot! However, they also found that another technique, called chain-of-thought prompting, can sometimes make things worse and create fake information. |
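The trope-wise querying idea mentioned in the medium summary can be pictured as asking the model about one trope at a time rather than all tropes in a single prompt. Below is a minimal, hypothetical sketch of that pattern: the trope list, the prompt wording, and the `query_llm` callable are illustrative assumptions, not the authors' actual prompts or evaluation setup.

```python
# Hypothetical sketch of trope-wise querying: one yes/no question
# per trope, instead of one prompt covering every trope at once.

TROPES = ["The Chosen One", "Red Herring", "Chekhov's Gun"]  # illustrative subset

def build_prompt(synopsis: str, trope: str) -> str:
    """Build a single-trope question for the model (illustrative wording)."""
    return (
        f"Synopsis: {synopsis}\n"
        f"Does this synopsis contain the trope '{trope}'? Answer yes or no."
    )

def detect_tropes(synopsis: str, query_llm) -> list:
    """Query the model once per trope and collect the positives.

    `query_llm` is any callable that maps a prompt string to the
    model's text reply (e.g. a wrapper around an LLM API).
    """
    found = []
    for trope in TROPES:
        answer = query_llm(build_prompt(synopsis, trope))
        if answer.strip().lower().startswith("yes"):
            found.append(trope)
    return found
```

Splitting the query per trope keeps each question narrow, which is one plausible reason the paper reports an F1 gain over asking about all tropes in a single pass.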
Keywords
» Artificial intelligence » F1 score » GPT » Prompting