Summary of "Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses", by Hung-Ting Su et al.
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
by Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu
First submitted to arXiv on: 22 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper probes the abstract reasoning abilities of large language models (LLMs) in narrative settings, building on their success in multi-step reasoning tasks such as mathematics and logic. Using movie synopses, the authors assess state-of-the-art LLMs and find that they struggle with abstraction. To address this, they introduce a trope-wise querying approach that raises the F1 score by 11.8 points. The study also shows that chain-of-thought (CoT) prompting can trigger hallucinations in narrative content, reducing GPT-4's performance. Finally, the authors propose an Adversarial Injection method that embeds trope-related text tokens into movie synopses containing no explicit tropes, demonstrating CoT's heightened sensitivity to such injections. |
| Low | GrooveSquid.com (original content) | Large language models are very smart computer programs that can understand and generate human-like text. Recently, they have been great at solving math problems and answering common-sense questions. But what about stories? Can they understand narratives the way we do? This study finds that these models are not as good at that type of thinking, which is called abstract reasoning. To help them improve, the authors came up with a new way to ask questions about movie plots using recurring storytelling patterns called tropes. This improved performance by a lot! However, they also found that another technique, called chain-of-thought prompting, can sometimes make things worse and create fake information. |
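The trope-wise querying idea mentioned in the medium summary can be pictured as asking the model about one trope at a time rather than all tropes in a single prompt. Below is a minimal, hypothetical sketch of that pattern: the trope list, the prompt wording, and the `query_llm` callable are illustrative assumptions, not the authors' actual prompts or evaluation setup.

```python
# Hypothetical sketch of trope-wise querying: one yes/no question
# per trope, instead of one prompt covering every trope at once.

TROPES = ["The Chosen One", "Red Herring", "Chekhov's Gun"]  # illustrative subset

def build_prompt(synopsis: str, trope: str) -> str:
    """Build a single-trope question for the model (illustrative wording)."""
    return (
        f"Synopsis: {synopsis}\n"
        f"Does this synopsis contain the trope '{trope}'? Answer yes or no."
    )

def detect_tropes(synopsis: str, query_llm) -> list:
    """Query the model once per trope and collect the positives.

    `query_llm` is any callable that maps a prompt string to the
    model's text reply (e.g. a wrapper around an LLM API).
    """
    found = []
    for trope in TROPES:
        answer = query_llm(build_prompt(synopsis, trope))
        if answer.strip().lower().startswith("yes"):
            found.append(trope)
    return found
```

Splitting the query per trope keeps each question narrow, which is one plausible reason the paper reports an F1 gain over asking about all tropes in a single pass.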
Keywords
» Artificial intelligence » F1 score » GPT » Prompting