
Summary of One Thousand and One Pairs: A “novel” Challenge for Long-context Language Models, by Marzena Karpinska et al.


One Thousand and One Pairs: A “novel” challenge for long-context language models

by Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer

First submitted to arXiv on: 24 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper presents NoCha, a new dataset designed to test long-context language models’ (LLMs’) ability to retrieve, synthesize, and reason over information spread across book-length inputs. Unlike existing synthetic benchmarks that evaluate only surface-level retrieval, NoCha contains 1,001 pairs of true and false claims about recently published English fictional books, many of which require global reasoning over the entire book to verify. The authors evaluate ten LLMs and find that although the best-performing model, GPT-4o, reaches the highest accuracy at 55.8%, no open-weight model performs above random chance. Analysis reveals that models do much better on claims requiring only sentence-level retrieval than on those requiring global reasoning, and perform worse on speculative fiction books that involve extensive world-building. The proposed methodology also allows future models to be evaluated and analyzed easily.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper creates a new way to test how well language models understand long books. The authors build a dataset called NoCha, which contains pairs of true and false claims about recently published books. Unlike previous tests that only check surface-level information, NoCha requires thinking globally about the entire book. The authors tested ten language models and found that even the best one was right only a little more than half the time, while openly available models did no better than guessing. They also discovered that models handle claims about single sentences much better than big-picture ideas, and struggle most with science fiction and fantasy books that build complex imaginary worlds.

Keywords

» Artificial intelligence  » GPT