Summary of Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation, by Safeyah Khaled Alshemali et al.
Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation
by Safeyah Khaled Alshemali, Daniel Bauer, Yuval Marton
First submitted to arXiv on: 19 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper investigates whether pre-trained autoregressive large language models (LLMs) possess knowledge about thematic fit estimation, a task measuring the compatibility between predicates, arguments, and semantic roles. Previous work focused on distributional or neural event representation models trained with indirect labels. In this study, we evaluate both closed and open LLMs on psycholinguistic datasets along three axes: reasoning form (chain-of-thought prompting vs. simple prompting), input form (contextualized sentences vs. raw tuples), and output form (categorical vs. numeric). Our results show that chain-of-thought reasoning is more effective on datasets with self-explanatory semantic role labels, especially Location. We found that generated sentences help in a few settings but lower performance in many others. Predefined categorical output improves GPT's results across the board but not Llama's. The study also highlights the importance of filtering out semantically incoherent generated sentences to improve reasoning and overall performance.
Low | GrooveSquid.com (original content) | This paper explores how well language models understand something called "thematic fit". Thematic fit is about figuring out whether the words in an event, such as a verb and its subject or object, plausibly go together. Previous work tried to teach machines using indirect labels, but this study looks at pre-trained language models instead. Researchers tested these models on different types of inputs and found that some methods work better than others. For example, giving the model more context helps in some cases. The results show that one type of model (GPT) performs well when given specific categories to choose from, but another model (Llama) does worse with this approach. Overall, the study suggests that language models need to be able to tell good sentences apart from bad ones to do this task well.
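To make the three evaluation axes above concrete, here is a minimal, hypothetical sketch of how such prompt variants could be composed. The function name, wording, and rating scale are illustrative assumptions, not the authors' actual prompts.

```python
# Illustrative sketch (not the paper's exact prompts): compose a thematic-fit
# query for an LLM along two of the axes compared in the study.

def build_prompt(predicate, argument, role,
                 reasoning="simple",      # reasoning form: "simple" or "chain-of-thought"
                 output="numeric"):       # output form: "numeric" or "categorical"
    """Return a prompt asking how well `argument` fills `role` for `predicate`."""
    question = (f"How well does '{argument}' fit the role of {role} "
                f"for the verb '{predicate}'?")
    if output == "numeric":
        scale = "Answer with a number from 1 (poor fit) to 7 (perfect fit)."
    else:
        scale = "Answer with one category: poor, moderate, or good fit."
    steps = ("Think step by step about the event described, "
             "then give your final answer.\n") if reasoning == "chain-of-thought" else ""
    return f"{question}\n{steps}{scale}"

# Example: a chain-of-thought, categorical-output query for a raw tuple.
print(build_prompt("cut", "knife", "Instrument",
                   reasoning="chain-of-thought", output="categorical"))
```

The input-form axis (contextualized sentences vs. raw tuples) would correspond to passing either a full generated sentence or a bare (predicate, argument, role) tuple like the one shown here.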
Keywords
» Artificial intelligence » Autoregressive » GPT » Llama » Prompting