
Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

by Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li

First submitted to arXiv on: 5 Jul 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the strategic decision-making abilities of Large Language Models (LLMs) in complex social scenarios, using game theory to assess their performance. The authors evaluate GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B on the canonical two-player non-zero-sum games Stag Hunt and the Prisoner's Dilemma. The results show that the models exhibit systematic biases, including positional bias, payoff bias, and behavioural bias, which degrade their performance whenever a game's configuration misaligns with them. Notably, even newer LLMs such as GPT-4o suffer significant performance drops in these misaligned settings, and chain-of-thought (CoT) prompting reduces the biases in some models while worsening them in others.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how well Large Language Models do when making big decisions that involve other people. The authors use game theory to understand the LLMs' choices, testing GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B in two different games. They found that these models make mistakes because of things like the order in which choices are presented or which payoffs they are drawn to. This means their performance drops when the game doesn't match up with those biases. Surprisingly, newer models don't always do better than older ones.

Keywords

» Artificial intelligence  » GPT  » Llama  » Prompting