
Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

by Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li

First submitted to arXiv on: 5 Jul 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the strategic decision-making abilities of Large Language Models (LLMs) in complex social scenarios, using game theory to assess their performance. The authors evaluate GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B on the canonical two-player non-zero-sum games Stag Hunt and the Prisoner's Dilemma. The results show that the models exhibit systematic biases, including positional bias, payoff bias, and behavioural bias, which degrade their performance whenever a game's configuration misaligns with them. Notably, even newer LLMs such as GPT-4o suffer significant performance drops in these misaligned settings, and chain-of-thought (CoT) prompting reduces the biases in some models while worsening them in others.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how well Large Language Models do when making big decisions that involve other people. The authors use game theory to understand the LLMs' choices, testing GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B in two different games. They found that these models make mistakes because of things like the order in which choices are presented or which payoffs they are drawn to. This means their performance drops when the game doesn't match up with those biases. Surprisingly, newer models don't always do better than older ones.

Keywords

» Artificial intelligence  » GPT  » Llama  » Prompting