
Summary of Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction, by Suma Bailis et al.


Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction

by Suma Bailis, Jane Friedhoff, Feiyang Chen

First submitted to arXiv on: 18 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Werewolf Arena is a framework for evaluating large language models (LLMs) through the classic social deduction game Werewolf. It introduces a dynamic turn-taking system based on bidding, mirroring real-world discussions where individuals strategically choose when to speak. The framework pits LLMs against each other in a game that requires deception, deduction, and persuasion. The authors' results reveal distinct strengths and weaknesses in the models' strategic reasoning and communication, highlighting Werewolf Arena's potential as a challenging and scalable LLM benchmark.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Werewolf Arena is a new way to test how well large language models can understand and play social games like Werewolf. In this game, some players are "werewolves" who don't want to get caught, while the others are villagers trying to figure out who the werewolves are. The language models have to work with or against each other to win. This helps us see how well they can reason and make decisions in complex situations.
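
The bidding-based turn-taking mentioned in the medium difficulty summary can be illustrated with a minimal sketch. This is not the paper's implementation: the integer bid scale, the random tie-breaking rule, and the names `Bidder` and `choose_speaker` are all assumptions made for illustration. In a real setup each bidder would presumably be an LLM call that reads the discussion so far and returns how strongly it wants to speak.

```python
import random
from typing import Callable, Dict, List

# Hypothetical type: a bidder maps the transcript so far to an integer bid.
# A higher bid means a stronger wish to speak next.
Bidder = Callable[[List[str]], int]


def choose_speaker(
    bidders: Dict[str, Bidder],
    transcript: List[str],
    rng: random.Random,
) -> str:
    """Return the name of the highest bidder.

    Ties are broken uniformly at random. That rule is an assumption of
    this sketch, not something stated in the summary.
    """
    bids = {name: bid(transcript) for name, bid in bidders.items()}
    top = max(bids.values())
    # Sort the tied names so the result depends only on the seed,
    # not on dictionary ordering.
    tied = sorted(name for name, value in bids.items() if value == top)
    return rng.choice(tied)


# Stand-in bidders with fixed bids; real ones would query a model.
players: Dict[str, Bidder] = {
    "Ava": lambda transcript: 1,
    "Ben": lambda transcript: 3,
    "Cy": lambda transcript: 0,
}
next_speaker = choose_speaker(players, [], random.Random(0))
print(next_speaker)  # Ben holds the only top bid, so this prints "Ben"
```

Letting players bid instead of speaking in a fixed order means a model has to decide when to speak as well as what to say, which is the strategic choice the summary highlights.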

Keywords

» Artificial intelligence  » Machine learning