Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

by Jingru Jia, Zehua Yuan, Junhao Pan, Paul E. McNamara, Deming Chen

First submitted to arXiv on: 10 Jun 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Theoretical Economics (econ.TH)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
When making decisions under uncertainty, individuals often deviate from rational behavior; these deviations can be characterized along three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Several empirical studies have investigated the rationality and social behavior of LLMs, yet their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of LLMs. Through a multiple-choice-list experiment, we estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Our results reveal that LLMs generally exhibit patterns similar to those of humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities; however, the degree to which these behaviors are expressed varies significantly across the three models.
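
To make the three dimensions concrete, the sketch below implements the prospect-theory value and probability-weighting functions that multiple-choice-list experiments are typically used to estimate. The functional forms (a power value function with loss-aversion coefficient lambda, and a Prelec weighting function), the parameter values, and the lottery payoffs are illustrative assumptions in the spirit of the Tanaka-Camerer-Nguyen design, not the paper's exact specification.

    import math

    # Prospect-theory building blocks. sigma (risk preference), alpha
    # (probability weighting), and lam (loss aversion) are hypothetical
    # values of the kind such an experiment estimates.

    def value(x, sigma, lam):
        # Concave over gains; convex and lam-times steeper over losses.
        return x ** sigma if x >= 0 else -lam * (-x) ** sigma

    def weight(p, alpha):
        # Prelec weighting: alpha < 1 overweights small probabilities.
        return math.exp(-((-math.log(p)) ** alpha))

    def utility(x_hi, p_hi, x_lo, sigma, alpha, lam):
        # Rank-dependent utility of a binary gain lottery (x_hi > x_lo >= 0).
        w = weight(p_hi, alpha)
        return w * value(x_hi, sigma, lam) + (1 - w) * value(x_lo, sigma, lam)

    sigma, alpha, lam = 0.7, 0.6, 2.25  # hypothetical fitted parameters

    # Loss aversion: a loss looms larger than an equal gain.
    print(f"v(+20) = {value(20, sigma, lam):.2f}, v(-20) = {value(-20, sigma, lam):.2f}")

    # A multiple-choice list holds lottery A fixed while lottery B's high
    # payoff rises row by row; the row where the respondent switches from
    # A to B brackets their parameters.
    for payoff in range(40, 181, 20):
        a = utility(40, 0.3, 10, sigma, alpha, lam)     # safe lottery A
        b = utility(payoff, 0.1, 5, sigma, alpha, lam)  # risky lottery B
        print(f"B pays {payoff:3d}: choose {'B' if b > a else 'A'}")

In such a setup, each row of the list would be posed to the LLM as a choice prompt, and the observed switch point used to back out the model's implied parameters.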

Low Difficulty Summary (original content by GrooveSquid.com)
Large language models (LLMs) are used to make decisions, but do they act like humans? This paper looks at how three commercial LLMs (ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro) make choices when faced with uncertainty. The researchers compared the LLMs' behavior to human norms and found that the models tend to be risk-averse and to avoid losses. However, the models differ in how strongly they show these tendencies. For example, one LLM was more cautious when making decisions involving sexual minority groups or people with physical disabilities.

Keywords

  • Artificial intelligence
  • Claude
  • Gemini
  • Probability