Summary of Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context, by Jingru Jia et al.
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context
by Jingru Jia, Zehua Yuan, Junhao Pan, Paul E. McNamara, Deming Chen
First submitted to arXiv on: 10 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on its arXiv page. |
Medium | GrooveSquid.com (original content) | When making decisions under uncertainty, individuals often deviate from rational behavior; these deviations can be characterized along three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potential biases. Several empirical studies have investigated the rationality and social behavior of LLMs, yet their internal decision-making tendencies and capabilities remain inadequately understood. This paper proposes a framework, grounded in behavioral economics, for evaluating the decision-making behavior of LLMs. Using a multiple-choice-list experiment, the authors estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. The results reveal that LLMs generally exhibit human-like patterns, such as risk aversion and loss aversion, with a tendency to overweight small probabilities; however, the degree to which these behaviors are expressed varies significantly across models. (An illustrative sketch of these three parameters follows the table.) |
Low | GrooveSquid.com (original content) | Large language models (LLMs) are used to make decisions, but do they act like humans? This paper looks at how three commercial LLMs, ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro, make choices when faced with uncertainty. The researchers compared the LLMs’ behavior to human norms and found that the models tend to be risk-averse and to avoid losses. However, there are some differences between how humans and LLMs behave; for example, one LLM was more cautious when making decisions involving sexual minority groups or physical disabilities. |
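The three dimensions the paper measures (risk preference, probability weighting, and loss aversion) correspond to parameters in prospect-theory models from behavioral economics. The sketch below shows one conventional parameterization; the functional forms and default values are illustrative assumptions drawn from Tversky and Kahneman's 1992 median estimates, not the paper's exact specification.

```python
def value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function.

    alpha < 1 gives concave utility for gains (risk aversion);
    lam > 1 makes losses loom larger than gains (loss aversion).
    Defaults are Tversky & Kahneman's (1992) median estimates,
    used here purely for illustration.
    """
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha


def weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting function.

    gamma < 1 overweights small probabilities and underweights
    large ones, the same pattern the paper reports for LLMs.
    """
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


def prospect_utility(lottery, alpha=0.88, lam=2.25, gamma=0.61):
    """Subjective value of a lottery given as [(probability, outcome), ...]."""
    return sum(weight(p, gamma) * value(x, alpha, lam) for p, x in lottery)


# A multiple-choice-list experiment presents a sequence of binary choices
# between a safe amount and a risky lottery; the row at which a respondent
# (human or LLM) switches from safe to risky identifies alpha, gamma, lam.
safe = [(1.0, 30)]               # $30 for certain
risky = [(0.5, 100), (0.5, 0)]   # 50% chance of $100, otherwise nothing
print(prospect_utility(safe))    # ~19.9
print(prospect_utility(risky))   # ~24.2, so this agent would pick the lottery
```

In the paper's setup these parameters are estimated per model from the LLM's observed switch points rather than assumed, which is how the authors can compare, say, the degree of loss aversion across ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.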
Keywords
» Artificial intelligence » Claude » Gemini » Probability