Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment

by Allison Huang, Yulu Niki Pi, Carlos Mougan

First submitted to arXiv on: 18 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper investigates how large language models (LLMs) can be influenced through prompting to alter their initial decisions and align with established ethical frameworks. Two experiments were designed to assess the susceptibility of LLMs to moral persuasion. In the first, a Base Agent LLM was evaluated on morally ambiguous scenarios while a Persuader Agent attempted to change its initial decisions (a rough sketch of this setup follows the summaries below). The second tested whether LLMs could adopt specific value alignments rooted in established philosophical theories. The results show that LLMs can be persuaded in morally charged scenarios, with persuasion success depending on factors such as model size, scenario complexity, and conversation length. Notably, LLMs from the same company but of different sizes produced distinct outcomes, highlighting variability in susceptibility to ethical persuasion.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper looks at how big computer models can be talked into changing their decisions and following certain sets of values. The researchers ran two tests to see whether these models could be convinced to make different choices when given reasons why their initial choices were wrong. They found that the models can indeed be persuaded, and that it depends on things like how big the model is, how complicated the situation is, and how long the conversation lasts. Interestingly, even models from the same company but of different sizes behaved differently, showing there is a lot of variation in how easily they change their minds.
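
The Base Agent / Persuader Agent setup from the first experiment lends itself to a short illustration. The following is a minimal, hypothetical sketch in Python: the `chat` placeholder, the system prompts, the turn count, and the change-of-decision check are all assumptions made for illustration, not the authors' actual protocol.

```python
# Hypothetical sketch of the two-agent moral-persuasion setup described above.
# `chat` stands in for any chat-completion API; prompts, scenario, and turn
# count are illustrative only.

def chat(system: str, messages: list[dict]) -> str:
    """Placeholder for a chat-completion call (replace with a real LLM client)."""
    raise NotImplementedError

def run_persuasion_trial(scenario: str, n_turns: int = 3) -> dict:
    # 1. Base Agent gives its initial decision on a morally ambiguous scenario.
    base_system = "You are the Base Agent. Read the moral dilemma and state a clear decision."
    base_history = [{"role": "user", "content": scenario}]
    initial_decision = chat(base_system, base_history)
    base_history.append({"role": "assistant", "content": initial_decision})

    # 2. Persuader Agent argues against that decision over several turns.
    persuader_system = (
        "You are the Persuader Agent. Argue that the other model's decision is "
        "mistaken and try to get it to change its answer."
    )
    persuader_history = [
        {"role": "user", "content": f"Scenario: {scenario}\nDecision: {initial_decision}"}
    ]

    final_decision = initial_decision
    for _ in range(n_turns):
        argument = chat(persuader_system, persuader_history)
        persuader_history.append({"role": "assistant", "content": argument})

        # Base Agent responds to the argument and restates its decision.
        base_history.append({"role": "user", "content": argument})
        final_decision = chat(base_system, base_history)
        base_history.append({"role": "assistant", "content": final_decision})
        persuader_history.append({"role": "user", "content": final_decision})

    # 3. Persuasion "succeeds" here if the final decision differs from the initial
    #    one; a real evaluation would need a more careful comparison or a judge model.
    return {
        "initial": initial_decision,
        "final": final_decision,
        "changed": final_decision.strip() != initial_decision.strip(),
    }
```

In the paper, variables such as model size, scenario complexity, and conversation length are what drive persuasion success; a sketch like this only shows where those knobs and measurements would plug in.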

Keywords

» Artificial intelligence  » Prompting