Summary of SaGE: Evaluating Moral Consistency in Large Language Models, by Vamshi Krishna Bonagiri et al.
SaGE: Evaluating Moral Consistency in Large Language Models
by Vamshi Krishna Bonagiri, Sreeram Vennam, Priyanshul Govil, Ponnurangam Kumaraguru, Manas Gaur
First submitted to arXiv on: 21 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Recent advancements in Large Language Models (LLMs) have showcased impressive capabilities in conversational systems, but we demonstrate that even state-of-the-art LLMs can be morally inconsistent in their generations. This inconsistency raises questions about their reliability and trustworthiness. Prior works focused on developing ground-truth data for specific tasks, but moral scenarios often lack universally agreed-upon answers, making consistency crucial. We propose an information-theoretic measure, Semantic Graph Entropy (SaGE), grounded in “Rules of Thumb” (RoTs), to measure moral consistency. RoTs are abstract principles learned by models and help explain their decision-making strategies effectively. We constructed the Moral Consistency Corpus (MCC) with 50K moral questions, LLM responses, and corresponding RoTs. We also used SaGE to investigate LLM consistency on the TruthfulQA and HellaSwag datasets. Our results show that task accuracy and consistency are independent problems, highlighting the need for further investigation. (A rough code sketch of this consistency idea follows the table.) |
Low | GrooveSquid.com (original content) | Recently, big language models have shown great skill at holding conversations. But we found out that even the best ones can give inconsistent answers to moral questions. This makes us wonder if we can really trust them. Before, people focused on making sure these models got specific tasks right. But when it comes to moral questions, there’s no one “right” answer. That means we need to make sure the model gives consistent answers. We came up with a new way to measure how consistent a model is by looking at its “rules of thumb”. These rules help us understand why the model made certain decisions. We also created a big database of moral questions and the models’ answers, along with their rules. By using this method on two popular datasets, we found that getting tasks right and being consistent are actually two different things. This means we need to keep looking into how these models work. |
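
To make the consistency idea from the medium summary more concrete, here is a minimal, hypothetical sketch: paraphrases of a moral question are answered by a model, the answers (or their Rules of Thumb) are embedded, semantically equivalent ones are grouped, and a normalized entropy over the groups yields a consistency score. The paper itself builds a semantic graph over RoTs and computes an information-theoretic entropy on it; the clustering shortcut, the sentence-encoder choice (`all-MiniLM-L6-v2`), and the `distance_threshold` value below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a SaGE-style consistency score (not the authors' code).
# Idea: collect one model answer (or its Rule of Thumb) per paraphrase of a
# moral question, embed the texts, cluster semantically similar ones, and
# report 1 - normalized entropy of the cluster sizes as a consistency score.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency
from sklearn.cluster import AgglomerativeClustering    # requires scikit-learn >= 1.2 for `metric=`


def consistency_score(texts, distance_threshold=0.35):
    """Return a value in [0, 1]; 1 means all responses collapse to one meaning."""
    if len(texts) < 2:
        return 1.0  # a single response is trivially self-consistent

    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence encoder works
    emb = encoder.encode(texts, normalize_embeddings=True)

    # Group semantically equivalent responses (cosine distance = 1 - cosine similarity).
    clustering = AgglomerativeClustering(
        n_clusters=None,
        metric="cosine",
        linkage="average",
        distance_threshold=distance_threshold,
    ).fit(emb)

    # Entropy of the cluster-size distribution, normalized by log(n).
    _, counts = np.unique(clustering.labels_, return_counts=True)
    p = counts / counts.sum()
    entropy = -(p * np.log(p)).sum()
    max_entropy = np.log(len(texts))
    return 1.0 - entropy / max_entropy


# Example: answers (or their Rules of Thumb) to paraphrases of the same moral question.
answers = [
    "Lying to protect someone is acceptable in rare cases.",
    "It can be okay to lie if it prevents serious harm.",
    "Lying is never acceptable under any circumstances.",
]
print(round(consistency_score(answers), 3))
```

In a setup closer to the paper's, the texts would be a model's responses to several paraphrases of the same question from the Moral Consistency Corpus, and scores near 1 would indicate the model keeps the same moral stance across rephrasings.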