Summary of Measuring Social Norms of Large Language Models, by Ye Yuan et al.


Measuring Social Norms of Large Language Models

by Ye Yuan, Kexin Tang, Jianhao Shen, Ming Zhang, Chenguang Wang

First submitted to arXiv on: 3 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)

The proposed dataset tests whether large language models understand social norms, using questions that require a fundamental understanding of those norms to answer. It comprises 402 skills and 12,383 questions covering a range of social norms, and it is designed around the K-12 curriculum so that model performance can be compared directly with human performance, specifically that of elementary students. Recent large language models such as GPT3.5-Turbo and LLaMA2-Chat improve significantly over prior work on this benchmark, scoring only slightly below humans. To further improve the models’ understanding of social norms, the authors propose a multi-agent framework, which brings the models on par with human performance.

Low Difficulty Summary (written by GrooveSquid.com, original content)

Large language models are being used in many real-world applications, and it’s important for them to understand social norms. A new dataset challenges these models to show they can do this by asking questions about different social norms like opinions, arguments, culture, and laws. The dataset is designed so that it’s easy to compare the performance of large language models with that of humans, specifically elementary school students. Surprisingly, some recent large language models are able to answer these questions almost as well as humans do! To make these models even better, a new way of using multiple agents is proposed.

Keywords

* Artificial intelligence