Summary of GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models, by Leyan Wang et al.


GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

by Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He

First submitted to arXiv on: 21 Jun 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com, original content)
The paper introduces GIEBench, a novel benchmark for evaluating the empathy of large language models (LLMs) toward diverse group identities. Existing benchmarks primarily target universal human emotions and neglect the context of individuals' group identities. GIEBench spans 11 identity dimensions covering 97 group identities, with 999 single-choice questions tied to specific identities such as gender, age, occupation, and race. The benchmark assesses LLMs' ability to respond from the perspective of an identified group, supporting the development of empathetic LLM applications tailored to users with different identities. An evaluation of 23 LLMs shows that they can understand different identity standpoints but fail to consistently exhibit equal empathy without explicit instructions, highlighting the need for better alignment of LLMs with diverse values to accommodate different human identities.
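To make the evaluation setup concrete, here is a minimal sketch of how a single-choice, identity-conditioned benchmark item like those described above might be represented and scored. This is not the authors' actual code or data format: the GIEBenchItem class, its field names, and the ask_model callback are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass
class GIEBenchItem:
    """Hypothetical schema for one single-choice GIEBench-style item.

    Field names are illustrative; the real GIEBench data format may differ.
    """
    identity_dimension: str   # e.g. "gender", "age", "occupation", "race"
    group_identity: str       # one of the 97 group identities
    question: str             # scenario posed from the group's perspective
    choices: list[str]        # candidate responses
    empathetic_choice: int    # index of the empathy-aligned answer

def score(items: list[GIEBenchItem], ask_model) -> float:
    """Fraction of items where the model picks the empathy-aligned choice.

    ask_model(item) is a placeholder for any function that prompts an LLM
    with the question and choices and returns the index it selects.
    """
    if not items:
        return 0.0
    correct = sum(ask_model(item) == item.empathetic_choice for item in items)
    return correct / len(items)
```

In this sketch, grouping items by group_identity before calling score would give a per-identity accuracy breakdown, which is the kind of comparison needed to check whether a model exhibits equal empathy across groups rather than just high overall accuracy.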
Low Difficulty Summary (GrooveSquid.com, original content)
This paper creates a new way to test how well large language models (LLMs) understand and care about people from different groups, such as women, men, young people, old people, and people from different countries. Right now, most tests focus on general feelings like sadness and happiness, but this benchmark looks at specific group identities like gender, age, occupation, and race. The test has 11 categories with many questions to see how well LLMs can respond from the perspective of these groups. The authors found that LLMs are good at understanding different perspectives, but they often don't show equal care for all groups unless given special instructions. This shows that LLMs need better training so they can understand and respect people's differences.

Keywords

» Artificial intelligence  » Alignment