
Summary of KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs, by Aihua Pei et al.


KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

by Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

First submitted to arXiv on: 16 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium difficulty summary (written by GrooveSquid.com, original content)
The proposed framework, KGPA, assesses the robustness of large language models (LLMs) by using knowledge graphs to generate original prompts and then poisoning those prompts to create adversarial variants. This lets the framework evaluate LLM robustness under attack, providing a more comprehensive assessment than existing frameworks that rely on specific benchmarks. Systematic evaluation of the framework's modules shows that, within the ChatGPT family, robustness ranks GPT-4-turbo > GPT-4o > GPT-3.5-turbo, and that the professional domain of the underlying knowledge influences LLM robustness.

Low difficulty summary (written by GrooveSquid.com, original content)
This paper develops a new way to test how well large language models withstand attacks designed to make them produce mistakes. It uses knowledge graphs, which are structured collections of facts, to create original prompts and deliberately altered ones meant to trick the models. This helps evaluate how well the models stay accurate in real-life situations. The results show that some models, such as those in the ChatGPT family, are better than others at withstanding attacks.
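To make the prompt-poisoning idea more concrete, the minimal sketch below turns a knowledge-graph triple into a factual prompt and applies a small character-level perturbation to create an adversarial variant. All function names, the triple, and the swap-based perturbation are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the KGPA idea (not the paper's code): build a
# factual prompt from a (head, relation, tail) knowledge-graph triple,
# then "poison" it with a small character-level perturbation.
import random

def triple_to_prompt(head, relation, tail):
    """Build a true/false question from a knowledge-graph triple."""
    return f"Is the following statement true? {head} {relation} {tail}."

def poison_prompt(prompt, n_swaps=1, seed=0):
    """Create an adversarial variant by swapping adjacent characters."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

original = triple_to_prompt("Aspirin", "treats", "headache")
adversarial = poison_prompt(original)
# A robust model should answer both variants consistently; a large drop
# in accuracy on the poisoned prompts signals low robustness.
```

In the framework described above, robustness would then be measured by comparing a model's answers on the original and poisoned prompt sets.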

Keywords

» Artificial intelligence  » GPT