Summary of CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data, by Zhao Cheng et al.
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data
by Zhao Cheng, Diane Wan, Matthew Abueg, Sahra Ghalebikesabi, Ren Yi, Eugene Bagdasarian, Borja Balle, Stefan Mellem, Shawn O’Banion
First submitted to arXiv on: 20 Sep 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The paper introduces CI-Bench, a comprehensive benchmark for evaluating the ability of AI assistants to protect personal information during model inference. The authors leverage the Contextual Integrity framework to assess information flows across roles, information types, and transmission principles. A novel, scalable data pipeline generates natural communications, including dialogues and emails, which are used to create 44,000 test samples across eight domains. The paper also formulates and evaluates a naive AI assistant, demonstrating the need for further study and careful training for personal-assistant tasks.
Low | GrooveSquid.com (original content) | AI assistants have the potential to perform diverse tasks on behalf of users, but they may also share personal data, raising significant privacy challenges. To address this issue, the researchers introduce CI-Bench, a new benchmark for evaluating AI assistants' ability to protect user information. The authors create a large dataset of natural communications and test how well an AI assistant keeps user data private.
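The Contextual Integrity framing described above (information flows judged by roles, information types, and transmission principles) can be illustrated with a minimal sketch of what one labeled test case might look like. All field names and values here are hypothetical, invented for illustration; they are not taken from the actual CI-Bench dataset.

```python
from dataclasses import dataclass

@dataclass
class CISample:
    """Hypothetical test sample: one information flow judged under Contextual Integrity."""
    domain: str                  # e.g. one of the benchmark's eight domains
    sender_role: str             # who transmits the information
    recipient_role: str          # who receives it
    information_type: str        # what kind of personal data is flowing
    transmission_principle: str  # the norm governing the flow
    appropriate: bool            # ground-truth label: is this flow acceptable?

# An inappropriate flow: a diagnosis shared with an employer without consent.
sample = CISample(
    domain="healthcare",
    sender_role="patient's AI assistant",
    recipient_role="employer",
    information_type="medical diagnosis",
    transmission_principle="no patient consent given",
    appropriate=False,
)

def judge(sample: CISample) -> str:
    """Read off the ground-truth label; an assistant under evaluation
    would instead predict this from the dialogue or email context."""
    return "share" if sample.appropriate else "withhold"

print(judge(sample))  # withhold
```

A benchmark built this way scores an assistant by comparing its share/withhold decisions against the ground-truth labels over many such samples.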
Keywords
» Artificial intelligence » Inference