


Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models

by Tianchen Zhang, Gururaj Saileshwar, David Lie

First submitted to arXiv on: 19 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (original content by GrooveSquid.com)
The paper presents a novel side-channel attack that lets an adversary extract sensitive information about inference inputs to large language models (LLMs) based solely on the number of output tokens. The attack is demonstrated on two common LLM tasks: machine translation and text classification. Experiments show that the attack recovers the target language of a machine translation task with more than 75% precision across three different models, and the input class in text classification tasks with more than 70% precision on both open-source LLMs and production models. The paper also proposes mitigations against the output-token-count side channel.
Low Difficulty Summary (original content by GrooveSquid.com)
This research shows how an attacker can figure out what language someone is translating into, or which category their text belongs to, just by looking at how long a large language model's response is. Because the number of output tokens depends on the input, the length of the response leaks information the model doesn't protect. The researchers found that an attacker who can observe the size or timing of the responses can guess the right answer most of the time, and they suggest ways to defend against this.
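
At its core, the attack described above amounts to matching an observed output token count against per-class profiles. The sketch below is only a toy illustration of that idea under assumed numbers, not the authors' actual method: the language names, token counts, and the guess_target_language helper are all hypothetical, and a real adversary would build the profiles by querying the target model.

```python
# Hypothetical sketch of an output-token-count side channel.
# The per-language token counts are made-up illustrative numbers; a real
# attacker would profile the actual model under attack.

import statistics

# Step 1 (profiling): the attacker submits their own translation requests and
# records how many output tokens each candidate target language produces.
attacker_profiles = {
    "French":  [48, 51, 50, 47, 52],   # token counts observed for French outputs
    "German":  [55, 58, 57, 56, 59],   # German outputs run longer in this toy data
    "Chinese": [32, 30, 33, 31, 29],   # Chinese outputs use fewer tokens here
}

# Mean token count per candidate language.
profile_means = {lang: statistics.mean(counts)
                 for lang, counts in attacker_profiles.items()}


def guess_target_language(observed_token_count: int) -> str:
    """Guess the victim's target language from the output token count alone,
    by picking the profile whose mean count is closest to the observation."""
    return min(profile_means,
               key=lambda lang: abs(profile_means[lang] - observed_token_count))


# Step 2 (attack): the attacker observes only the size (or generation time) of
# the victim's response, e.g. 31 output tokens, and infers the likely language.
print(guess_target_language(31))  # -> "Chinese" in this toy example
```

In practice the token count is not observed directly; the paper's title points to timing as the proxy, since each generated token adds roughly constant latency, so response time stands in for token count.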

Keywords

» Artificial intelligence  » Classification  » Inference  » Large language model  » Precision  » Text classification  » Token  » Translation