Summary of Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference, by Barys Liskavets et al.


Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

by Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke

First submitted to arXiv on: 2 Sep 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a novel sentence-level prompt compression technique called Context-Aware Prompt Compression (CPC) to reduce the computational cost of large language model (LLM) inference while retaining the information that is helpful for answering a given question. The method employs a context-aware sentence encoder that assigns each sentence a relevance score with respect to the question; the encoder is trained in a contrastive setup on a new dataset of questions paired with positive and negative sentences. CPC outperforms prior works on prompt compression benchmarks and is up to 10.93x faster at inference than the best token-level compression method, with larger gains under shorter length constraints. The code and dataset are released for quick reproducibility and further development.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper makes a big difference in how computers answer questions. Right now, it takes a lot of computation to make sure they get the right answers. But what if we could compress the information needed to answer those questions? That’s exactly what this team did. They came up with a new way to do this called Context-Aware Prompt Compression (CPC). It works by looking at each sentence and deciding how important it is for answering the question. This helps computers focus on the most important information, making them faster and more efficient.
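The sentence-level selection idea described in the summaries above can be sketched in a few lines. This is only a minimal illustration: the sentence splitter and the word-overlap scoring function below are naive placeholders of my own, standing in for the trained context-aware sentence encoder that CPC actually uses.

```python
# Hypothetical sketch of sentence-level prompt compression in the spirit of CPC.
# The splitter and scoring function are assumptions, not the authors' method.
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive period-based splitter; a real system would use a proper
    # sentence tokenizer.
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def relevance_score(question: str, sentence: str) -> float:
    # Placeholder: fraction of the sentence's words shared with the question.
    # CPC instead scores each sentence with a trained context-aware encoder.
    q_words = set(question.lower().split())
    s_words = set(sentence.lower().split())
    return len(q_words & s_words) / (len(s_words) or 1)


def compress_prompt(question: str, context: str, keep_ratio: float = 0.5) -> str:
    # Score every sentence against the question, keep the top fraction,
    # and emit the kept sentences in their original order.
    sentences = split_sentences(context)
    ranked = sorted(sentences,
                    key=lambda s: relevance_score(question, s),
                    reverse=True)
    k = max(1, int(len(sentences) * keep_ratio))
    kept = set(ranked[:k])
    return " ".join(s for s in sentences if s in kept)
```

Because scoring is done once per sentence rather than once per token, this kind of selection is what lets the paper's method run much faster than token-level compression, especially when the length budget is small.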

Keywords

» Artificial intelligence  » Encoder  » Inference  » Prompt  » Token