Summary of Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference, by Barys Liskavets et al.
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
by Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke
First submitted to arXiv on: 2 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | High Difficulty Summary: Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: This paper proposes a sentence-level prompt compression technique called Context-Aware Prompt Compression (CPC) to reduce the computational cost of large language models (LLMs) while retaining the information needed to answer a question. The method uses a context-aware sentence encoder that assigns each sentence a relevance score given the question, trained in a contrastive setup on a new dataset of questions paired with positive and negative sentences. CPC outperforms prior work on prompt compression benchmarks and is up to 10.93x faster at inference than the best token-level compression method, with larger gains at shorter length constraints. The code and dataset are released for reproducibility and further development. |
| Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper changes how computers answer questions. Right now, it takes a lot of computation to make sure they get the right answers. But what if we could compress the information needed to answer those questions? That's exactly what this team did. They came up with a new way to do this called Context-Aware Prompt Compression (CPC). It works by looking at each sentence and deciding how important it is for answering the question. This helps computers focus on the most important information, making them faster and more efficient. |
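The idea described above can be sketched in a few lines: score each sentence's relevance to the question, then keep the highest-scoring sentences until a length budget is met. The paper uses a trained context-aware sentence encoder for scoring; in this illustrative sketch, a toy bag-of-words cosine similarity stands in for that encoder, and all function names and the character-budget parameter are assumptions, not the authors' API.

```python
import math
from collections import Counter

def _embed(text):
    # Toy stand-in for the paper's context-aware sentence encoder (assumption):
    # a bag-of-words term-count vector.
    return Counter(text.lower().split())

def _cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def compress_prompt(sentences, question, budget_chars):
    """Keep the sentences most relevant to the question within a character
    budget, preserving their original order in the prompt."""
    q = _embed(question)
    # Rank sentences by relevance to the question, highest first.
    ranked = sorted(enumerate(sentences),
                    key=lambda p: _cosine(_embed(p[1]), q),
                    reverse=True)
    kept, used = [], 0
    for idx, sent in ranked:
        if used + len(sent) <= budget_chars:
            kept.append((idx, sent))
            used += len(sent)
    # Re-emit the kept sentences in their original order.
    return " ".join(s for _, s in sorted(kept))

sentences = [
    "Cats sleep a lot.",
    "The capital of France is Paris.",
    "Bananas are yellow.",
]
compressed = compress_prompt(sentences, "What is the capital of France?", 40)
print(compressed)
```

A real deployment would replace `_embed`/`_cosine` with the learned encoder's relevance scores, which is where CPC's contrastive training on question/positive/negative sentence data comes in.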
Keywords
» Artificial intelligence » Encoder » Inference » Prompt » Token