Summary of Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference, by Barys Liskavets et al.
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
by Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke
First submitted to arXiv on: 2 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | High Difficulty Summary: Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: This paper proposes a sentence-level prompt compression technique called Context-Aware Prompt Compression (CPC) to reduce the computational cost of large language models (LLMs) while retaining the information needed to answer a question. The method uses a context-aware sentence encoder that assigns each sentence a relevance score given the question, trained in a contrastive setup on a new dataset of questions paired with positive and negative sentences. CPC outperforms prior work on prompt compression benchmarks and is up to 10.93x faster at inference than the best token-level compression method, with larger gains at shorter length constraints. The code and dataset are released for reproducibility and further development. |
| Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper changes how computers answer questions. Right now, it takes a lot of computation to make sure they get the right answers. But what if we could compress the information needed to answer those questions? That's exactly what this team did. They came up with a new way to do this called Context-Aware Prompt Compression (CPC). It works by looking at each sentence and deciding how important it is for answering the question. This helps computers focus on the most important information, making them faster and more efficient. |
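The idea described above can be sketched in a few lines: score each sentence's relevance to the question, then keep the highest-scoring sentences until a length budget is met. The paper uses a trained context-aware sentence encoder for scoring; in this illustrative sketch, a toy bag-of-words cosine similarity stands in for that encoder, and all function names and the character-budget parameter are assumptions, not the authors' API.

```python
import math
from collections import Counter

def _embed(text):
    # Toy stand-in for the paper's context-aware sentence encoder (assumption):
    # a bag-of-words term-count vector.
    return Counter(text.lower().split())

def _cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def compress_prompt(sentences, question, budget_chars):
    """Keep the sentences most relevant to the question within a character
    budget, preserving their original order in the prompt."""
    q = _embed(question)
    # Rank sentences by relevance to the question, highest first.
    ranked = sorted(enumerate(sentences),
                    key=lambda p: _cosine(_embed(p[1]), q),
                    reverse=True)
    kept, used = [], 0
    for idx, sent in ranked:
        if used + len(sent) <= budget_chars:
            kept.append((idx, sent))
            used += len(sent)
    # Re-emit the kept sentences in their original order.
    return " ".join(s for _, s in sorted(kept))

sentences = [
    "Cats sleep a lot.",
    "The capital of France is Paris.",
    "Bananas are yellow.",
]
compressed = compress_prompt(sentences, "What is the capital of France?", 40)
print(compressed)
```

A real deployment would replace `_embed`/`_cosine` with the learned encoder's relevance scores, which is where CPC's contrastive training on question/positive/negative sentence data comes in.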
Keywords
» Artificial intelligence » Encoder » Inference » Prompt » Token