Summary of VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework, by Zhi Yao et al.
VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework
by Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia
First submitted to arXiv on: 19 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (available on arXiv) |
| Medium | GrooveSquid.com (original content) | Large Language Models (LLMs) are widely used across domains, but deploying them in cloud data centers often causes long response delays and high costs. This paper introduces the Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework to address these issues. VELO caches LLM request results in a vector database at the edge, cutting response times for subsequent similar requests. Unlike optimization methods that modify an LLM's internal structure, VELO applies to diverse LLMs without altering their architecture. Building on this framework, the paper formulates the caching decision as a Markov Decision Process (MDP) and develops a Multi-Agent Reinforcement Learning (MARL) algorithm to decide whether to forward a request to the LLM in the cloud or answer it from the vector database at the edge (a toy sketch of this decision follows the table). The MARL policy network is refined and expert demonstrations are integrated to improve feature extraction and training. Experiments show that VELO improves user satisfaction by reducing delay and resource consumption for edge users of LLMs. |
| Low | GrooveSquid.com (original content) | Large Language Models (LLMs) have become very popular, but running them in cloud data centers can be slow and expensive. This paper tackles these problems by using vector databases at the edge of the network. The idea is to store answers to past requests in a vector database so that when a similar request arrives, it can be answered quickly without going all the way back to the cloud. This approach does not change how the LLM works and can be used with different types of LLMs. The paper also creates an algorithm that decides whether to use the cloud or the edge database, based on the time and resources available. |
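To make the mechanism concrete, here is a minimal, hypothetical Python sketch of the cache-or-forward decision described in the medium summary: a request is embedded, the edge vector database is searched for the most similar cached result, and a threshold decides whether to return the cached answer or query the cloud LLM. Everything here (`EdgeCache`, `embed`, `SIMILARITY_THRESHOLD`) is an illustrative assumption, not the paper's API; VELO itself replaces the fixed threshold with a learned MARL policy.

```python
import numpy as np

# Illustrative cutoff; VELO learns this decision with a MARL policy
# rather than a fixed similarity threshold.
SIMILARITY_THRESHOLD = 0.9


def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hash tokens into a fixed-size unit vector.
    A real deployment would use a sentence-embedding model."""
    vec = np.zeros(64)
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class EdgeCache:
    """Toy vector database: stores (embedding, answer) pairs at the edge."""

    def __init__(self) -> None:
        self.embeddings: list[np.ndarray] = []
        self.answers: list[str] = []

    def add(self, query: str, answer: str) -> None:
        self.embeddings.append(embed(query))
        self.answers.append(answer)

    def nearest(self, query: str) -> tuple[float, str | None]:
        """Return (best cosine similarity, cached answer), or (0.0, None)."""
        if not self.embeddings:
            return 0.0, None
        q = embed(query)
        sims = [float(q @ e) for e in self.embeddings]
        best = int(np.argmax(sims))
        return sims[best], self.answers[best]


def handle_request(cache: EdgeCache, query: str, cloud_llm) -> str:
    """Answer from the edge cache if a similar request was seen before,
    otherwise forward to the cloud LLM and cache the result."""
    similarity, cached = cache.nearest(query)
    if similarity >= SIMILARITY_THRESHOLD and cached is not None:
        return cached              # fast path: no cloud round trip
    answer = cloud_llm(query)      # slow path: query the cloud LLM
    cache.add(query, answer)       # cache for future similar requests
    return answer


# Example usage with a stand-in "cloud LLM":
cache = EdgeCache()
fake_cloud = lambda q: f"answer to: {q}"
print(handle_request(cache, "what is edge caching?", fake_cloud))  # cloud path
print(handle_request(cache, "what is edge caching?", fake_cloud))  # cache hit
```

The design point the paper emphasizes is that this decision layer sits entirely outside the model: any LLM can be swapped in behind `cloud_llm` without changing its architecture.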
Keywords
» Artificial intelligence » Feature extraction » Large language model » Optimization » Reinforcement learning