Summary of Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures, by Ruiyang Qin et al.
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures
by Ruiyang Qin, Zheyu Yan, Dewen Zeng, Zhenge Jia, Dancheng Liu, Jianbo Liu, Zhi Zheng, Ningyuan Cao, Kai Ni, Jinjun Xiong, Yiyu Shi
First submitted to arXiv on: 7 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract. |
| Medium | GrooveSquid.com (original content) | Retrieval-Augmented Generation (RAG) is a resource-efficient learning method for Large Language Models (LLMs) that improves output quality without updating model parameters. However, RAG-based LLMs must repeatedly search the accumulated user profile data in every user-LLM interaction, which incurs significant latency. This paper proposes Robust CiM-backed RAG (RoCR), a novel framework that uses Computing-in-Memory (CiM) architectures to accelerate the matrix multiplications at the core of retrieval and thereby reduce latency. RoCR combines a contrastive learning-based training method with noise-aware training so that profile data can be searched efficiently on CiM hardware despite its noise. To the best of the authors' knowledge, this is the first work to use CiM to accelerate RAG, addressing the open question of how to free RAG from its constraints on edge devices. |
| Low | GrooveSquid.com (original content) | Imagine a way for computers to learn and improve without needing a lot of energy or memory. This is called Retrieval-Augmented Generation (RAG). RAG helps computers understand human language better, but it can be slow and take up too much space. Scientists are trying to solve this problem by using something called Computing-in-Memory (CiM) to speed up RAG. CiM lets the computer do calculations inside its memory instead of sending data back and forth between different parts of the computer. This new approach, called RoCR, can make computers learn faster and more efficiently. |
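The retrieval step described in the medium summary reduces to a matrix-vector multiplication over stored profile embeddings, which is exactly the operation CiM hardware accelerates. Below is a minimal NumPy sketch of that idea (all names, sizes, and the noise model are illustrative assumptions, not details from the paper); a Gaussian perturbation of the stored embeddings stands in for CiM analog noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical profile store: 1000 entries embedded in 64 dimensions.
profile_embeddings = rng.standard_normal((1000, 64))
# Normalize rows so a dot product equals cosine similarity.
profile_embeddings /= np.linalg.norm(profile_embeddings, axis=1, keepdims=True)

def retrieve(query_embedding, store, k=3, noise_std=0.0):
    """Return indices of the k most similar profile entries.

    noise_std crudely simulates CiM non-idealities by perturbing the
    stored embeddings before the matrix-vector multiplication.
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    noise = noise_std * np.random.default_rng(1).standard_normal(store.shape)
    scores = (store + noise) @ q  # one matrix-vector multiply
    return np.argsort(scores)[::-1][:k]

query = rng.standard_normal(64)
top_clean = retrieve(query, profile_embeddings, k=3)
top_noisy = retrieve(query, profile_embeddings, k=3, noise_std=0.05)
```

As the summary notes, RoCR's noise-aware training aims to keep such retrieval results stable when the multiplication is carried out on noisy CiM hardware.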
Keywords
» Artificial intelligence » Large language model » RAG » Retrieval-augmented generation