Summary of Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures, by Ruiyang Qin et al.
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures
by Ruiyang Qin, Zheyu Yan, Dewen Zeng, Zhenge Jia, Dancheng Liu, Jianbo Liu, Zhi Zheng, Ningyuan Cao, Kai Ni, Jinjun Xiong, Yiyu Shi
First submitted to arXiv on: 7 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract. |
| Medium | GrooveSquid.com (original content) | Retrieval-Augmented Generation (RAG) is a resource-efficient learning method for Large Language Models (LLMs) that improves output quality without updating model parameters. However, RAG-based LLMs must repeatedly search the accumulated user profile data in every user-LLM interaction, which incurs significant latency. This paper proposes Robust CiM-backed RAG (RoCR), a novel framework that uses Computing-in-Memory (CiM) architectures to accelerate the matrix multiplications at the core of retrieval and thereby reduce latency. RoCR combines a contrastive learning-based training method with noise-aware training so that profile data can be searched efficiently on CiM hardware despite its noise. To the best of the authors' knowledge, this is the first work to use CiM to accelerate RAG, addressing the open question of how to free RAG from its constraints on edge devices. |
| Low | GrooveSquid.com (original content) | Imagine a way for computers to learn and improve without needing a lot of energy or memory. This is called Retrieval-Augmented Generation (RAG). RAG helps computers understand human language better, but it can be slow and take up too much space. Scientists are trying to solve this problem by using something called Computing-in-Memory (CiM) to speed up RAG. CiM lets the computer do calculations inside its memory instead of sending data back and forth between different parts of the computer. This new approach, called RoCR, can make computers learn faster and more efficiently. |
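The retrieval step described in the medium summary reduces to a matrix-vector multiplication over stored profile embeddings, which is exactly the operation CiM hardware accelerates. Below is a minimal NumPy sketch of that idea (all names, sizes, and the noise model are illustrative assumptions, not details from the paper); a Gaussian perturbation of the stored embeddings stands in for CiM analog noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical profile store: 1000 entries embedded in 64 dimensions.
profile_embeddings = rng.standard_normal((1000, 64))
# Normalize rows so a dot product equals cosine similarity.
profile_embeddings /= np.linalg.norm(profile_embeddings, axis=1, keepdims=True)

def retrieve(query_embedding, store, k=3, noise_std=0.0):
    """Return indices of the k most similar profile entries.

    noise_std crudely simulates CiM non-idealities by perturbing the
    stored embeddings before the matrix-vector multiplication.
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    noise = noise_std * np.random.default_rng(1).standard_normal(store.shape)
    scores = (store + noise) @ q  # one matrix-vector multiply
    return np.argsort(scores)[::-1][:k]

query = rng.standard_normal(64)
top_clean = retrieve(query, profile_embeddings, k=3)
top_noisy = retrieve(query, profile_embeddings, k=3, noise_std=0.05)
```

As the summary notes, RoCR's noise-aware training aims to keep such retrieval results stable when the multiplication is carried out on noisy CiM hardware.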
Keywords
» Artificial intelligence » Large language model » RAG » Retrieval-augmented generation