
Summary of Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small, by Zhehui Wang et al.


Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

by Zhehui Wang, Tao Luo, Cheng Liu, Weichen Liu, Rick Siow Mong Goh, Weng-Fai Wong

First submitted to arXiv on: 21 Oct 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, available from the arXiv listing.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large language models (LLMs) have gained attention for their diverse applications, but their growing size brings a surge in the computational requirements for training and deployment. Memristor crossbars offer a promising solution, having demonstrated a small footprint and high energy efficiency in computer vision models. Memristors provide higher density than conventional memory technologies, making them well suited to the extreme model sizes associated with LLMs. Nevertheless, deploying LLMs on memristor crossbars faces challenges related to model size, multi-head attention blocks, and complex nonlinear operations. To address these, we propose a novel architecture for deploying state-of-the-art LLMs on memristor crossbars within a single chip or package. Testing on BERT_Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture reduces area overhead by up to 39X and energy consumption by up to 18X. Compared to modern TPU/GPU systems, it reduces the area-delay product by at least 68X and energy consumption by 69%. (A short illustrative sketch of crossbar matrix-vector multiplication follows the summaries below.)
Low Difficulty Summary (written by GrooveSquid.com, original content)
Large language models are getting a lot of attention because they can do many things, but they need a lot of computing power and space to train and run. Memristor crossbars might be the answer: they save energy and space, and they have already worked well for computer vision. Memristors store data very densely, which helps with huge models. However, putting large language models on memristor crossbars raises three big problems: the models are enormous, the attention blocks are complicated, and some operations are highly nonlinear. We found a new way to make it work with almost no loss in accuracy. It takes up far less space (up to 39 times less) and uses much less energy (up to 18 times less) than traditional crossbar designs, and it also beats modern TPU/GPU systems on both area-delay product and energy use.
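The summaries above take for granted that a memristor crossbar can compute a matrix-vector product directly in the analog domain, which is the property that makes crossbars attractive for storing and applying LLM weight matrices. The sketch below is a minimal, idealized NumPy model of that generic principle (inputs applied as voltages, conductance-weighted currents summed along output lines). The conductance range, the differential-pair weight mapping, and all function names are illustrative assumptions; this is not the specific architecture proposed in the paper.

```python
import numpy as np

# Minimal sketch (not the paper's architecture): an ideal memristor crossbar
# computing y = W @ x. Weights are stored as device conductances, inputs are
# applied as read voltages, and Kirchhoff's current law sums the per-device
# currents on each output line.

G_MIN, G_MAX = 1e-6, 1e-4  # assumed device conductance range in siemens (illustrative)

def weights_to_conductances(W):
    """Map signed weights onto a differential pair of conductance matrices."""
    scale = (G_MAX - G_MIN) / np.abs(W).max()
    G_pos = G_MIN + scale * np.clip(W, 0, None)   # encodes positive weights
    G_neg = G_MIN + scale * np.clip(-W, 0, None)  # encodes negative weights
    return G_pos, G_neg, scale

def crossbar_mvm(W, x, v_read=0.2):
    """Ideal analog matrix-vector multiply: voltages in, summed currents out."""
    G_pos, G_neg, scale = weights_to_conductances(W)
    v = v_read * x                # encode inputs as read voltages (idealized)
    i_pos = G_pos @ v             # summed currents on the positive array
    i_neg = G_neg @ v             # summed currents on the negative array
    return (i_pos - i_neg) / (scale * v_read)  # convert currents back to weight domain

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
x = rng.standard_normal(16)
print(np.allclose(crossbar_mvm(W, x), W @ x))  # True for this ideal, noise-free model
```

In real hardware the result is perturbed by device noise, limited conductance precision, and wire resistance; handling those effects, along with attention blocks and nonlinear operations, is what the paper's architecture addresses.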

Keywords

» Artificial intelligence  » Attention  » Multi-head attention