Summary of Anchor-based Large Language Models, by Jianhui Pang et al.


Anchor-based Large Language Models

by Jianhui Pang, Fanghua Ye, Derek Fai Wong, Xin He, Wanshun Chen, Longyue Wang

First submitted to arXiv on: 12 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large language models (LLMs) typically employ decoder-only transformer architectures, caching the keys/values of previously processed tokens to provide contextual information and avoid redundant computation. This cache, combined with the models' substantial size and parameter volume, demands massive GPU memory, and the demand grows with the length of the input text, making more efficient methods of information storage and processing necessary. The proposed Anchor-based LLMs (AnLLMs) use an anchor-based self-attention network (AnSAN) and an anchor-based inference strategy to compress sequence information into an anchor token, shrinking the keys/values cache and improving inference efficiency. Experimental results on question-answering benchmarks show that AnLLMs maintain similar accuracy levels while achieving up to 99% keys/values cache reduction and up to 3.5 times faster inference.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making language models more efficient by using a new way of storing information. Right now, these models need a lot of memory because they’re very big and have many parameters. When we ask them questions or give them longer texts to process, they need even more memory. To solve this problem, the researchers came up with a new method called Anchor-based LLMs (AnLLMs). This approach helps language models store less information and work faster without losing accuracy. The results show that AnLLMs can be up to 3.5 times faster than current models while still giving good answers.
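
As a rough illustration of the anchor-based inference idea described in the summaries above, here is a minimal sketch in plain PyTorch. It is hypothetical code, not the authors' implementation: the function names, the choice of the last token as the anchor, and the tensor shapes are all illustrative assumptions. It demonstrates the mechanism behind the reported cache reduction: once a sequence is processed, only the anchor token's keys/values are kept, so later tokens attend to a single cached entry instead of the whole prefix.

    import torch

    def compress_kv_cache(keys, values, anchor_index):
        # Keep only the anchor position's key/value and drop the rest of
        # the prefix. In the paper's scheme, AnSAN trains the model so the
        # anchor token carries the compressed sequence information.
        anchor_key = keys[anchor_index : anchor_index + 1]      # (1, head_dim)
        anchor_value = values[anchor_index : anchor_index + 1]  # (1, head_dim)
        return anchor_key, anchor_value

    def attend(query, cached_keys, cached_values):
        # Standard scaled dot-product attention over the (compressed) cache.
        scale = cached_keys.shape[-1] ** -0.5
        scores = (query @ cached_keys.T) * scale        # (1, cache_len)
        weights = torch.softmax(scores, dim=-1)
        return weights @ cached_values                  # (1, head_dim)

    # Toy usage for one attention head: a 512-token prefix collapses to a
    # single cached entry, which is where the keys/values savings come from.
    seq_len, head_dim = 512, 64
    keys, values = torch.randn(seq_len, head_dim), torch.randn(seq_len, head_dim)
    k_cache, v_cache = compress_kv_cache(keys, values, anchor_index=seq_len - 1)
    query = torch.randn(1, head_dim)
    context = attend(query, k_cache, v_cache)  # cache shrank from 512 entries to 1

Note that in a real AnLLM this compression only preserves accuracy because the model is trained with AnSAN to pack the prefix's information into the anchor token; with an ordinary pretrained model, discarding the other keys/values would simply lose context.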

Keywords

» Artificial intelligence  » Decoder  » Inference  » Question answering  » Self attention  » Token  » Transformer