


SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

by Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang

First submitted to arxiv on: 3 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (GrooveSquid.com original content)
This paper proposes SUBLLM, a novel architecture that addresses the efficiency challenges of training and inference in Large Language Models (LLMs). SUBLLM extends the decoder-only LLM with subsampling, upsampling, and bypass modules: subsampling shortens the token sequence, upsampling restores it to its original length, and the bypass module improves convergence. Compared to LLaMA, SUBLLM delivers significant gains in training and inference speed as well as memory usage while maintaining competitive few-shot performance. The architecture achieves a 26% speedup during training and cuts memory usage by 10GB per GPU; during inference, it boosts speed by up to 37% and reduces memory by 1GB per GPU. When the context window is expanded to 8192, the speedups grow to 34% in training and 52% in inference. The authors' code is available on GitHub, enabling researchers to explore and build upon the architecture.
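To make the subsample → process → upsample → bypass flow concrete, here is a minimal toy sketch in plain Python. It uses lists of numbers in place of real tensors and a stand-in function in place of transformer layers; the function names, stride, and 0.5 mixing weight are illustrative assumptions, not details from the paper's actual implementation.

```python
# Toy sketch of the subsample -> process -> upsample -> bypass idea.
# Plain Python lists stand in for token representations; `inner_fn`
# stands in for the transformer layers run on the shortened sequence.
# All names and constants here are illustrative assumptions.

def subsample(seq, stride=2):
    """Shorten the sequence by keeping every `stride`-th position."""
    return seq[::stride]

def upsample(seq, target_len):
    """Restore the original length by repeating each kept position."""
    out = []
    for x in seq:
        out.extend([x, x])
    return out[:target_len]

def bypass(original, processed, weight=0.5):
    """Mix the upsampled output with the unmodified input to aid convergence."""
    return [weight * p + (1 - weight) * o for o, p in zip(original, processed)]

def sub_block(seq, inner_fn):
    """Process a shortened copy of `seq`, then restore its length and mix."""
    short = subsample(seq)
    processed = [inner_fn(x) for x in short]  # cheaper: runs on fewer positions
    restored = upsample(processed, len(seq))
    return bypass(seq, restored)

tokens = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
out = sub_block(tokens, lambda x: 2.0 * x)  # toy "inner model": doubling
print(len(out))  # sequence length is preserved: 6
```

The efficiency gain comes from the middle step: the expensive computation runs on the shortened sequence, while upsampling and the bypass connection keep the output aligned with the original token positions.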
Low Difficulty Summary (GrooveSquid.com original content)
SUBLLM is a new way to make Large Language Models (LLMs) more efficient. The model has special parts called subsampling, upsampling, and bypass modules that work together to make it faster and use less memory. This matters because LLMs are very big and need a lot of resources to train and run. SUBLLM is better than models like LLaMA at being fast and using less memory, both during training and when running, while still doing well on tasks. You can find the code on GitHub so you can try it out.

Keywords

» Artificial intelligence  » Context window  » Decoder  » Few shot  » Inference  » Llama