
Summary of Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation, by Weiqi Feng et al.


Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

by Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu

First submitted to arXiv on: 7 Aug 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
Multimodal large language models (MLLMs) have achieved strong performance in domains such as multimodal translation, visual question answering, and content generation. However, existing systems train MLLMs inefficiently: the heterogeneous modality models and complex data dependencies in 3D parallelism leave GPUs idle in "bubbles". The paper proposes Optimus, a distributed MLLM training system that schedules encoder computation inside the LLM's bubbles to reduce idle time and accelerate end-to-end training. Optimus searches for separate parallel plans for the encoder and the LLM, adopts a bubble scheduling algorithm, and decomposes encoder layer computation into a series of kernels to enable fine-grained, sub-millisecond bubble scheduling. Experiments show that Optimus accelerates MLLM training by 20.5%-21.3% with ViT-22B and GPT-175B models on 3072 GPUs compared to baselines.
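
To make the bubble-scheduling idea concrete, here is a minimal, hypothetical sketch in Python: it greedily packs encoder kernels (e.g., ViT attention and MLP kernels) into idle GPU bubbles taken from an LLM pipeline schedule. The Bubble and Kernel classes, the cost estimates, and the greedy packing heuristic are illustrative assumptions, not the paper's actual implementation.

    from dataclasses import dataclass

    @dataclass
    class Bubble:
        start_ms: float        # when the GPU bubble begins in the LLM pipeline schedule
        length_ms: float       # idle time available in this bubble
        used_ms: float = 0.0   # time already filled with encoder work

    @dataclass
    class Kernel:
        name: str
        cost_ms: float         # estimated runtime of one encoder kernel

    def schedule_kernels(bubbles: list[Bubble], kernels: list[Kernel]) -> dict[str, int]:
        """Greedily pack encoder kernels into LLM pipeline bubbles, in time order."""
        placement: dict[str, int] = {}
        bubble_idx = 0
        for kernel in kernels:                  # kernels are visited in dependency order
            while bubble_idx < len(bubbles):
                bubble = bubbles[bubble_idx]
                if bubble.used_ms + kernel.cost_ms <= bubble.length_ms:
                    bubble.used_ms += kernel.cost_ms
                    placement[kernel.name] = bubble_idx
                    break
                bubble_idx += 1                 # this bubble is full: try the next one
            else:
                break                           # no bubbles left; remaining kernels run normally
        return placement

    if __name__ == "__main__":
        # Toy example: two bubbles from the LLM pipeline schedule, three encoder kernels.
        bubbles = [Bubble(start_ms=0.0, length_ms=0.9), Bubble(start_ms=5.0, length_ms=2.0)]
        kernels = [Kernel("vit_block0_attn", 0.4), Kernel("vit_block0_mlp", 0.6),
                   Kernel("vit_block1_attn", 0.4)]
        print(schedule_kernels(bubbles, kernels))
        # {'vit_block0_attn': 0, 'vit_block0_mlp': 1, 'vit_block1_attn': 1}

A real system would also have to account for kernel launch overheads and the data dependencies between the encoder and LLM stages; the sketch only captures the packing step described above.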

Low Difficulty Summary (written by GrooveSquid.com; original content)
Large language models that can understand multiple types of data, like images and text, help with tasks like translating between languages or answering questions about pictures. But training these models efficiently is hard, because the GPUs doing the work often sit idle while waiting on each other. The paper presents Optimus, a new way to train these models across many computers. Optimus schedules the part of the model that processes images and other inputs to run during those idle moments, which reduces the time it takes to train the model. The authors tested Optimus and found that it can speed up training by about 20-21% compared to other methods.

Keywords

» Artificial intelligence  » Encoder  » GPT  » Question answering  » Translation  » ViT