

OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

by Yongqiang Yao, Jingru Tan, Jiahao Hu, Feizhao Zhang, Yazhe Niu, Xin Jin, Bo Li, Ruihao Gong, Pengfei Liu, Dahua Lin, Ningyi Xu

First submitted to arXiv on: 30 Jul 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, which can be read on arXiv.

Medium Difficulty Summary (GrooveSquid.com, original content)
Recently developed vision-language instruct-tuning models have achieved significant progress thanks to their comprehensive understanding of the world. However, large-scale 3D parallel training of these models produces an imbalanced computation load across devices, caused by the inherent heterogeneity between the vision and language parts, which hurts distributed training efficiency. To address this, the authors rebalance the computational load from three perspectives: data, model, and memory. They group instances into new, balanced mini-batches within and across devices, use a search-based method to partition the model evenly, and adaptively adjust the re-computation strategy for each partition to make full use of available memory. Extensive experiments validate the method: it achieves about a 1.8x speed-up over the open-source training code of InternVL-Chat, and its efficacy and generalizability are further demonstrated across various models and datasets.
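
To make the data-balancing idea concrete, here is a minimal sketch of one way to group samples into per-device mini-batches with similar total compute, assuming per-sample cost can be approximated by vision-plus-text token counts. The function name, the cost proxy, and the numbers are illustrative, not taken from the paper.

```python
# Illustrative sketch, not the paper's actual algorithm: greedily pack
# samples into per-device mini-batches so every device gets a similar
# total compute load, using token counts as a stand-in for cost.
from heapq import heappop, heappush

def balance_minibatches(samples, num_devices):
    """samples: list of (sample_id, vision_tokens, text_tokens) tuples."""
    # Sort heaviest-first; the cost proxy is total tokens per sample.
    costed = sorted(samples, key=lambda s: s[1] + s[2], reverse=True)
    # Min-heap of (accumulated_cost, device_index): the next heaviest
    # sample always goes to the currently lightest device.
    heap = [(0, d) for d in range(num_devices)]
    batches = [[] for _ in range(num_devices)]
    for sample_id, v_tokens, t_tokens in costed:
        load, device = heappop(heap)
        batches[device].append(sample_id)
        heappush(heap, (load + v_tokens + t_tokens, device))
    return batches

# Example: six samples with mixed vision/text token counts, two devices.
samples = [("a", 576, 128), ("b", 576, 900), ("c", 0, 2048),
           ("d", 1152, 64), ("e", 576, 512), ("f", 0, 300)]
print(balance_minibatches(samples, num_devices=2))
# -> [['c', 'e', 'f'], ['b', 'd', 'a']]  (loads 3436 vs. 3396)
```

This greedy scheme is a standard longest-processing-time heuristic: it keeps per-device loads close without solving an exact packing problem.
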
Low Difficulty Summary (GrooveSquid.com, original content)
Large vision-language instruct-tuning models have made big progress, but training them can be slow because some devices end up working harder than others. This happens because the model really has two different parts, one for pictures and one for words, and those parts need to work together even though their workloads are not equal. The researchers found that if they balanced how much work each device does, the model trains faster. They did this by grouping similar things together, using a search method to split the model fairly, and adjusting how often the model redoes some work so that all of the available memory gets used. This made training about 1.8 times faster than before! They also tested it with different models and datasets, and it worked well.
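
As a rough illustration of what "splitting the model fairly" could mean, the sketch below partitions a sequence of layers into contiguous pipeline stages so that the heaviest stage is as light as possible. This is a classic linear-partition search used here as a stand-in, not necessarily the paper's exact method, and the per-layer costs are invented.

```python
# Illustrative sketch, not the paper's exact search: split a sequence of
# layers into `num_stages` contiguous pipeline stages so that the heaviest
# stage is as light as possible, by binary-searching the feasible stage cost.

def min_max_stage_cost(layer_costs, num_stages):
    def stages_needed(limit):
        # Greedily fill stages up to `limit` and count how many result.
        count, current = 1, 0
        for c in layer_costs:
            if current + c > limit:
                count, current = count + 1, c
            else:
                current += c
        return count

    lo, hi = max(layer_costs), sum(layer_costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= num_stages:
            hi = mid  # feasible: try a tighter cap
        else:
            lo = mid + 1  # infeasible: raise the cap
    return lo

# Example: made-up per-layer costs (a heavy vision encoder followed by
# lighter language blocks), balanced across four pipeline stages.
costs = [8, 8, 8, 8, 3, 3, 3, 3, 3, 3, 3, 3]
print(min_max_stage_cost(costs, num_stages=4))  # -> 16
```
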

Keywords

* Artificial intelligence