Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier

by Aristeidis Tsaris, Chengming Zhang, Xiao Wang, Junqi Yin, Siyan Liu, Moetasim Ashfaq, Ming Fan, Jong Youl Choi, Mohamed Wahib, Dan Lu, Prasanna Balaprakash, Feiyi Wang

First submitted to arXiv on: 17 Apr 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
Vision Transformers (ViTs) are crucial for foundational models in scientific imagery, including Earth science applications, because of their ability to process very long sequences. The authors develop a distributed sequence parallelism approach that handles sequences of up to 1M tokens, building on DeepSpeed-Ulysses and Long-Sequence-Segmentation with model sharding, and achieving 94% batch scaling efficiency on 2,048 AMD MI250X GPUs. Their evaluation of sequence parallelism in ViTs reveals substantial bottlenecks, which they address with hybrid sequence, pipeline, and tensor parallelism plus flash attention to scale beyond single-GPU memory limits. Notably, the method improves climate modeling accuracy by 20% on temperature predictions and marks the first training of a transformer model with a full attention matrix at a sequence length of over 188K.
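
To make the parallelism concrete, the sketch below shows the DeepSpeed-Ulysses-style all-to-all attention pattern the paper builds on, written as minimal PyTorch: each rank holds a slice of the patch-token sequence, trades it for a slice of the attention heads to compute full attention, then trades back. The tensor layout and helper names (all_to_all_4d, ulysses_attention) are assumptions for illustration, not the paper's actual code.

import torch
import torch.distributed as dist

def all_to_all_4d(x, scatter_dim, gather_dim, group=None):
    # Redistribute across the sequence-parallel group: split x along
    # scatter_dim, exchange chunks, and concatenate along gather_dim.
    world = dist.get_world_size(group)
    inputs = [c.contiguous() for c in x.chunk(world, dim=scatter_dim)]
    outputs = [torch.empty_like(inputs[0]) for _ in range(world)]
    dist.all_to_all(outputs, inputs, group=group)
    return torch.cat(outputs, dim=gather_dim)

def ulysses_attention(q, k, v, group=None):
    # q, k, v: (batch, local_seq, heads, head_dim), where local_seq is the
    # full patch sequence divided evenly over the sequence-parallel ranks.
    # 1) Gather the full sequence, scatter the heads: every rank now sees
    #    all tokens, but only heads / world_size attention heads.
    q, k, v = (all_to_all_4d(t, scatter_dim=2, gather_dim=1, group=group)
               for t in (q, k, v))
    # 2) Ordinary full attention over the complete sequence for the local
    #    head subset; a flash-attention kernel would slot in here.
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
    out = out.transpose(1, 2)
    # 3) Reverse all-to-all: scatter the sequence, gather the heads, so the
    #    layout returns to (batch, local_seq, heads, head_dim) for the MLP.
    return all_to_all_4d(out, scatter_dim=1, gather_dim=2, group=group)
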
Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper talks about a new way to improve computer models that help us understand and predict things like weather patterns. These models are called Vision Transformers (ViTs), and they're really good at processing long sequences of data. The problem is that as the data gets longer, it takes too much time and computing power to process it all on one computer. To fix this, the researchers developed a new method that lets many computers work together to process the data faster. This improves the accuracy of the predictions by 20%, which is important for things like climate modeling.
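
For a sense of the scale involved, here is a quick back-of-the-envelope patch count using the standard ViT recipe (16x16 pixel patches, one token per patch); the image sizes are illustrative, not taken from the paper:

# Standard ViT tokenization: an image is cut into fixed-size patches and
# each patch becomes one token, so tokens grow quadratically with size.
def num_tokens(image_size: int, patch_size: int = 16) -> int:
    return (image_size // patch_size) ** 2

print(num_tokens(224))    # 196 tokens: a typical ImageNet-sized image
print(num_tokens(6944))   # 188,356 tokens: about the paper's 188K regime
print(num_tokens(16384))  # 1,048,576 tokens: the 1M-token scale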

Keywords

» Artificial intelligence  » Attention  » Temperature  » Transformer