
Summary of Yuan 2.0-M32: Mixture of Experts with Attention Router, by Shaohua Wu et al.


Yuan 2.0-M32: Mixture of Experts with Attention Router

by Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

First submitted to arXiv on: 28 May 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here

Medium Difficulty Summary (GrooveSquid.com, original content)
The paper presents a new neural network model called Yuan 2.0-M32, which uses a mixture-of-experts architecture with 32 experts to improve accuracy and efficiency. The model employs an Attention Router network for efficient expert selection, so that its training computation is only 9.25% of that of a dense model at the same parameter scale. Yuan 2.0-M32 is trained from scratch on 2000B tokens and demonstrates competitive capabilities in coding, math, and various domains of expertise with only 3.7B active parameters out of a total of 40B. The model's forward computation per token is 7.4 GFLOPs, significantly lower than that of Llama3-70B. Yuan 2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, achieving accuracy rates of 55.89% and 95.8%, respectively. The models and source code are released on GitHub.

Low Difficulty Summary (GrooveSquid.com, original content)
Yuan 2.0-M32 is a new AI model that helps computers understand and do tasks better. It's like having many experts work together to solve problems. The model uses a special way of combining these experts, called an Attention Router, which makes it more efficient and accurate. Yuan 2.0-M32 was trained on a huge amount of data and can do things like math and coding quickly and accurately. It even beats other similar models on certain tasks! The researchers hope this work will help create better AI systems that can help us in many ways.
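The summaries above describe an Attention Router that selects experts with an attention-style computation rather than a single linear gate. Below is a minimal illustrative sketch of that idea in NumPy; the function names, weight shapes, and top-k gating details are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_router(x, Wq, Wk, Wv, top_k=2):
    """Hedged sketch of attention-style expert routing.

    Assumed shapes (illustrative, not from the paper):
      x:          (d,)            token hidden state
      Wq, Wk, Wv: (d, n_experts)  projections to per-expert coefficients

    Returns the indices of the top_k selected experts and their
    renormalized gate weights.
    """
    # Project the token into per-expert query/key/value coefficients.
    q, k, v = x @ Wq, x @ Wk, x @ Wv        # each (n_experts,)
    # Attention map over expert coefficients: each expert's score is
    # allowed to depend on the other experts' coefficients.
    attn = softmax(np.outer(q, k))          # (n_experts, n_experts)
    logits = attn @ v                       # (n_experts,)
    probs = softmax(logits)
    # Keep the top_k experts and renormalize their gate weights.
    top = np.argsort(probs)[-top_k:]
    gates = probs[top] / probs[top].sum()
    return top, gates
```

For contrast, a classical MoE router computes `probs = softmax(x @ W)` directly, scoring each expert independently; the attention step above couples the expert scores, which the paper argues leads to better expert selection.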

Keywords

» Artificial intelligence  » Attention  » Mixture of experts  » Neural network  » Token