
Summary of CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts, by Zhenpeng Su et al.


CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts

by Zhenpeng Su, Xing Wu, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding

First submitted to arXiv on: 21 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Large language models (LLMs) have garnered significant attention due to their impressive performance on various tasks. Scaling up LLMs enhances their capabilities but also increases computational complexity. Mixture-of-Experts (MoE) models address this by allowing model size to grow without a substantial increase in training or inference cost. However, MoE models struggle with knowledge sharing among experts, which makes their performance sensitive to routing accuracy. To mitigate this, previous works introduced shared experts and combined the outputs of the top routed experts in an “addition” manner. This paper proposes CartesianMoE, which implements more effective knowledge sharing among experts in a “multiplication” manner inspired by collective matrix factorization. Experimental results demonstrate that CartesianMoE outperforms previous MoE models for building LLMs, achieving better perplexity and downstream task performance as well as improved expert routing robustness. (A toy sketch contrasting the “addition” and “multiplication” styles of sharing appears after the summaries below.)
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making big language models (LLMs) even better by sharing knowledge between different parts of the model. Right now, these models can get very good at certain tasks, but it’s hard to make them grow without getting too complicated and slow. To solve this problem, some researchers have used “Mixture-of-Experts” (MoE) models, which let you add new parts to the model without making everything else too complex. However, these MoE models still have a problem with sharing knowledge between their different parts. In this paper, we propose a new way of doing this called CartesianMoE, which makes it easier for the different parts of the model to share information and work together. We tested our approach and found that it does better than other methods at building LLMs and achieving good results in various tasks.
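The code below is a toy, purely illustrative sketch of the contrast described in the medium summary, not the authors' implementation: the class names, layer shapes, the single-choice routers, and the decision to compose two sub-experts by applying one after the other are all assumptions made here for demonstration. It contrasts a standard top-k MoE layer, which adds the outputs of its routed experts, with a hypothetical Cartesian-product layer in which each effective expert is a pair drawn from two smaller sets of sub-experts, so every sub-expert is reused across many pairs.

```python
# Toy contrast between "addition"-style top-k MoE routing and a hypothetical
# Cartesian-product routing. Illustrative only; NOT the CartesianMoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKAdditionMoE(nn.Module):
    """Standard MoE layer: route each token to its top-k experts and add their outputs."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        weights = F.softmax(self.router(x), dim=-1)    # routing probabilities
        top_w, top_idx = weights.topk(self.k, dim=-1)  # keep the k best experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for i, expert in enumerate(self.experts):
                chosen = (top_idx[:, slot] == i).unsqueeze(-1).float()
                out = out + chosen * top_w[:, slot:slot + 1] * expert(x)  # weighted sum ("addition")
        return out


class CartesianPairMoE(nn.Module):
    """Hypothetical Cartesian-product routing: pick one sub-expert from each of two
    small sets and compose them, so |A| + |B| sub-experts yield |A| x |B| effective
    experts and every sub-expert is shared across many pairs."""

    def __init__(self, d_model: int, n_a: int = 4, n_b: int = 4):
        super().__init__()
        self.set_a = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_a)])
        self.set_b = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_b)])
        self.router_a = nn.Linear(d_model, n_a)
        self.router_b = nn.Linear(d_model, n_b)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        idx_a = self.router_a(x).argmax(dim=-1)  # one choice per set (greedy, for simplicity)
        idx_b = self.router_b(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for i, sub_a in enumerate(self.set_a):
            for j, sub_b in enumerate(self.set_b):
                chosen = ((idx_a == i) & (idx_b == j)).unsqueeze(-1).float()
                out = out + chosen * sub_b(sub_a(x))  # composed pair acts as one expert
        return out


if __name__ == "__main__":
    x = torch.randn(3, 16)
    print(TopKAdditionMoE(16)(x).shape)   # torch.Size([3, 16])
    print(CartesianPairMoE(16)(x).shape)  # torch.Size([3, 16])
```

The appeal of the Cartesian-product arrangement in this sketch is parameter reuse: with |A| + |B| sub-experts it exposes |A| × |B| routed combinations, so whatever one sub-expert learns is automatically shared by every effective expert that contains it.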

Keywords

» Artificial intelligence  » Attention  » Inference  » Mixture of experts  » Perplexity