Summary of MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization, by Jingming Guo et al.
MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization, by Jingming Guo, Yan Liu, Yu Meng, Zhiwei…