Summary of MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts, by Rachel S.Y. Teo et al.
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts, by Rachel S.Y. Teo, Tan M. Nguyen. First submitted…
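To make the idea named in the title concrete, the sketch below shows one way heavy-ball momentum can be threaded through a stack of sparse MoE layers: each layer's top-k expert output is treated as an update direction and accumulated into a momentum buffer that is passed on to the next layer. This is a minimal illustration under assumed conventions (a generic top-k router, two-layer expert MLPs, and the update v <- mu*v + SMoE(x), x <- x + step*v); the class and parameter names (TopKSMoE, MomentumSMoEBlock, mu, step) are illustrative and not taken from the paper.

```python
# Illustrative sketch only: a generic top-k SMoE block wrapped with a
# heavy-ball momentum buffer carried across layers. The exact update rule
# and all names here are assumptions for illustration.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKSMoE(nn.Module):
    """Standard sparse MoE block: route each token to its top-k experts."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        logits = self.router(x)                          # (B, S, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)       # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e               # tokens routed to expert e
                if mask.any():
                    w = weights[..., slot][mask].unsqueeze(-1)  # (N, 1) gate weights
                    out[mask] += w * expert(x[mask])
        return out


class MomentumSMoEBlock(nn.Module):
    """Wraps an SMoE block with a heavy-ball momentum buffer.

    Assumed update (one possible sign convention):
        v <- mu * v + SMoE(x)
        x <- x + step * v
    """

    def __init__(self, d_model: int, mu: float = 0.7, step: float = 1.0, **moe_kwargs):
        super().__init__()
        self.moe = TopKSMoE(d_model, **moe_kwargs)
        self.mu, self.step = mu, step

    def forward(self, x: torch.Tensor, v: Optional[torch.Tensor] = None):
        if v is None:
            v = torch.zeros_like(x)
        v = self.mu * v + self.moe(x)
        return x + self.step * v, v                      # momentum state goes to next layer


if __name__ == "__main__":
    blocks = nn.ModuleList(MomentumSMoEBlock(64) for _ in range(4))
    x, v = torch.randn(2, 10, 64), None
    for block in blocks:
        x, v = block(x, v)                               # carry momentum across layers
    print(x.shape)                                       # torch.Size([2, 10, 64])
```

Setting mu = 0 recovers the plain residual SMoE update, which is one way to sanity-check such a wrapper against a baseline SMoE stack.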
ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction, by Haoyu He, Haozheng Luo, Qi…
Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in…
Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts, by Fanqi Yan, Huy…
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router, by Yanyue Xie, Zhi…
MoH: Multi-Head Attention as Mixture-of-Head Attention, by Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan. First submitted…
AT-MoE: Adaptive Task-planning Mixture of Experts via LoRA Approach, by Xurui Li, Juanjuan Yao. First submitted to…
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts, by Xu Liu, Juncheng Liu,…
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models, by Jun Luo, Chen Chen,…
ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL, by Zhanqiu Guo,…