Summary of Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts, by Xiaoming Shi et al.
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts, by Xiaoming Shi, Shiyu Wang, Yuqi…
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts, by Hugo Inzirillo, Remi Genet. First submitted to…
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists, by Dongyang Fan, Bettina Messmer,…
GRIN: GRadient-INformed MoE, by Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao…
Mixture of Diverse Size Experts, by Manxi Sun, Wei Liu, Jian Luan, Pengzhi Gao, Bin Wang. First…
LOLA – An Open-Source Massively Multilingual Large Language Model, by Nikit Srivastava, Denis Kuchelev, Tatiana Moteu…
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models, by Maryam Akhavan Aghdam, Hongpeng Jin, Yanzhao Wu. First…
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning, by Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel F…
VE: Modeling Multivariate Time Series Correlation with Variate Embedding, by Shangjiong Wang, Zhihong Man, Zhenwei Cao,…
Alt-MoE: A Scalable Framework for Bidirectional Multimodal Alignment and Efficient Knowledge Integration, by Hongyang Lei, Xiaolong Cheng,…