Summary of MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models, by Taehyun Kim et al.
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models, by Taehyun Kim, Kwanseok Choi, Youngmock Cho, …
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node, by Andreas Charalampopoulos, …
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts, by Mohammed Nowaz Rabbani Chowdhury, …
Wasserstein Distances, Neuronal Entanglement, and Sparsity, by Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir …
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training, by Xianzhi Du, Tom Gunter, Xiang Kong, …
Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast, by Chufan Shi, Cheng Yang, …
Graph Sparsification via Mixture of Graphs, by Guibin Zhang, Xiangguo Sun, Yanwei Yue, Chonghe Jiang, Kun …
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models, by Yongxin Guo, Zhenglin Cheng, …
Mixture of Experts Meets Prompt-Based Continual Learning, by Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen, …
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts, by Huy Nguyen, Pedram Akbarian, Trang …