Summary of MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models, by Taehyun Kim et al.
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models, by Taehyun Kim, Kwanseok Choi, Youngmock Cho, …
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node, by Andreas Charalampopoulos, …
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts, by Mohammed Nowaz Rabbani Chowdhury, …
Wasserstein Distances, Neuronal Entanglement, and Sparsity, by Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir …
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training, by Xianzhi Du, Tom Gunter, Xiang Kong, …
Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast, by Chufan Shi, Cheng Yang, …
Graph Sparsification via Mixture of Graphs, by Guibin Zhang, Xiangguo Sun, Yanwei Yue, Chonghe Jiang, Kun …
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models, by Yongxin Guo, Zhenglin Cheng, …
Mixture of Experts Meets Prompt-Based Continual Learning, by Minh Le, An Nguyen, Huy Nguyen, Trang Nguyen, …
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts, by Huy Nguyen, Pedram Akbarian, Trang …