Summary of GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory, by Haoze Wu et al.
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory by Haoze Wu, Zihan Qiu, Zili…
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language…
Graph Knowledge Distillation to Mixture of Experts by Pavel Rumiantsev, Mark Coates. First submitted to arxiv on:…
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion by Anke Tang, Li…
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters by Yixin Song, Haotong Xie, Zhengyan…
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach by Haoyu Han, Juanhui Li,…
Continual Traffic Forecasting via Mixture of Experts by Sanghyun Lee, Chanyoung Park. First submitted to arxiv on:…
Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques by Shwai He, Daize Dong,…
Parrot: Multilingual Visual Instruction Tuning by Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi,…
Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture…