Summary of Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference, by Jinghan Yao et al.
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference by Jinghan Yao, Quentin Anthony, Aamir Shafi,…