Summary of ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference, by Xin He et al.
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference by Xin He, Shunkang Zhang,…