Summary of Inference Performance Optimization for Large Language Models on CPUs, by Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, and Yi Xie
Inference Performance Optimization for Large Language Models on CPUs by Pujiang He, Shan Zhou, Wenhuan Huang,…