Summary of Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models, by Keisuke Kamahori et al.
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
by Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci
First submitted to arXiv on: 10 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper proposes Fiddler, a resource-efficient inference system for Large Language Models (LLMs) built on Mixture-of-Experts (MoE) architectures, which have shown promising performance on various tasks. Fiddler uses CPU and GPU resources together, determining the optimal execution strategy for MoE models when GPU resources are limited. In the paper's evaluation, Fiddler outperforms state-of-the-art systems in all scenarios, achieving a 1.26x speedup in single-batch inference, 1.30x in long-prefill processing, and 11.57x in beam-search inference. The code for Fiddler is publicly available on GitHub. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Fiddler is a new way to make big language models work better with limited computer resources. These models are very good at understanding human language, but they take up a lot of memory and computing power. Fiddler helps by figuring out how to use the computer’s processor (CPU) and graphics card (GPU) together in the best way possible. This makes the models run faster and more efficiently, especially for tasks like having conversations or searching for information. |
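
The medium summary says Fiddler determines an execution strategy across the CPU and GPU. As a rough illustration only, the sketch below shows one way a per-expert placement decision could be made: run an expert on the GPU if its weights are already there, otherwise compare an estimated CPU latency with the cost of first copying the weights to the GPU. The `CostModel` fields, the function names, and all numbers are hypothetical and are not taken from the paper or its code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical per-expert latency estimates, in milliseconds."""
    cpu_ms_per_token: float  # CPU compute time for one token through one expert
    gpu_ms_per_batch: float  # GPU compute time, assumed flat for small batches
    transfer_ms: float       # time to copy one expert's weights from CPU to GPU


def choose_device(n_tokens: int, on_gpu: bool, cost: CostModel) -> str:
    """Pick where one expert should run for the current batch."""
    if on_gpu:
        return "gpu"  # weights already resident, so no transfer is needed
    cpu_latency = n_tokens * cost.cpu_ms_per_token
    gpu_latency = cost.transfer_ms + cost.gpu_ms_per_batch
    return "cpu" if cpu_latency <= gpu_latency else "gpu_after_transfer"


def plan_layer(tokens_per_expert: dict[int, int],
               gpu_resident: set[int],
               cost: CostModel) -> dict[int, str]:
    """Map each activated expert (token count > 0) to an execution choice."""
    return {
        expert: choose_device(n, expert in gpu_resident, cost)
        for expert, n in tokens_per_expert.items()
        if n > 0
    }


if __name__ == "__main__":
    # Assumed numbers: a slow CPU path, a fast GPU path, and a costly weight copy.
    cost = CostModel(cpu_ms_per_token=2.0, gpu_ms_per_batch=1.0, transfer_ms=50.0)
    # Expert 0 is on the GPU; experts 1 and 2 are not; expert 3 gets no tokens.
    print(plan_layer({0: 1, 1: 1, 2: 100, 3: 0}, gpu_resident={0}, cost=cost))
```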
Keywords
- Artificial intelligence
- Inference
- Mixture of experts