Photon: Federated LLM Pre-Training
by Lorenzo Sani, Alex Iacob, Zeyu Cao, Royson Lee, Bill Marino, Yan Gao, Dongqi Cai, Zexi Li, Wanru Zhao, Xinchi Qiu, Nicholas D. Lane
First submitted to arXiv on: 5 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Distributed, Parallel, and Cluster Computing (cs.DC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
| --- | --- | --- |
| High | Paper authors | The paper’s original abstract (read it on arXiv). |
| Medium | GrooveSquid.com (original content) | The paper introduces Photon, a complete system for federated end-to-end large language model (LLM) pre-training that leverages cross-silo federated learning (FL) for global-scale training with minimal communication overhead. The authors show that Photon can train models of up to 7B parameters in a federated fashion while achieving better perplexity than centralized pre-training. They also demonstrate that Photon’s training time decreases as more compute becomes available, matching the compute–time trade-off of centralized methods. Furthermore, Photon outperforms baseline distributed training methods by 35% while communicating significantly less data (an illustrative sketch of this style of federated training follows the table). |
| Low | GrooveSquid.com (original content) | Federated learning is a way for different places to train artificial intelligence models together without sharing their private data. A team of researchers created a new system called Photon that lets many computers in different locations work together to train large language models. This is the first time this has been achieved, and it is an important step toward more powerful AI models. The authors show that their system can train larger models than before while greatly reducing the amount of data that must be communicated during training. |
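The medium-difficulty summary describes cross-silo federated LLM training only at a high level. The sketch below shows what one round of that general style of training could look like, assuming a generic FedAvg-style weighted average of locally trained models; it is not Photon’s actual implementation, and names such as `fedavg_round`, `client.data_loader`, and `client.num_tokens` are hypothetical placeholders.

```python
import copy

import torch
import torch.nn.functional as F


def fedavg_round(global_model, clients, local_steps=100, lr=1e-4):
    """One federated round (generic FedAvg-style sketch, not Photon's code):
    every silo trains locally from the current global weights, then the
    server averages the resulting models, weighting each silo by how much
    data it trained on."""
    client_states, client_weights = [], []

    for client in clients:
        # Each silo starts from a copy of the current global weights.
        local_model = copy.deepcopy(global_model)
        optimizer = torch.optim.AdamW(local_model.parameters(), lr=lr)

        # Run many local steps before communicating anything back.
        for _, (inputs, targets) in zip(range(local_steps), client.data_loader):
            optimizer.zero_grad()
            logits = local_model(inputs)  # (batch, seq, vocab) for an LLM
            loss = F.cross_entropy(
                logits.view(-1, logits.size(-1)), targets.view(-1)
            )
            loss.backward()
            optimizer.step()

        client_states.append(local_model.state_dict())
        client_weights.append(client.num_tokens)  # hypothetical data-size attribute

    # Server-side aggregation: weighted average of the locally trained models.
    total = float(sum(client_weights))
    averaged = {
        key: sum(
            state[key] * (weight / total)
            for state, weight in zip(client_states, client_weights)
        )
        for key in client_states[0]
    }
    global_model.load_state_dict(averaged)
    return global_model
```

In this kind of scheme, each silo performs many local optimization steps between aggregations, so the per-round traffic is one model upload and download per silo rather than per-step gradient exchange, which is how such systems keep communication overhead low; the specific optimizer, aggregation schedule, and communication strategy used by Photon are described in the paper itself.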
Keywords
» Artificial intelligence » Federated learning » Large language model » Perplexity