
Summary of G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems, by Youshao Xiao et al.


G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems

by Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, Zhaoxin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, Jun Zhou

First submitted to arXiv on: 9 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper presents G-Meta, a high-performance framework for large-scale training of optimization-based Meta DLRM models over a GPU cluster. The framework tackles efficiency issues in distributed training by combining data parallelism with model parallelism and by optimizing both computation and communication. It also introduces a Meta-IO pipeline to relieve I/O bottlenecks during data ingestion. Experimental results show that G-Meta delivers notable training speedups without compromising statistical performance. The framework has been deployed in Alipay’s core advertising and recommender system, cutting the continuous model delivery time by a factor of four and improving Conversion Rate (CVR) and Cost Per Mille (CPM) in homepage display advertising.
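
To make the "optimization-based meta learning" part concrete, the sketch below shows a generic MAML-style inner/outer training step in PyTorch. It is an illustration only, not the paper's G-Meta implementation: the toy model, task format, loss, and learning rates are all hypothetical, and in an actual data-parallel run the model would additionally be wrapped in DistributedDataParallel so gradients are averaged across GPUs during backward().

```python
# Illustrative sketch only: a MAML-style inner/outer update in PyTorch.
# This is NOT the paper's G-Meta code; the model, task batches, and
# hyperparameters are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRecModel(nn.Module):
    """Toy stand-in for a DLRM-style scoring model (hypothetical)."""

    def __init__(self, num_features=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)


def maml_outer_step(model, meta_opt, tasks, inner_lr=0.01):
    """One optimization-based meta-learning step over a batch of tasks.

    Each task supplies (support_x, support_y, query_x, query_y); the inner
    loop adapts a copy of the parameters on the support set, and the outer
    loss on the query set drives the meta-update."""
    meta_opt.zero_grad()
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: one gradient step on the support set, keeping the graph
        # so second-order gradients can flow back to the meta-parameters.
        params = dict(model.named_parameters())
        support_pred = torch.func.functional_call(model, params, (support_x,))
        inner_loss = F.binary_cross_entropy_with_logits(support_pred, support_y)
        grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
        adapted = {name: p - inner_lr * g for (name, p), g in zip(params.items(), grads)}
        # Outer loss: evaluate the adapted parameters on the query set.
        query_pred = torch.func.functional_call(model, adapted, (query_x,))
        meta_loss = meta_loss + F.binary_cross_entropy_with_logits(query_pred, query_y)
    # With DistributedDataParallel, this backward() is where per-GPU gradients
    # would be all-reduced before the meta-optimizer step.
    (meta_loss / len(tasks)).backward()
    meta_opt.step()
    return meta_loss.item() / len(tasks)


if __name__ == "__main__":
    # Synthetic tasks; in a real data-parallel run each rank would see its own tasks.
    model = TinyRecModel()
    meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    tasks = [(torch.randn(8, 16), torch.randint(0, 2, (8,)).float(),
              torch.randn(8, 16), torch.randint(0, 2, (8,)).float())
             for _ in range(4)]
    print("meta loss:", maml_outer_step(model, meta_opt, tasks))
```

G-Meta's contribution, as summarized above, is making this kind of nested-loop training efficient at cluster scale through hybrid data/model parallelism and the Meta-IO data pipeline.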

Low Difficulty Summary (original content by GrooveSquid.com)
G-Meta is a new way to train AI models that work better in cold-start scenarios. It’s like a supercharged engine for training on large-scale data. The team built a special framework called G-Meta that spreads the work across many GPUs to speed up training and make it more efficient. They also came up with a clever way to handle data ingestion, which cuts down the time it takes to prepare data. Tests show that G-Meta is really fast without sacrificing accuracy. It’s already being used in Alipay’s main advertising system, making it quicker to deliver new models and improving how well they work.
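
The data-ingestion point above comes down to overlapping sample preparation with GPU compute so the accelerators are not left idle. The snippet below is a generic prefetching pattern in PyTorch, shown only to illustrate that idea; it is not the paper's Meta-IO pipeline, and the dataset, batch size, and worker count are made-up placeholders.

```python
# Illustrative sketch only: hide data-preparation latency behind GPU compute
# with background-worker loading and asynchronous host-to-device copies.
# This is a generic prefetching pattern, not the paper's Meta-IO pipeline.
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loss_fn, opt, device):
    # Synthetic data standing in for real training samples.
    data = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)).float())
    # num_workers > 0: upcoming batches are prepared in background processes;
    # pin_memory=True: enables faster, asynchronous copies to the GPU.
    loader = DataLoader(data, batch_size=64, num_workers=2, pin_memory=True)
    for x, y in loader:
        x = x.to(device, non_blocking=True)  # copy can overlap earlier compute
        y = y.to(device, non_blocking=True)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Sequential(torch.nn.Linear(16, 1), torch.nn.Flatten(0)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    train_one_epoch(model, torch.nn.functional.binary_cross_entropy_with_logits, opt, device)
```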

Keywords

  • Artificial intelligence
  • Optimization