Summary of Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models, by Yubin Shi et al.
Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models
by Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang
First submitted to arXiv on: 13 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The proposed work develops an efficient training strategy for over-parameterized deep-learning models, which are prevalent in deep-learning communities but costly to train properly. The researchers study the fine-grained learning dynamics at the modular level and introduce a novel concept, the modular neural tangent kernel (mNTK), to describe them. They find that the quality of a module's learning is closely tied to its mNTK's principal eigenvalue: a large eigenvalue indicates that the module learns features with good convergence, while a small one can harm generalization. Based on this discovery, they propose a novel training strategy called Modular Adaptive Training (MAT), which selectively updates only the modules whose mNTK principal eigenvalues exceed a dynamic threshold, concentrating the model on learning common features and ignoring inconsistent ones (see the code sketch below the table). MAT nearly halves the computational cost of training while outperforming baselines in accuracy.
Low | GrooveSquid.com (original content) | This study explores how deep-learning models can be trained more efficiently without sacrificing accuracy. The researchers discovered that different parts of these models learn at different rates, which matters for how well they generalize. They came up with a new way of training, called Modular Adaptive Training (MAT), that helps the model focus on learning what’s most important and ignore what’s not. This new method can cut training time nearly in half while still matching or beating other approaches.
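To make the selective-update idea concrete, here is a minimal PyTorch sketch of the mechanism described in the medium summary. It is illustrative only: the helper `mntk_max_eigenvalue`, the per-block module grouping, and the mean-based dynamic threshold are assumptions made for this sketch, not the paper's exact method.

```python
# Minimal sketch of MAT-style selective module updates (illustrative,
# not the paper's implementation). A module's mNTK is J @ J.T, where J
# is the Jacobian of the model output w.r.t. that module's parameters;
# modules whose principal eigenvalue falls below a dynamic threshold
# are skipped for this step.
import torch
import torch.nn as nn

def mntk_max_eigenvalue(model, module_params, inputs):
    """Approximate the principal eigenvalue of one module's mNTK by
    building J (n_samples x n_module_params) from per-sample gradients
    and taking the top eigenvalue of J @ J.T."""
    rows = []
    for x in inputs:
        out = model(x.unsqueeze(0)).sum()  # scalar output per sample
        grads = torch.autograd.grad(out, module_params)
        rows.append(torch.cat([g.flatten() for g in grads]))
    J = torch.stack(rows)
    return torch.linalg.eigvalsh(J @ J.T)[-1].item()  # largest eigenvalue

# Toy two-module model; MAT would treat each block as one module.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
modules = {"block0": model[0], "block2": model[2]}
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 8), torch.randn(32, 1)

# 1) Score each module by its mNTK principal eigenvalue.
scores = {
    name: mntk_max_eigenvalue(model, list(m.parameters()), x[:8])
    for name, m in modules.items()
}
# 2) Dynamic threshold: the mean score here (an assumption; the paper
#    uses its own threshold schedule).
threshold = sum(scores.values()) / len(scores)

# 3) Standard backward pass, then drop gradients of below-threshold
#    modules so only "well-converging" modules are updated this step.
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
for name, m in modules.items():
    if scores[name] < threshold:
        for p in m.parameters():
            p.grad = None  # optimizer skips params with no gradient
opt.step()
```

Masking gradients after a full backward pass keeps the sketch short; the paper's MAT instead avoids computing the skipped modules' gradients at all via its partial backward propagation mechanism, which is where the near-halving of training cost comes from.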
Keywords
» Artificial intelligence » Deep learning » Generalization