Summary of Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models, by Yubin Shi et al.
Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models
by Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang
First submitted to arXiv on: 13 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The proposed work develops an efficient training strategy for over-parameterized deep-learning models, which are prevalent in deep-learning communities but costly to train properly. The researchers study the fine-grained learning dynamics at the modular level and introduce a novel concept, the modular neural tangent kernel (mNTK), to describe them. They find that the quality of a module's learning is closely tied to its mNTK's principal eigenvalue: a large eigenvalue indicates that the module learns features with good convergence, while a small one can harm generalization. Based on this discovery, they propose a novel training strategy called Modular Adaptive Training (MAT), which selectively updates only the modules whose mNTK principal eigenvalues exceed a dynamic threshold, concentrating the model on learning common features and ignoring inconsistent ones (see the code sketch below the table). MAT nearly halves the computational cost of training while outperforming baselines in accuracy.
Low | GrooveSquid.com (original content) | This study explores how deep-learning models can be trained more efficiently without sacrificing accuracy. The researchers discovered that different parts of these models learn at different rates, which matters for how well they generalize. They came up with a new way of training, called Modular Adaptive Training (MAT), that helps the model focus on learning what’s most important and ignore what’s not. This new method can cut training time nearly in half while still matching or beating other approaches.
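To make the selective-update idea concrete, here is a minimal PyTorch sketch of the mechanism described in the medium summary. It is illustrative only: the helper `mntk_max_eigenvalue`, the per-block module grouping, and the mean-based dynamic threshold are assumptions made for this sketch, not the paper's exact method.

```python
# Minimal sketch of MAT-style selective module updates (illustrative,
# not the paper's implementation). A module's mNTK is J @ J.T, where J
# is the Jacobian of the model output w.r.t. that module's parameters;
# modules whose principal eigenvalue falls below a dynamic threshold
# are skipped for this step.
import torch
import torch.nn as nn

def mntk_max_eigenvalue(model, module_params, inputs):
    """Approximate the principal eigenvalue of one module's mNTK by
    building J (n_samples x n_module_params) from per-sample gradients
    and taking the top eigenvalue of J @ J.T."""
    rows = []
    for x in inputs:
        out = model(x.unsqueeze(0)).sum()  # scalar output per sample
        grads = torch.autograd.grad(out, module_params)
        rows.append(torch.cat([g.flatten() for g in grads]))
    J = torch.stack(rows)
    return torch.linalg.eigvalsh(J @ J.T)[-1].item()  # largest eigenvalue

# Toy two-module model; MAT would treat each block as one module.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
modules = {"block0": model[0], "block2": model[2]}
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 8), torch.randn(32, 1)

# 1) Score each module by its mNTK principal eigenvalue.
scores = {
    name: mntk_max_eigenvalue(model, list(m.parameters()), x[:8])
    for name, m in modules.items()
}
# 2) Dynamic threshold: the mean score here (an assumption; the paper
#    uses its own threshold schedule).
threshold = sum(scores.values()) / len(scores)

# 3) Standard backward pass, then drop gradients of below-threshold
#    modules so only "well-converging" modules are updated this step.
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
for name, m in modules.items():
    if scores[name] < threshold:
        for p in m.parameters():
            p.grad = None  # optimizer skips params with no gradient
opt.step()
```

Masking gradients after a full backward pass keeps the sketch short; the paper's MAT instead avoids computing the skipped modules' gradients at all via its partial backward propagation mechanism, which is where the near-halving of training cost comes from.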
Keywords
» Artificial intelligence » Deep learning » Generalization