Summary of Understanding the Training Speedup from Sampling with Approximate Losses, by Rudrajit Das et al.


Understanding the Training Speedup from Sampling with Approximate Losses

by Rudrajit Das, Xi Chen, Bertram Ieong, Parikshit Bansal, Sujay Sanghavi

First submitted to arXiv on: 10 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)

     Abstract of paper      PDF of paper


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to read whichever version suits you best!

High Difficulty Summary (written by the paper authors)

Read the paper’s original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)

In this paper, the researchers propose an approach for reducing training time in deep learning by selecting training samples with large approximate losses. Their method, SIFT, uses an intermediate layer’s representations (an early exit) to estimate each sample’s loss and then greedily selects the samples with the largest estimates. For smooth convex losses, the authors show that SIFT converges to within a constant factor of the minimum average loss in fewer iterations than random selection, and they theoretically quantify how the level of approximation affects this guarantee. They also evaluate SIFT on training a large-scale BERT model and show significant gains in training hours and number of backpropagation steps, without any optimized implementation.
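
Since the summary above describes the selection procedure only in words, here is a minimal PyTorch sketch of the idea: a cheap early-exit head approximates per-sample losses, and only the samples with the largest approximate losses receive a full forward and backward pass. The names and choices here (EarlyExitMLP, sift_step, keep_frac) are illustrative assumptions, not the authors’ implementation.

```python
# A minimal sketch (not the authors' code) of greedy selection with an
# early-exit loss approximation, in the spirit of SIFT.
import torch
import torch.nn as nn

class EarlyExitMLP(nn.Module):
    """Toy network with a cheap auxiliary classifier at an intermediate layer."""
    def __init__(self, d_in=32, d_hid=64, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.exit_head = nn.Linear(d_hid, n_classes)   # early-exit head
        self.rest = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU())
        self.head = nn.Linear(d_hid, n_classes)        # full-model head

    def forward(self, x, early_exit=False):
        h = self.trunk(x)
        return self.exit_head(h) if early_exit else self.head(self.rest(h))

model = EarlyExitMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
per_sample_loss = nn.CrossEntropyLoss(reduction="none")

def sift_step(x, y, keep_frac=0.25):
    # 1) Cheap forward pass through the early exit approximates per-sample
    #    losses. (In practice the exit head must itself be trained/calibrated.)
    with torch.no_grad():
        approx_losses = per_sample_loss(model(x, early_exit=True), y)
    # 2) Greedily keep the samples with the largest approximate losses.
    k = max(1, int(keep_frac * len(x)))
    idx = approx_losses.topk(k).indices
    # 3) Run the full forward/backward pass only on the selected subset,
    #    so backpropagation cost scales with k rather than the batch size.
    opt.zero_grad()
    loss = per_sample_loss(model(x[idx]), y[idx]).mean()
    loss.backward()
    opt.step()
    return loss.item()

# One step on a random batch of 128 samples; only 32 are backpropagated.
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
print(sift_step(x, y))
```

The point of this design is that the early-exit forward pass is much cheaper than full backpropagation, so ranking samples by approximate loss trades a small amount of extra forward compute for a large reduction in backward compute.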

Low Difficulty Summary (original content by GrooveSquid.com)

SIFT is a new way to train deep learning models faster by choosing which samples matter most at each step. Instead of treating all the data equally, SIFT estimates how large each sample’s loss is and picks the ones the model is struggling with, since those are the most informative to learn from. This makes training much quicker, especially for large models like BERT, and the researchers show that SIFT can make a big difference in real-world scenarios.

Keywords

  • Artificial intelligence
  • Backpropagation
  • BERT
  • Deep learning