Summary of Understanding the Training Speedup from Sampling with Approximate Losses, by Rudrajit Das et al.


Understanding the Training Speedup from Sampling with Approximate Losses

by Rudrajit Das, Xi Chen, Bertram Ieong, Parikshit Bansal, Sujay Sanghavi

First submitted to arXiv on: 10 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)

     Abstract of paper      PDF of paper


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to read whichever version suits you best!

High Difficulty Summary (written by the paper authors)

Read the paper’s original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)

In this paper, the researchers propose an approach for reducing training time in deep learning by selecting training samples with large approximate losses. Their method, SIFT, uses an intermediate layer’s representations (an early exit) to estimate each sample’s loss and then greedily selects the samples with the largest estimates. For smooth convex losses, the authors show that SIFT converges to within a constant factor of the minimum average loss in fewer iterations than random selection, and they theoretically quantify how the level of approximation affects this guarantee. They also evaluate SIFT on training a large-scale BERT model and show significant gains in training hours and number of backpropagation steps, without any optimized implementation.
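
Since the summary above describes the selection procedure only in words, here is a minimal PyTorch sketch of the idea: a cheap early-exit head approximates per-sample losses, and only the samples with the largest approximate losses receive a full forward and backward pass. The names and choices here (EarlyExitMLP, sift_step, keep_frac) are illustrative assumptions, not the authors’ implementation.

```python
# A minimal sketch (not the authors' code) of greedy selection with an
# early-exit loss approximation, in the spirit of SIFT.
import torch
import torch.nn as nn

class EarlyExitMLP(nn.Module):
    """Toy network with a cheap auxiliary classifier at an intermediate layer."""
    def __init__(self, d_in=32, d_hid=64, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.exit_head = nn.Linear(d_hid, n_classes)   # early-exit head
        self.rest = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU())
        self.head = nn.Linear(d_hid, n_classes)        # full-model head

    def forward(self, x, early_exit=False):
        h = self.trunk(x)
        return self.exit_head(h) if early_exit else self.head(self.rest(h))

model = EarlyExitMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
per_sample_loss = nn.CrossEntropyLoss(reduction="none")

def sift_step(x, y, keep_frac=0.25):
    # 1) Cheap forward pass through the early exit approximates per-sample
    #    losses. (In practice the exit head must itself be trained/calibrated.)
    with torch.no_grad():
        approx_losses = per_sample_loss(model(x, early_exit=True), y)
    # 2) Greedily keep the samples with the largest approximate losses.
    k = max(1, int(keep_frac * len(x)))
    idx = approx_losses.topk(k).indices
    # 3) Run the full forward/backward pass only on the selected subset,
    #    so backpropagation cost scales with k rather than the batch size.
    opt.zero_grad()
    loss = per_sample_loss(model(x[idx]), y[idx]).mean()
    loss.backward()
    opt.step()
    return loss.item()

# One step on a random batch of 128 samples; only 32 are backpropagated.
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
print(sift_step(x, y))
```

The point of this design is that the early-exit forward pass is much cheaper than full backpropagation, so ranking samples by approximate loss trades a small amount of extra forward compute for a large reduction in backward compute.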

Low Difficulty Summary (original content by GrooveSquid.com)

SIFT is a new way to train deep learning models faster by choosing which samples matter most at each step. Instead of treating all the data equally, SIFT estimates how large each sample’s loss is and picks the ones the model is struggling with, since those are the most informative to learn from. This makes training much quicker, especially for large models like BERT, and the researchers show that SIFT can make a big difference in real-world scenarios.

Keywords

  • Artificial intelligence
  • Backpropagation
  • BERT
  • Deep learning