Summary of "Post-Hoc Reversal: Are We Selecting Models Prematurely?" by Rishabh Ranjan et al.
Post-Hoc Reversal: Are We Selecting Models Prematurely?
by Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Lipton
First submitted to arXiv on 11 Apr 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary | 
|---|---|---|
| High | Paper authors | The paper's original abstract (available on arXiv). |
| Medium | GrooveSquid.com (original content) | The paper presents an extensive empirical study of post-hoc transforms, such as ensembling and stochastic weight averaging (SWA), which are applied to trained models to improve performance, robustness, and uncertainty estimation. The authors identify a phenomenon they call post-hoc reversal, in which performance trends are reversed once post-hoc transforms are applied; the effect is especially pronounced in high-noise settings. The study therefore argues that post-hoc transforms should be taken into account during model development decisions such as early stopping, checkpointing, and hyperparameter choices, rather than only after the base models are finalized. The authors propose a simple technique called post-hoc selection, which uses post-hoc metrics to inform these decisions. On an LLM instruction tuning dataset, post-hoc selection yields a more than 1.5x improvement in MMLU compared to naive selection. |
| Low | GrooveSquid.com (original content) | The paper shows that techniques applied after a model is trained can flip which model looks best: a model that seems worse on its own can turn out better once these techniques are applied. The authors call this “post-hoc reversal” and found that it happens often when the training data is very noisy (for example, when many labels are wrong). For instance, ensembling (combining the predictions of multiple models) and SWA (averaging the weights of checkpoints saved during training) both tend to favor base models trained for longer, even though the individual models start to overfit early. This means you may need to rethink how you choose models during training and base those choices on how well they perform after these techniques are applied. |
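The post-hoc selection idea from the summaries can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the post-hoc transform is an ensemble formed by averaging predicted class probabilities across training seeds, that one checkpoint is saved per epoch, and that the stopping epoch is chosen by validation log loss. The function names (`log_loss`, `select_epoch`), the array layout, and the toy numbers are all hypothetical. Naive selection picks the epoch where the individual base models look best; post-hoc selection picks the epoch where the ensemble looks best.

```python
import numpy as np


def log_loss(probs, labels):
    """Mean negative log-likelihood of the true labels.

    probs: (n_examples, n_classes) predicted class probabilities.
    labels: (n_examples,) integer class labels.
    """
    eps = 1e-12  # guards against log(0)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))


def select_epoch(val_probs, labels):
    """Compare naive and post-hoc selection of the stopping epoch.

    val_probs: (n_seeds, n_epochs, n_examples, n_classes) validation
    probabilities of each seed's checkpoint at each epoch (hypothetical layout).
    Returns (naive_epoch, post_hoc_epoch).
    """
    n_seeds, n_epochs = val_probs.shape[:2]
    # Naive selection: score each epoch by the average loss of the individual base models.
    base_loss = [
        np.mean([log_loss(val_probs[s, e], labels) for s in range(n_seeds)])
        for e in range(n_epochs)
    ]
    # Post-hoc selection: score each epoch by the loss of the ensemble,
    # assumed here to be the mean of predicted probabilities across seeds.
    ens_loss = [log_loss(val_probs[:, e].mean(axis=0), labels) for e in range(n_epochs)]
    return int(np.argmin(base_loss)), int(np.argmin(ens_loss))


if __name__ == "__main__":
    # Toy data made up for illustration: 2 seeds, 2 epochs, 2 examples, true class 0.
    # q[s, e, i] is seed s's probability of class 0 for example i at epoch e.
    q = np.array([
        [[0.7, 0.7], [0.999, 0.45]],  # seed 0
        [[0.7, 0.7], [0.45, 0.999]],  # seed 1
    ])
    val_probs = np.stack([q, 1 - q], axis=-1)
    labels = np.array([0, 0])
    print(select_epoch(val_probs, labels))  # → (0, 1)
```

In the toy data, each epoch-1 checkpoint is very confident on one example but leans toward the wrong class on the other, so its individual loss is worse than at epoch 0. Because the two seeds err on different examples, averaging them gives a lower loss than at epoch 0. Naive selection would therefore stop at epoch 0, while post-hoc selection keeps epoch 1, a small instance of post-hoc reversal.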
Keywords
- Artificial intelligence
- Early stopping
- Hyperparameter
- Instruction tuning