Summary of Can Optimization Trajectories Explain Multi-task Transfer?, by David Mueller et al.
Can Optimization Trajectories Explain Multi-Task Transfer?
by David Mueller, Mark Dredze, Nicholas Andrews
First submitted to arXiv on: 26 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on the arXiv listing |
Medium | GrooveSquid.com (original content) | This paper investigates how multi-task learning (MTL) affects the generalization of deep learning models. Despite MTL’s widespread adoption, little is understood about its effect on generalization. Prior work has proposed various optimization methods to improve MTL performance, but these have not consistently improved generalization. The authors empirically study how MTL affects task optimization and whether that impact can explain its effects on generalization. They find a generalization gap between single-task and multi-task models early in training, but factors previously proposed to explain single-task generalization gaps cannot explain the difference. Instead, they show that gradient conflict between tasks correlates with negative effects on task optimization but is not predictive of generalization. This work sheds light on the underlying causes of MTL failures and raises questions about the viability of general-purpose MTL optimization algorithms. |
Low | GrooveSquid.com (original content) | This paper looks at how training a deep learning model to do many things at once (multi-task training) affects its ability to perform well on new data. People have tried different ways to make this type of training work better, but it hasn’t always helped. The researchers wanted to understand why, and they found that when a model is trained on multiple tasks, its performance on each task is not as good as if it had been trained on that task alone. They also discovered that the reasons for this are different from what was previously thought. This study helps us understand why multi-task training can sometimes be less effective than single-task training and raises questions about how we should design these types of models. |
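To make the “gradient conflict” idea concrete: a common way to measure it in the MTL optimization literature (not necessarily this paper’s exact protocol) is the cosine similarity between the gradients two tasks induce on the shared parameters; a negative value means the tasks pull the parameters in opposing directions. A minimal sketch with made-up quadratic task losses:

```python
import numpy as np

def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Shared parameters w, and two illustrative task losses
#   L_a(w) = ||w - a||^2  and  L_b(w) = ||w - b||^2,
# whose gradients are 2(w - a) and 2(w - b).
w = np.array([0.0, 0.0])
a = np.array([1.0, 0.5])    # task A's optimum (made up for illustration)
b = np.array([-1.0, 0.5])   # task B's optimum (made up for illustration)

grad_a = 2 * (w - a)
grad_b = 2 * (w - b)

sim = cosine_similarity(grad_a, grad_b)
conflict = sim < 0  # negative cosine similarity => conflicting gradients
print(f"cosine similarity: {sim:.3f}, conflicting: {conflict}")
# → cosine similarity: -0.600, conflicting: True
```

The paper’s finding, per the summaries above, is that this kind of conflict tracks worse *training* (optimization) on each task, yet does not by itself predict which multi-task model will generalize better.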
Keywords
» Artificial intelligence » Deep learning » Generalization » Multi task » Optimization