Summary of Torque-Aware Momentum, by Pranshu Malviya et al.
Torque-Aware Momentum
by Pranshu Malviya, Goncalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Gintare Karolina Dziugaite, Razvan Pascanu, Sarath Chandar
First submitted to arXiv on: 25 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | The paper's original abstract (read it on arXiv).
Medium | GrooveSquid.com (original content) | The proposed Torque-Aware Momentum (TAM) algorithm aims to improve the training of deep neural networks by exploring complex loss landscapes more efficiently. TAM addresses the oscillations caused by large, misaligned gradients in momentum-based optimizers such as classical momentum. It introduces a damping factor based on the angle between the new gradient and the previous momentum, stabilizing the update direction during training (a code sketch of this idea follows the table). Experimental results show that TAM enhances exploration, handles distribution shifts more effectively, and improves generalization across a range of tasks, including image classification and large language model fine-tuning.
Low | GrooveSquid.com (original content) | TAM is a new way to make deep learning networks work better. It helps the network explore its "loss landscape" more efficiently. Right now, people use a kind of momentum to help the network learn, but sometimes this momentum gets stuck in an oscillating pattern, which isn't good. TAM fixes this by adding a special damping factor that keeps the network moving in the right direction. This helps the network generalize better and perform well even when it's tested on new data.
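To make the medium-difficulty description concrete, here is a minimal sketch of how an angle-based damping factor could be wired into a classical momentum update in PyTorch. The exact damping formula, hyperparameters, and function names below are assumptions made for illustration, not the authors' reference implementation; see the paper for TAM's actual update rule.

```python
import torch

def tam_momentum_step(params, grads, momenta, lr=0.1, beta=0.9, eps=1e-8):
    """One momentum step with an angle-based damping factor.

    A minimal sketch of the idea described above, not the paper's method:
    the damping factor is assumed to come from the cosine similarity between
    the new gradient and the previous momentum, so gradients that point away
    from the current momentum direction contribute less to the update.
    """
    for p, g, m in zip(params, grads, momenta):
        # Cosine of the angle between the new gradient and the previous momentum.
        cos = torch.sum(g * m) / (g.norm() * m.norm() + eps)
        # Assumed damping factor in [0, 1]: 1 when aligned, 0 when opposed.
        damp = 0.5 * (1.0 + cos)
        # Damped momentum accumulation, then the parameter update.
        m.mul_(beta).add_(damp * g)
        p.sub_(lr * m)
```

In this sketch the damping factor only rescales how much each new gradient contributes to the momentum buffer: when the gradient roughly agrees with the accumulated momentum the step behaves like ordinary momentum, and when they conflict the contribution shrinks, which is the stabilizing effect the summary describes.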
Keywords
» Artificial intelligence » Deep learning » Fine tuning » Generalization » Image classification » Large language model