Torque-Aware Momentum

by Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Gintare Karolina Dziugaite, Razvan Pascanu, Sarath Chandar

First submitted to arXiv on: 25 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high difficulty version is the paper's original abstract; read it on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

The proposed Torque-Aware Momentum (TAM) algorithm aims to improve the training of deep neural networks by exploring complex loss landscapes more efficiently. TAM addresses the oscillations caused by large, misaligned gradients in momentum-based optimizers such as classical momentum. It introduces a damping factor based on the angle between the new gradient and the previous momentum, which stabilizes the update direction during training. Experimental results show that TAM enhances exploration, handles distribution shifts more effectively, and improves generalization across various tasks, including image classification and large language model fine-tuning.
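To make the damping idea concrete, here is a minimal PyTorch-style sketch. It assumes the damping factor takes the form (1 + cos θ) / 2, where θ is the angle between the incoming gradient and the previous momentum, and that this factor scales the gradient before it is folded into the momentum buffer. The function name, hyperparameters, and exact formula are illustrative assumptions for this summary, not the paper's verified algorithm.

```python
import torch

def tam_style_step(param, grad, momentum_buf, lr=0.01, beta=0.9, eps=1e-8):
    """One hypothetical TAM-style update (illustrative sketch only).

    Damps the new gradient's contribution by its cosine alignment
    with the previous momentum before the usual momentum update.
    """
    if momentum_buf is None:
        momentum_buf = torch.zeros_like(grad)

    # Cosine of the angle between the new gradient and the momentum buffer.
    cos_theta = torch.dot(grad.flatten(), momentum_buf.flatten()) / (
        grad.norm() * momentum_buf.norm() + eps
    )

    # Assumed damping factor in [0, 1]: misaligned gradients are attenuated.
    damping = (1.0 + cos_theta) / 2.0

    # Standard momentum update, with the damped gradient folded in.
    momentum_buf = beta * momentum_buf + damping * grad
    param = param - lr * momentum_buf
    return param, momentum_buf
```

In this sketch, a gradient aligned with the momentum (cos θ ≈ 1) passes through almost unchanged, while one pointing against it (cos θ ≈ -1) is nearly zeroed out, which is the oscillation-suppressing behavior the summaries describe.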
Low Difficulty Summary (written by GrooveSquid.com, original content)

TAM is a new way to make deep learning networks train better. It helps the optimizer explore the network's "loss landscape" more efficiently. Right now, people use a kind of momentum to help the network learn, but sometimes this momentum gets stuck in an oscillation pattern, which slows learning down. TAM fixes this by adding a special damping factor that keeps the updates moving in a consistent direction. This helps the network generalize better and perform well even when it is tested on new data.

Keywords

» Artificial intelligence  » Deep learning  » Fine tuning  » Generalization  » Image classification  » Large language model