
Summary of On-the-fly Modulation for Balanced Multimodal Learning, by Yake Wei et al.


On-the-fly Modulation for Balanced Multimodal Learning

by Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen

First submitted to arXiv on: 15 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes two strategies, On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM), to address the issue of imbalanced and under-optimized uni-modal representations in multimodal learning. The joint training strategy used in current models often favors the modality with more discriminative information, leaving the other modalities under-optimized. The proposed strategies monitor the discriminative discrepancy between modalities during training and adjust the optimization process accordingly: OPM weakens the influence of the dominant modality by dropping its features in the feed-forward stage, while OGM mitigates its gradient in the back-propagation stage. Experimental results show significant performance improvements across various multimodal tasks.
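
To make the two strategies more concrete, the sketch below shows one way such on-the-fly modulation could be wired into a two-modality classifier in PyTorch. It is an illustrative approximation, not the authors' implementation: the discrepancy ratio, the feature-drop schedule, and the gradient-scaling coefficient are all assumptions made here for demonstration.

```python
# Minimal sketch of OPM/OGM-style modulation (illustrative assumptions, not the paper's code).
import torch


def discrepancy_ratio(logits_a, logits_b, labels):
    """Ratio of the two modalities' mean softmax scores on the ground-truth class.

    Stands in for the paper's discriminative-discrepancy monitor: rho > 1 means
    modality A is currently the more discriminative (dominant) one.
    """
    score_a = torch.softmax(logits_a, dim=1).gather(1, labels.unsqueeze(1)).mean()
    score_b = torch.softmax(logits_b, dim=1).gather(1, labels.unsqueeze(1)).mean()
    return (score_a / (score_b + 1e-8)).item()


def opm_drop(features, rho, base_drop=0.3):
    """OPM-style step: in the forward pass, randomly zero the dominant modality's
    features with a probability that grows with the measured imbalance rho."""
    if rho > 1.0 and torch.rand(1).item() < min(base_drop * (rho - 1.0), 1.0):
        return torch.zeros_like(features)
    return features


def ogm_scale_grads(encoder, rho, alpha=0.1):
    """OGM-style step: after the backward pass, shrink the dominant encoder's
    gradients by a coefficient that decreases as the imbalance rho grows."""
    if rho > 1.0:
        coeff = 1.0 - torch.tanh(torch.tensor(alpha * (rho - 1.0))).item()
        for p in encoder.parameters():
            if p.grad is not None:
                p.grad.mul_(coeff)
```

In a training loop, the ratio would be recomputed on each batch, opm_drop applied to the dominant modality's features before fusion, and ogm_scale_grads called between loss.backward() and optimizer.step().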
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper solves a problem with how we train models that use information from multiple sources, like pictures and sounds. Right now, these models often prioritize one type of source over others, which can make them less accurate. The authors come up with two new ways to train these models: OPM and OGM. These methods help the model balance its training so that all types of sources are used equally well. This leads to better performance on a range of tasks.

Keywords

* Artificial intelligence
* Optimization