CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

by Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei Du

First submitted to arXiv on: 29 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper proposes two novel data selection approaches for vision-language model pretraining, targeting the noisy web-curated datasets used to train CLIP models. The first method, surrogate-CLIPLoss (s-CLIPLoss), refines the classical CLIP score by normalizing it against a sample's contrastive pairs, yielding a better measure of pair quality. The second method, NormSim, is a norm-based metric that measures the similarity between pretraining data and data from a known downstream target distribution. Evaluated on the DataComp benchmark, the methods achieve a 5.3% improvement on ImageNet-1k and a 2.8% improvement averaged over 38 downstream tasks compared with the best baseline using OpenAI's CLIP-L/14 embeddings.
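
To make the two scoring rules concrete, below is a minimal NumPy sketch of how they could be computed from precomputed embeddings. This is an illustrative reading of the summary, not the authors' released code: the function names (s_cliploss, normsim), the batch size, the temperature tau, and the clipping of negative similarities for finite p are assumptions of the sketch.

    import numpy as np
    from scipy.special import logsumexp

    def s_cliploss(img_emb, txt_emb, num_batches=5, batch_size=1024, tau=0.01, seed=0):
        # Score each image-text pair by its negated symmetric CLIP (InfoNCE) loss,
        # averaged over several randomly sampled batches, so each pair is judged
        # against its contrastive pairs rather than in isolation.
        # img_emb, txt_emb: L2-normalized (N, d) embeddings from a pretrained
        # "surrogate" CLIP model. Higher score = better pair.
        rng = np.random.default_rng(seed)
        n = img_emb.shape[0]
        scores = np.zeros(n)
        for _ in range(num_batches):
            idx = rng.permutation(n)
            for start in range(0, n, batch_size):
                b = idx[start:start + batch_size]
                logits = img_emb[b] @ txt_emb[b].T / tau   # (B, B) pairwise similarities
                diag = np.diag(logits)
                i2t = diag - logsumexp(logits, axis=1)     # image -> text log-softmax
                t2i = diag - logsumexp(logits, axis=0)     # text -> image log-softmax
                scores[b] += 0.5 * (i2t + t2i)
        return scores / num_batches

    def normsim(img_emb, target_emb, p=np.inf):
        # NormSim: the p-norm of each training image's similarity vector against
        # a set of target-task image embeddings (all L2-normalized). With p=inf
        # this reduces to the maximum similarity to any single target example.
        sims = img_emb @ target_emb.T                      # (N, N_target)
        if np.isinf(p):
            return sims.max(axis=1)
        # clip negatives so fractional or odd powers stay well defined
        return (np.clip(sims, 0.0, None) ** p).sum(axis=1) ** (1.0 / p)

In practice one would rank the candidate pool by each score and keep the highest-ranked examples, for instance by intersecting the top fractions of the two rankings before pretraining.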
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper introduces new methods for selecting high-quality data for vision-language model pretraining, which is crucial for training large-scale models like CLIP. The authors propose two approaches to address noisy web-curated datasets: s-CLIPLoss and NormSim. These methods aim to improve data selection by considering the alignment between samples and their contrastive pairs, as well as the similarity between pretraining data and target data.

Keywords

» Artificial intelligence  » Alignment  » Language model  » Pretraining