Summary of Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval, by Haochen Han et al.
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
by Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang
First submitted to arXiv on: 8 Mar 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper addresses the challenge of collecting well-matched multimedia datasets for training cross-modal retrieval models. Real-world data often contains Partially Mismatched Pairs (PMPs), which can significantly harm retrieval performance. Previous efforts have used soft correspondence to down-weight PMPs; this paper instead uses Optimal Transport (OT) to rematch the mismatched pairs. The proposed L2RM framework learns refined alignments by seeking a minimal-cost transport plan across different modalities. It has two components: a self-supervised cost function that is learned automatically from an explicit similarity-cost mapping relation, and a partial OT problem that restricts transport among false positives to further boost the refined alignments. Experimental results on three benchmarks demonstrate that L2RM improves the robustness of existing models against PMPs. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper helps us make computers better at connecting different types of information, like images and words. Right now, we have a big problem: when we collect data from the internet, it often has pieces that don’t match up, such as a picture paired with the wrong description. This can hurt how well the computer finds what we’re looking for. The authors of this paper came up with a new way to fix this problem by re-pairing the pieces that don’t match. They used something called Optimal Transport, which finds the cheapest overall way to pair up items from two groups. The results show that their method makes the computer better at finding what we want even when the data isn’t perfect. |
Keywords
» Artificial intelligence » Self-supervised