Authors
Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin
Publication date
2021
Conference
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Pages
5403-5413
Description
Recently, cross-modal retrieval has been advancing rapidly with the help of deep multimodal learning. However, collecting large-scale, well-annotated data is expensive and time-consuming even for unimodal data, not to mention the additional challenges posed by multiple modalities. Although crowd-sourced annotation, e.g., Amazon Mechanical Turk, can be used to reduce the labeling cost, non-expert annotators inevitably introduce noise into the labels. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels, which mitigates the impact of noisy samples and correlates distinct modalities simultaneously. Specifically, we propose a Robust Clustering loss (RC) that makes the deep networks focus on clean samples instead of noisy ones. Besides, a simple yet effective multimodal loss function, called the Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments on four widely used multimodal datasets demonstrate the effectiveness of the proposed approach in comparison with 14 state-of-the-art methods.
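The abstract does not give the exact formulation of the MC loss; as a rough illustration of the underlying idea (maximizing agreement between paired modalities via a contrastive objective), the sketch below implements a generic cross-modal InfoNCE-style loss in PyTorch. The function name, temperature value, and embedding shapes are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: a generic cross-modal InfoNCE-style contrastive loss.
# This is an illustration only, not the paper's exact MC loss.
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(image_emb: torch.Tensor,
                                 text_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """image_emb, text_emb: (batch, dim) embeddings of paired samples."""
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=1)
    text_emb = F.normalize(text_emb, dim=1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match image i to text i and text i to image i.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    img = torch.randn(8, 128)
    txt = torch.randn(8, 128)
    print(cross_modal_contrastive_loss(img, txt).item())
```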
Scholar articles
P Hu, X Peng, H Zhu, L Zhen, J Lin - Proceedings of the IEEE/CVF conference on computer …, 2021