Open Access Article

Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval

School of Information and Communication Engineering, Communication University of China, Beijing 100024, China
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(3), 466; https://doi.org/10.3390/electronics9030466
Received: 14 February 2020 / Revised: 2 March 2020 / Accepted: 6 March 2020 / Published: 10 March 2020
(This article belongs to the Special Issue Deep Neural Networks and Their Applications)
Multi-modal retrieval is challenging due to the heterogeneous gap and the complex semantic relationships between data of different modalities. Typical approaches map the different modalities into a common subspace using a one-to-one correspondence or a similarity/dissimilarity relationship between inter-modal data, so that distances between heterogeneous data can be compared directly; inter-modal retrieval is then achieved by nearest-neighbor search. However, most of these approaches ignore intra-modal relations and the complicated semantics among multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation for retrieval tasks between the image and text modalities. A two-branch deep model is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship as a multi-scale similarity in order to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations built on this multi-scale semantic similarity are incorporated to train the deep model end-to-end. Experiments validate the effectiveness of the proposed method on multi-modal retrieval tasks, and it outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets.
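As context for the multi-scale similarity idea in the abstract, here is a minimal sketch of how a graded (rather than binary) semantic similarity can be derived from multi-label annotations. This assumes a Jaccard-style overlap of label sets; the paper's exact formulation may differ, and the function name and inputs below are illustrative only.

```python
def multiscale_similarity(labels_a, labels_b):
    """Graded semantic similarity between two samples based on their
    multi-label annotations: the fraction of shared labels, in [0, 1],
    instead of a binary related/unrelated value."""
    a, b = set(labels_a), set(labels_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Binary similarity would call both pairs below simply "related" or
# "unrelated"; the graded score separates strong from weak relatedness.
print(multiscale_similarity({"sky", "beach", "sunset"},
                            {"sky", "beach", "people"}))  # 0.5
print(multiscale_similarity({"sky"}, {"car"}))            # 0.0
```

Such a graded score can then weight inter-modal and intra-modal distance constraints during training, so that strongly related pairs are pulled closer than weakly related ones.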
Keywords: deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image–text retrieval
Figure 1

MDPI and ACS Style

Hua, Y.; Yang, Y.; Du, J. Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval. Electronics 2020, 9, 466.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
