Search Results (14)

Search Parameters:
Keywords = triplet margin loss

19 pages, 944 KiB  
Article
Patch-Font: Enhancing Few-Shot Font Generation with Patch-Based Attention and Multitask Encoding
by Irfanullah Memon, Muhammad Ammar Ul Hassan and Jaeyoung Choi
Appl. Sci. 2025, 15(3), 1654; https://doi.org/10.3390/app15031654 - 6 Feb 2025
Viewed by 1397
Abstract
Few-shot font generation seeks to create high-quality fonts using minimal reference style images, addressing traditional font design’s labor-intensive and time-consuming nature, particularly for languages with large character sets like Chinese and Korean. Existing methods often require multi-stage training or predefined components, which can be time-consuming and limit generalizability. This paper introduces Patch-Font, a novel single-stage method that overcomes the limitations of prior approaches, such as multi-stage training or reliance on predefined components, by integrating a patch-based attention mechanism and a multitask encoder. Patch-Font jointly captures global style elements (e.g., overall font family characteristics) and local style details (e.g., serifs, stroke shapes), ensuring high fidelity to the target style while maintaining computational efficiency. Our approach incorporates triplet margin loss with hard positive/negative mining to disentangle style from content and a style fidelity loss to enhance local style consistency. Experiments on Korean (printed and handwritten) and Chinese fonts demonstrate that Patch-Font outperforms state-of-the-art methods in style accuracy, perceptual quality, and generation speed while generalizing robustly to unseen characters and font styles. By simplifying the font creation process and delivering high-quality results, Patch-Font represents a significant step forward in making font design more accessible and scalable for diverse languages. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
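The triplet margin loss with hard positive/negative mining mentioned in the abstract can be illustrated with a minimal batch-hard sketch (plain Python, not the authors' implementation; the `margin` value is an assumption):

```python
import math

def batch_hard_triplet_loss(embs, labels, margin=1.0):
    """Triplet margin loss with batch-hard mining: for each anchor use its
    farthest positive and closest negative, then apply the margin hinge."""
    losses = []
    for i, (e, y) in enumerate(zip(embs, labels)):
        pos = [math.dist(e, embs[j]) for j in range(len(embs)) if j != i and labels[j] == y]
        neg = [math.dist(e, embs[j]) for j in range(len(embs)) if labels[j] != y]
        if not pos or not neg:
            continue  # an anchor needs both a positive and a negative
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

With well-separated classes the hinge is inactive (loss 0); with collapsed embeddings every anchor pays the full margin.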

18 pages, 1469 KiB  
Article
A Multi-Scale CNN for Transfer Learning in sEMG-Based Hand Gesture Recognition for Prosthetic Devices
by Riccardo Fratti, Niccolò Marini, Manfredo Atzori, Henning Müller, Cesare Tiengo and Franco Bassetto
Sensors 2024, 24(22), 7147; https://doi.org/10.3390/s24227147 - 7 Nov 2024
Cited by 7 | Viewed by 2677
Abstract
Advancements in neural network approaches have enhanced the effectiveness of surface Electromyography (sEMG)-based hand gesture recognition when measuring muscle activity. However, current deep learning architectures struggle to achieve good generalization and robustness, often demanding significant computational resources. The goal of this paper was to develop a robust model that can quickly adapt to new users using Transfer Learning. We propose a Multi-Scale Convolutional Neural Network (MSCNN), pre-trained with various strategies to improve inter-subject generalization. These strategies include domain adaptation with a gradient-reversal layer and self-supervision using triplet margin loss. We evaluated these approaches on several benchmark datasets, specifically the NinaPro databases. This study also compared two different Transfer Learning frameworks designed for user-dependent fine-tuning. The second Transfer Learning framework achieved a 97% F1 Score across 14 classes with an average of 1.40 epochs, suggesting potential for on-site model retraining in cases of performance degradation over time. The findings highlight the effectiveness of Transfer Learning in creating adaptive, user-specific models for sEMG-based prosthetic hands. Moreover, the study examined the impacts of rectification and window length, with a focus on real-time accessible normalizing techniques, suggesting significant improvements in usability and performance. Full article
(This article belongs to the Special Issue Wearable Sensors for Human Health Monitoring and Analysis)
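The domain adaptation with a gradient-reversal layer described here is, conceptually, an identity in the forward pass that flips (and scales) the gradient in the backward pass. A framework-free sketch of that contract (the `lam` coefficient is an assumption):

```python
class GradReverse:
    """Gradient-reversal layer: identity forward, gradient scaled by -lam
    backward, so the feature extractor is trained to *confuse* the domain
    classifier that follows this layer."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out  # reversed, scaled gradient
```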

16 pages, 2278 KiB  
Article
Deep Learning to Authenticate Traditional Handloom Textile
by Anindita Das, Aniruddha Deka, Kishore Medhi and Manob Jyoti Saikia
Information 2024, 15(8), 465; https://doi.org/10.3390/info15080465 - 4 Aug 2024
Cited by 1 | Viewed by 3212
Abstract
Handloom textile products play an essential role in both the financial and cultural landscape of natives, necessitating accurate and efficient methods for authenticating against replicated powerloom textiles for the protection of heritage and indigenous weavers’ economic viability. This paper presents a new approach to the automated identification of handloom textiles leveraging a deep metric learning technique. A labeled handloom textile dataset of 25,166 images was created by collecting handloom textile samples of six unique types, working with indigenous weavers in Assam, Northeast India. The proposed method achieved remarkable success by acquiring biased feature representations that facilitate the effective separation of different fiber types in a learned feature space. Through extensive experimentation and comparison with baseline models, our approach demonstrated superior efficiency in classifying handloom textiles with an accuracy of 97.8%. Our approach not only contributes to the preservation and promotion of traditional textile craftsmanship in the region but also highlights its significance. Full article
(This article belongs to the Special Issue Intelligent Image Processing by Deep Learning)

20 pages, 1758 KiB  
Article
A Novel Geo-Localization Method for UAV and Satellite Images Using Cross-View Consistent Attention
by Zhuofan Cui, Pengwei Zhou, Xiaolong Wang, Zilun Zhang, Yingxuan Li, Hongbo Li and Yu Zhang
Remote Sens. 2023, 15(19), 4667; https://doi.org/10.3390/rs15194667 - 23 Sep 2023
Cited by 12 | Viewed by 6304
Abstract
Geo-localization has been widely applied as an important technique to obtain the longitude and latitude for unmanned aerial vehicle (UAV) navigation in outdoor flight. Due to the possible interference and blocking of GPS signals, the method based on image retrieval, which is less likely to be interfered with, has received extensive attention in recent years. The geo-localization of UAVs and satellites can be achieved by querying pre-obtained, GPS-tagged satellite images with drone images taken from different perspectives. In this paper, an image transformation technique is used to extract cross-view geo-localization information from UAVs and satellites. A single-stage training method for UAV and satellite geo-localization is first proposed, which simultaneously realizes cross-view feature extraction and image retrieval, and achieves higher accuracy than existing multi-stage training techniques. A novel piecewise soft-margin triplet loss function is designed to avoid model parameters being trapped in suboptimal sets caused by the lack of constraint on positive and negative samples. The results illustrate that the proposed loss function enhances image retrieval accuracy and achieves better convergence. Moreover, a data augmentation method for satellite images is proposed to overcome the disproportionate numbers of image samples. On the benchmark University-1652, the proposed method achieves the state-of-the-art result with a 6.67% improvement in recall rate (R@1) and 6.13% in average precision (AP). All code will be released to promote reproducibility. Full article
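The standard soft-margin triplet loss that the proposed piecewise variant builds on replaces the hard hinge with a smooth softplus; a one-line sketch (the paper's exact piecewise form is not reproduced here):

```python
import math

def soft_margin_triplet(d_pos, d_neg):
    """Soft-margin triplet loss: log(1 + exp(d_pos - d_neg)).
    Smoothly penalizes anchors whose positive is farther than the negative."""
    return math.log1p(math.exp(d_pos - d_neg))
```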

16 pages, 12986 KiB  
Article
Label Smoothing Auxiliary Classifier Generative Adversarial Network with Triplet Loss for SAR Ship Classification
by Congan Xu, Long Gao, Hang Su, Jianting Zhang, Junfeng Wu and Wenjun Yan
Remote Sens. 2023, 15(16), 4058; https://doi.org/10.3390/rs15164058 - 16 Aug 2023
Cited by 2 | Viewed by 1961
Abstract
Deep-learning-based SAR ship classification has become a research hotspot in the military and civilian fields and achieved remarkable performance. However, the volume of available SAR ship classification data is relatively small, meaning that previous deep-learning-based methods have usually struggled with overfitting problems. Moreover, due to the limitation of the SAR imaging mechanism, the large intraclass diversity and small interclass similarity further degrade the classification performance. To address these issues, we propose a label smoothing auxiliary classifier generative adversarial network with triplet loss (LST-ACGAN) for SAR ship classification. In our method, an ACGAN is introduced to generate SAR ship samples with category labels. To address the model collapse problem in the ACGAN, the smooth category labels are assigned to generated samples. Moreover, triplet loss is integrated into the ACGAN for discriminative feature learning to enhance the margin of different classes. Extensive experiments on the OpenSARShip dataset demonstrate the superior performance of our method compared to the previous methods. Full article
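Assigning smooth category labels to generated samples, as described, amounts to softening the one-hot targets; a minimal sketch (the smoothing factor `eps` is an assumption):

```python
def smooth_labels(y, num_classes, eps=0.1):
    """One-hot targets softened: the true class gets 1 - eps and the
    remaining probability mass eps is shared among the other classes."""
    rows = []
    for cls in y:
        row = [eps / (num_classes - 1)] * num_classes
        row[cls] = 1.0 - eps
        rows.append(row)
    return rows
```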

17 pages, 1463 KiB  
Article
Margin-Based Modal Adaptive Learning for Visible-Infrared Person Re-Identification
by Qianqian Zhao, Hanxiao Wu and Jianqing Zhu
Sensors 2023, 23(3), 1426; https://doi.org/10.3390/s23031426 - 27 Jan 2023
Cited by 4 | Viewed by 2779
Abstract
Visible-infrared person re-identification (VIPR) has great potential for intelligent transportation systems for constructing smart cities, but it is challenging to utilize due to the huge modal discrepancy between visible and infrared images. Although visible and infrared data can appear to be two domains, VIPR is not identical to domain adaptation as it can massively eliminate modal discrepancies. Because VIPR has complete identity information on both visible and infrared modalities, once the domain adaption is overemphasized, the discriminative appearance information on the visible and infrared domains would drain. For that, we propose a novel margin-based modal adaptive learning (MMAL) method for VIPR in this paper. On each domain, we apply triplet and label smoothing cross-entropy functions to learn appearance-discriminative features. Between the two domains, we design a simple yet effective marginal maximum mean discrepancy (M3D) loss function to avoid an excessive suppression of modal discrepancies to protect the features’ discriminative ability on each domain. As a result, our MMAL method could learn modal-invariant yet appearance-discriminative features for improving VIPR. The experimental results show that our MMAL method acquires state-of-the-art VIPR performance, e.g., on the RegDB dataset in the visible-to-infrared retrieval mode, the rank-1 accuracy is 93.24% and the mean average precision is 83.77%. Full article
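The marginal maximum mean discrepancy idea, penalizing the cross-modal gap only beyond a margin, can be sketched as follows (a first-moment linear proxy; the paper's exact M3D formulation may use a kernel MMD, and `margin` is an assumption):

```python
import math

def marginal_mmd(feats_vis, feats_ir, margin=0.1):
    """Hinged mean-discrepancy: the gap between visible and infrared
    feature means is only penalized once it exceeds `margin`, so modal
    alignment cannot fully erase per-domain discriminative cues."""
    mean = lambda fs: [sum(col) / len(fs) for col in zip(*fs)]
    gap = math.dist(mean(feats_vis), mean(feats_ir))
    return max(0.0, gap - margin)
```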

17 pages, 3285 KiB  
Article
Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification
by Mengqi Niu, Liang He, Zhihua Fang, Baowei Zhao and Kai Wang
Appl. Sci. 2022, 12(15), 7463; https://doi.org/10.3390/app12157463 - 25 Jul 2022
Cited by 2 | Viewed by 2144
Abstract
Compared with text-independent speaker verification (TI-SV) systems, text-dependent speaker verification (TD-SV) counterparts often perform better thanks to their efficient utilization of speech content information. On this account, some TI-SV methods have tried to boost performance by incorporating an extra automatic speech recognition (ASR) component to explore content information, such as c-vector. However, the introduced ASR component requires a large amount of annotated data and consumes substantial computation resources. In this paper, we propose a pseudo-phoneme label (PPL) loss for the TI-SV task that integrates content cluster loss at the frame level and speaker recognition loss at the segment level in a unified network via multitask learning, without additional data requirements or exhaustive computation. Following HuBERT, we generate pseudo-phoneme labels to adjust the frame-level feature distribution by deep clustering, ensuring that each cluster corresponds to an implicit pronunciation unit in the feature space. We compare the proposed loss with the softmax loss, center loss, triplet loss, log-likelihood-ratio cost loss, additive margin softmax loss and additive angular margin loss on the VoxCeleb database. Experimental results demonstrate the effectiveness of our proposed method. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 5464 KiB  
Article
Dynamic Re-Weighting and Cross-Camera Learning for Unsupervised Person Re-Identification
by Qingze Yin, Guan’an Wang, Jinlin Wu, Haonan Luo and Zhenmin Tang
Mathematics 2022, 10(10), 1654; https://doi.org/10.3390/math10101654 - 12 May 2022
Cited by 5 | Viewed by 2000
Abstract
Person Re-Identification (ReID) has witnessed tremendous improvements with the help of deep convolutional neural networks (CNNs). Nevertheless, because different domains have their own characteristics, most existing methods generalize poorly to unseen people. To address this problem, based on the relationship between time and camera position, we propose a robust and effective training strategy named temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC). It transfers valuable knowledge from existing labeled source domains to unlabeled target domains. Particularly, to improve the discernibility of CNN models in the source domain, generally shared person attributes and margin-based softmax loss are adapted to train the source model. In the target domain training stage, TSDRC iteratively clusters the samples into several centers and dynamically re-weights unlabeled samples from each center with a temporal smoothing score. Then, a cross-camera triplet loss is proposed to fine-tune the source domain model. Comprehensive experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method vastly improves the performance of unsupervised domain adaptability. Full article
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)

21 pages, 1489 KiB  
Article
A Class-Incremental Learning Method Based on Preserving the Learned Feature Space for EEG-Based Emotion Recognition
by Magdiel Jiménez-Guarneros and Roberto Alejo-Eleuterio
Mathematics 2022, 10(4), 598; https://doi.org/10.3390/math10040598 - 15 Feb 2022
Cited by 9 | Viewed by 3622
Abstract
Deep learning-based models have shown to be one of the main active research topics in emotion recognition systems from Electroencephalogram (EEG) signals. However, a significant challenge is to effectively recognize new emotions that are incorporated sequentially, as current models must perform retraining from scratch. In this paper, we propose a Class-Incremental Learning (CIL) method, named Incremental Learning preserving the Learned Feature Space (IL2FS), in order to enable deep learning models to incorporate new emotions (classes) into the already known. IL2FS performs a weight aligning to correct the bias on new classes, while it incorporates margin ranking loss and triplet loss to preserve the inter-class separation and feature space alignment on known classes. We evaluated IL2FS over two public datasets (DREAMER and DEAP) for emotion recognition and compared it with other recent and popular CIL methods reported in computer vision. Experimental results show that IL2FS outperforms other CIL methods by obtaining an average accuracy of 59.08 ± 08.26% and 79.36 ± 04.68% on DREAMER and DEAP, recognizing data from new emotions that are incorporated sequentially. Full article
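The margin ranking loss that IL2FS uses to preserve inter-class separation has this standard form (sketch; the margin value is an assumption):

```python
def margin_ranking_loss(x1, x2, y, margin=1.0):
    """Standard margin ranking loss: with y = +1 it wants x1 to exceed
    x2 by at least `margin`; with y = -1, the other way around."""
    return max(0.0, -y * (x1 - x2) + margin)
```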

29 pages, 18411 KiB  
Article
Deep Descriptor Learning with Auxiliary Classification Loss for Retrieving Images of Silk Fabrics in the Context of Preserving European Silk Heritage
by Mareike Dorozynski and Franz Rottensteiner
ISPRS Int. J. Geo-Inf. 2022, 11(2), 82; https://doi.org/10.3390/ijgi11020082 - 21 Jan 2022
Cited by 4 | Viewed by 3601
Abstract
With the growing number of digitally available collections consisting of images depicting relevant objects from the past in relation with descriptive annotations, the need for suitable information retrieval techniques is becoming increasingly important to support historians in their work. In this context, we address the problem of image retrieval for searching records in a database of silk fabrics. The descriptors, used as an index to the database, are learned by a convolutional neural network, exploiting the available annotations to automatically generate training data. Descriptor learning is combined with auxiliary classification loss with the aim of supporting the clustering in the descriptor space with respect to the properties of the depicted silk objects, such as the place or time of origin. We evaluate our approach on a dataset of fabric images in a kNN-classification, showing promising results with respect to the ability of the descriptors to represent semantic properties of silk fabrics; integrating the auxiliary loss improves the overall accuracy by 2.7% and the average F1 score by 5.6%. It can be observed that the largest improvements can be obtained for variables with imbalanced class distributions. An evaluation on the WikiArt dataset demonstrates the transferability of our approach to other digital collections. Full article
(This article belongs to the Special Issue Machine Learning and Deep Learning in Cultural Heritage)

16 pages, 5582 KiB  
Article
Comparing Class-Aware and Pairwise Loss Functions for Deep Metric Learning in Wildlife Re-Identification
by Nkosikhona Dlamini and Terence L. van Zyl
Sensors 2021, 21(18), 6109; https://doi.org/10.3390/s21186109 - 12 Sep 2021
Cited by 4 | Viewed by 3570
Abstract
Similarity learning using deep convolutional neural networks has been applied extensively in solving computer vision problems. This attraction is supported by its success in one-shot and zero-shot classification applications. The advances in similarity learning are essential for smaller datasets or datasets in which few class labels exist per class such as wildlife re-identification. Improving the performance of similarity learning models comes with developing new sampling techniques and designing loss functions better suited to training similarity in neural networks. However, the impact of these advances is tested on larger datasets, with limited attention given to smaller imbalanced datasets such as those found in unique wildlife re-identification. To this end, we test the advances in loss functions for similarity learning on several animal re-identification tasks. We add two new public datasets, Nyala and Lions, to the challenge of animal re-identification. Our results are state of the art on all public datasets tested except Pandas. The achieved Top-1 Recall is 94.8% on the Zebra dataset, 72.3% on the Nyala dataset, 79.7% on the Chimps dataset and, on the Tiger dataset, it is 88.9%. For the Lion dataset, we set a new benchmark at 94.8%. We find that the best performing loss function across all datasets is generally the triplet loss; however, there is only a marginal improvement compared to the performance achieved by Proxy-NCA models. We demonstrate that no single neural network architecture combined with a loss function is best suited for all datasets, although VGG-11 may be the most robust first choice. Our results highlight the need for broader experimentation and exploration of loss functions and neural network architecture for the more challenging task, over classical benchmarks, of wildlife re-identification. Full article
(This article belongs to the Special Issue Sensors and Artificial Intelligence for Wildlife Conservation)
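Proxy-NCA, the closest competitor to the triplet loss in these results, scores each sample against learned class proxies rather than sampled pairs; a minimal sketch (squared Euclidean distances; the proxy values in the test are illustrative):

```python
import math

def proxy_nca_loss(emb, proxies, label):
    """Proxy-NCA: attract the embedding to its own class proxy and repel
    it from all other proxies via a softmax over negative squared distances."""
    d = [math.dist(emb, p) ** 2 for p in proxies]
    attract = math.exp(-d[label])
    repel = sum(math.exp(-dj) for j, dj in enumerate(d) if j != label)
    return -math.log(attract / repel)
```

With the positive term excluded from the denominator, the value can go negative as the embedding approaches its proxy.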

17 pages, 26803 KiB  
Article
MFCosface: A Masked-Face Recognition Algorithm Based on Large Margin Cosine Loss
by Hongxia Deng, Zijian Feng, Guanyu Qian, Xindong Lv, Haifang Li and Gang Li
Appl. Sci. 2021, 11(16), 7310; https://doi.org/10.3390/app11167310 - 9 Aug 2021
Cited by 50 | Viewed by 8237
Abstract
The world today is being hit by COVID-19. As opposed to fingerprints and ID cards, facial recognition technology can effectively prevent the spread of viruses in public places because it does not require contact with specific sensors. However, people also need to wear masks when entering public places, and masks greatly affect the accuracy of facial recognition, making accurate recognition of masked faces a great challenge. In order to solve the problem of low facial recognition accuracy with mask wearers during the COVID-19 epidemic, we propose a masked-face recognition algorithm based on large margin cosine loss (MFCosface). Due to insufficient masked-face data for training, we designed a masked-face image generation algorithm based on the detection of key facial features. The face is detected and aligned through a multi-task cascaded convolutional network; then we detect the key features of the face and select the mask template for coverage according to the positional information of the key features. Finally, we generate the corresponding masked-face image. Through analysis of the masked-face images, we found that triplet loss is not applicable to our datasets, because the results of online triplet selection contain fewer mask changes, making it difficult for the model to learn the relationship between mask occlusion and feature mapping. We use a large margin cosine loss as the loss function for training, which maps all the feature samples into a feature space with a smaller intra-class distance and a larger inter-class distance. In order to make the model pay more attention to the area not covered by the mask, we designed an Att-inception module that combines the Inception-Resnet module and the convolutional block attention module, which increases the weight of any unoccluded area in the feature map, thereby enlarging its contribution to the identification process. Experiments on several masked-face datasets prove that our algorithm greatly improves the accuracy of masked-face recognition and can accurately perform facial recognition with masked subjects. Full article
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)
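The large margin cosine loss at the heart of MFCosface modifies softmax logits as follows (CosFace-style sketch; `s` and `m` follow common defaults, not necessarily the paper's values):

```python
import math

def lmcl_logits(emb, weights, label, s=30.0, m=0.35):
    """Large margin cosine loss logits: every class score is a cosine
    similarity, the true class is penalized by margin m, and everything
    is scaled by s before the usual softmax cross-entropy."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    e = [x / norm(emb) for x in emb]
    logits = []
    for k, w in enumerate(weights):
        cos = sum(a * b / norm(w) for a, b in zip(e, w))
        logits.append(s * (cos - m) if k == label else s * cos)
    return logits
```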

29 pages, 5090 KiB  
Article
Deep Learning of Appearance Affinity for Multi-Object Tracking and Re-Identification: A Comparative View
by María J. Gómez-Silva, Arturo de la Escalera and José M. Armingol
Electronics 2020, 9(11), 1757; https://doi.org/10.3390/electronics9111757 - 22 Oct 2020
Cited by 7 | Viewed by 2727
Abstract
Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between people observations with a deep neural model. Nevertheless, the differences in their specifications and, consequently, in the characteristics and constraints of the available training data for each one of these tasks, arise from the necessity of employing different learning approaches to attain each one of them. This article offers a comparative view of the Double-Margin-Contrastive and the Triplet loss function, and analyzes the benefits and drawbacks of applying each one of them to learn an Appearance Affinity model for Tracking and Re-Identification. A batch of experiments have been conducted, and their results support the hypothesis concluded from the presented study: Triplet loss function is more effective than the Contrastive one when an Re-Id model is learnt, and, conversely, in the MOT domain, the Contrastive loss can better discriminate between pairs of images rendering the same person or not. Full article
(This article belongs to the Section Computer Science & Engineering)
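The Double-Margin-Contrastive loss compared here gives similar and dissimilar pairs their own margins; a hedged sketch (the margin values and the squared hinge are assumptions about the exact form):

```python
def double_margin_contrastive(d, same, m_pos=0.5, m_neg=1.5):
    """Double-margin contrastive loss on a pair distance d: similar pairs
    are only penalized beyond m_pos, dissimilar pairs only within m_neg."""
    if same:
        return max(0.0, d - m_pos) ** 2
    return max(0.0, m_neg - d) ** 2
```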

18 pages, 4585 KiB  
Article
High-Rankness Regularized Semi-Supervised Deep Metric Learning for Remote Sensing Imagery
by Jian Kang, Rubén Fernández-Beltrán, Zhen Ye, Xiaohua Tong, Pedram Ghamisi and Antonio Plaza
Remote Sens. 2020, 12(16), 2603; https://doi.org/10.3390/rs12162603 - 12 Aug 2020
Cited by 13 | Viewed by 6555
Abstract
Deep metric learning has recently received special attention in the field of remote sensing (RS) scene characterization, owing to its prominent capabilities for modeling distances among RS images based on their semantic information. Most of the existing deep metric learning methods exploit pairwise and triplet losses to learn the feature embeddings with the preservation of semantic-similarity, which requires the construction of image pairs and triplets based on the supervised information (e.g., class labels). However, generating such semantic annotations becomes a completely unaffordable task in large-scale RS archives, which may eventually constrain the availability of sufficient training data for this kind of models. To address this issue, we reformulate the deep metric learning scheme in a semi-supervised manner to effectively characterize RS scenes. Specifically, we aim at learning metric spaces by utilizing the supervised information from a small number of labeled RS images and exploring the potential decision boundaries for massive sets of unlabeled aerial scenes. In order to reach this goal, a joint loss function, composed of a normalized softmax loss with margin and a high-rankness regularization term, is proposed, as well as its corresponding optimization algorithm. The conducted experiments (including different state-of-the-art methods and two benchmark RS archives) validate the effectiveness of the proposed approach for RS image classification, clustering and retrieval tasks. The codes of this paper are publicly available. Full article
(This article belongs to the Special Issue Deep Learning for Remote Sensing Data)
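One common way to encode the "high-rankness" idea, rewarding diverse features from an unlabeled batch, is a negative nuclear norm; a NumPy sketch (a generic proxy, not necessarily the paper's exact regularization term):

```python
import numpy as np

def high_rankness_reg(batch_features):
    """Rewards a high-rank (diverse) unlabeled-batch feature matrix by
    returning the negative nuclear norm (sum of singular values)."""
    sv = np.linalg.svd(np.asarray(batch_features, dtype=float), compute_uv=False)
    return -float(sv.sum())
```

Lower values mean higher rank, so minimizing this term alongside the margin softmax loss pushes the batch toward diverse embeddings.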
