Search Results (26)

Search Parameters:
Keywords = pseudo-ground truth labeling

24 pages, 3570 KiB  
Article
Semi-Supervised Underwater Image Enhancement Method Using Multimodal Features and Dynamic Quality Repository
by Mu Ding, Gen Li, Yu Hu, Hangfei Liu, Qingsong Hu and Xiaohua Huang
J. Mar. Sci. Eng. 2025, 13(6), 1195; https://doi.org/10.3390/jmse13061195 - 19 Jun 2025
Viewed by 378
Abstract
Obtaining clear underwater images is crucial for smart aquaculture, so degraded underwater images must be restored. Although underwater image restoration has achieved remarkable results in recent years, the scarcity of labeled data poses a significant challenge to further progress. Semi-supervised learning, however, can make use of unlabeled data. In this study, we propose a semi-supervised underwater image enhancement method, MCR-UIE, which uses multimodal contrastive learning and a dynamic quality reliability repository to leverage unlabeled data during training. The approach first applies multimodal feature contrast regularization to prevent overfitting to incorrect labels, and then introduces a dynamic quality reliability repository that updates the model's outputs to serve as pseudo ground truth, improving the robustness and generalization of pseudo-label generation and unlabeled-data learning. Extensive experiments on the UIEB and LSUI datasets show that the proposed method consistently outperforms existing traditional and deep learning-based approaches in both quantitative and qualitative evaluations. Furthermore, its successful application to images captured in deep-sea cage aquaculture environments validates its practical value. These results indicate that MCR-UIE holds strong potential for real-world deployment in intelligent monitoring and visual perception tasks in complex underwater scenarios.
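The dynamic quality reliability repository described above can be pictured as a best-so-far cache keyed by sample and scored by a no-reference quality metric. The sketch below is illustrative only: `iqa_score` is a hypothetical stand-in for an NR-IQA measure, and the real MCR-UIE update rule may differ.

```python
def iqa_score(image):
    # Hypothetical stand-in for a no-reference quality metric (higher is better).
    return sum(image) / len(image)

class QualityRepository:
    """Keeps, per unlabeled sample, the best-scoring prediction seen so far."""
    def __init__(self):
        self.best = {}  # sample_id -> (score, prediction)

    def update(self, sample_id, prediction):
        score = iqa_score(prediction)
        if sample_id not in self.best or score > self.best[sample_id][0]:
            self.best[sample_id] = (score, prediction)

    def pseudo_gt(self, sample_id):
        # The stored best prediction is used as pseudo ground truth.
        return self.best[sample_id][1]

repo = QualityRepository()
repo.update("img0", [0.2, 0.4])   # lower-quality early prediction
repo.update("img0", [0.8, 0.6])   # higher score replaces it
```

Each training round can then supervise the student on `repo.pseudo_gt(...)` instead of a fixed label.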

34 pages, 65802 KiB  
Article
Using Citizen Science Data as Pre-Training for Semantic Segmentation of High-Resolution UAV Images for Natural Forests Post-Disturbance Assessment
by Kamyar Nasiri, William Guimont-Martin, Damien LaRocque, Gabriel Jeanson, Hugo Bellemare-Vallières, Vincent Grondin, Philippe Bournival, Julie Lessard, Guillaume Drolet, Jean-Daniel Sylvain and Philippe Giguère
Forests 2025, 16(4), 616; https://doi.org/10.3390/f16040616 - 31 Mar 2025
Viewed by 712
Abstract
The ability to monitor forest areas after disturbances is key to ensuring their regrowth. Problematic situations that are detected can then be addressed with targeted regeneration efforts. However, achieving this with automated photo interpretation is difficult, as training such systems requires large amounts of labeled data. To this end, we leverage citizen science data (iNaturalist) to alleviate this issue. More precisely, we generate pre-training data from a classifier trained on selected exemplars. This is accomplished by applying a moving-window approach to the WilDReF-Q (Wild Drone Regrowth Forest—Quebec) dataset, carefully gathered low-altitude images acquired with an Unmanned Aerial Vehicle (UAV), to generate high-quality pseudo-labels. To produce accurate pseudo-labels, the predictions of our classifier for each window are integrated using a majority voting approach. Our results indicate that pre-training a semantic segmentation network on over 140,000 auto-labeled images yields an F1 score of 43.74% over 24 different classes on a separate ground truth dataset. In comparison, using only labeled images yields a score of 32.45%, while fine-tuning the pre-trained network yields only marginal further improvements (46.76%). Importantly, we demonstrate that our approach benefits from more unlabeled images, opening the door to learning at scale. We also optimized the hyperparameters for pseudo-labeling, including the number of predictions assigned to each pixel in the majority voting process. Overall, this demonstrates that an auto-labeling approach can greatly reduce the development cost of plant identification in regeneration regions based on UAV imagery.
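The per-pixel majority voting step can be illustrated as follows; the window layout and class names are invented for the example, not taken from the WilDReF-Q dataset.

```python
from collections import Counter

def majority_vote(votes_for_pixel):
    """Given the class votes one pixel received from overlapping windows,
    return the winning class."""
    return Counter(votes_for_pixel).most_common(1)[0][0]

# Each moving-window pass contributes one predicted class per covered pixel.
votes = {
    (0, 0): ["spruce", "spruce", "shrub"],
    (0, 1): ["shrub", "shrub", "spruce"],
}
pseudo_label = {px: majority_vote(v) for px, v in votes.items()}
```

The number of votes per pixel is exactly the hyperparameter the abstract mentions tuning.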

19 pages, 3546 KiB  
Article
Proxy-Based Semi-Supervised Cross-Modal Hashing
by Hao Chen, Zhuoyang Zou and Xinghui Zhu
Appl. Sci. 2025, 15(5), 2390; https://doi.org/10.3390/app15052390 - 23 Feb 2025
Cited by 1 | Viewed by 519
Abstract
Because label information is difficult to obtain in practical applications, semi-supervised cross-modal retrieval has emerged. However, existing semi-supervised cross-modal hashing retrieval methods mainly focus on exploring the structural relationships between data and generating high-quality discrete pseudo-labels, while neglecting the relationships between data and categories, as well as the structural relationships between data and categories inherent in continuous pseudo-labels. To address this, we propose Proxy-based Semi-Supervised Cross-Modal Hashing (PSSCH). Specifically, we propose a category proxy network to generate category center points in both the feature and hash spaces. Additionally, we design an Adaptive Dual-Label Loss function, which applies different learning strategies to discrete ground truth labels and continuous pseudo-labels and adaptively increases the training weight of unlabeled data as epochs progress. Experiments on the MIRFLICKR-25K, NUS-WIDE, and MS COCO datasets show that PSSCH achieves mAP improvements of up to 3%, 1%, and 4%, respectively, over the latest baseline methods.
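The epoch-dependent weighting idea in the Adaptive Dual-Label Loss, giving unlabeled data more influence as training progresses, might look like the following sketch; the ramp shape and hyperparameters are assumptions, not the PSSCH formula.

```python
import math

def unlabeled_weight(epoch, total_epochs, w_max=1.0):
    """Smoothly ramp the pseudo-label loss weight from near 0 up to w_max."""
    t = min(epoch / total_epochs, 1.0)
    return w_max * math.exp(-5.0 * (1.0 - t) ** 2)  # sigmoid-style ramp-up

def dual_label_loss(sup_loss, unsup_loss, epoch, total_epochs):
    # Ground-truth term at full weight; pseudo-label term ramped in over time.
    return sup_loss + unlabeled_weight(epoch, total_epochs) * unsup_loss
```

Early on, unreliable pseudo-labels contribute almost nothing; by the final epochs they are weighted fully.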

24 pages, 2067 KiB  
Article
A Self-Supervised Feature Point Detection Method for ISAR Images of Space Targets
by Shengteng Jiang, Xiaoyuan Ren, Canyu Wang, Libing Jiang and Zhuang Wang
Remote Sens. 2025, 17(3), 441; https://doi.org/10.3390/rs17030441 - 28 Jan 2025
Viewed by 585
Abstract
Feature point detection in inverse synthetic aperture radar (ISAR) images of space targets is the foundation for tasks such as analyzing space target motion intent and predicting on-orbit status. Traditional feature point detection methods perform poorly when confronted with the low texture and uneven brightness of ISAR images. Owing to their nonlinear mapping capability, neural networks can effectively learn features from ISAR images of space targets, providing new ideas for feature point detection. However, the scarcity of labeled ISAR image data for space targets presents a challenge for research. To address this issue, this paper introduces a self-supervised feature point detection method (SFPD), which can accurately detect the positions of feature points in ISAR images of space targets without true feature point positions during training. First, this paper simulates an ISAR primitive dataset and uses it to train the proposed basic feature point detection model. Subsequently, the basic feature point detection model and affine transformations are used to label pseudo-ground truth for ISAR images of space targets. Finally, the labeled ISAR image dataset is used to train SFPD. Consequently, SFPD can be trained without requiring ground truth for the ISAR image dataset. Experiments demonstrate that SFPD outperforms common algorithms in both feature point detection and feature point matching.
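The affine pseudo-labeling step can be sketched as detect-in-warped-view, map-back, aggregate. In the toy example below, the transforms are arbitrary and the "detection" in each warped view stands in for a real detector's output.

```python
import numpy as np

def warp_points(points, A, t):
    # Apply affine map x -> A x + t to row-vector points.
    return points @ A.T + t

def unwarp_points(points, A, t):
    # Invert the affine map to return points to the original frame.
    return (points - t) @ np.linalg.inv(A).T

# One ground feature point seen through two random affine views.
base_point = np.array([[10.0, 20.0]])
transforms = [(np.array([[1.1, 0.0], [0.0, 0.9]]), np.array([2.0, -1.0])),
              (np.array([[0.95, 0.05], [-0.05, 1.05]]), np.array([-3.0, 4.0]))]

recovered = []
for A, t in transforms:
    detected = warp_points(base_point, A, t)       # detector fires here in the warped view
    recovered.append(unwarp_points(detected, A, t))
pseudo_label = np.mean(recovered, axis=0)          # aggregate in the original frame
```

Averaging detections mapped back from many warped views yields stable pseudo-ground-truth positions.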

21 pages, 2555 KiB  
Article
FldtMatch: Improving Unbalanced Data Classification via Deep Semi-Supervised Learning with Self-Adaptive Dynamic Threshold
by Xin Wu, Jingjing Xu, Kuan Li, Jianping Yin and Jian Xiong
Mathematics 2025, 13(3), 392; https://doi.org/10.3390/math13030392 - 24 Jan 2025
Viewed by 928
Abstract
Among the many methods of deep semi-supervised learning (DSSL), the holistic method combines ideas from other methods, such as consistency regularization and pseudo-labeling, with great success. This method typically introduces a threshold to utilize unlabeled data: if the highest predicted value for an unlabeled sample exceeds the threshold, the associated class is designated as the sample's pseudo-label. However, current methods use fixed or dynamic thresholds that disregard the varying learning difficulties across categories in unbalanced datasets. To overcome these issues, we first design Cumulative Effective Labeling (CEL) to reflect the learning difficulty of a particular class. CEL differs from previous measures in that it uses effective pseudo-labels and ground truth together, which collectively influence the model's capacity to acquire category knowledge. Based on CEL, we then propose a simple but effective way to compute the threshold, the Self-adaptive Dynamic Threshold (SDT). It requires a single hyperparameter to adjust to various scenarios, eliminating the need for a case-specific threshold modification strategy. SDT uses a mapping function that addresses the differential learning difficulty of categories in unbalanced image datasets, which adversely affects conventional dynamic thresholding. Finally, we propose a deep semi-supervised method with SDT called FldtMatch. Through theoretical analysis and extensive experiments, we show that FldtMatch overcomes the negative impact of unbalanced data. Regardless of the choice of backbone network, our method achieves the best results on multiple datasets, with maximum improvements in macro F1-score of about 5.6% on DFUC2021 and 2.2% on ISIC2018.
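A minimal reading of SDT: map each class's cumulative effective labels to a per-class confidence threshold, so that harder (less-learned) classes admit more pseudo-labels. The linear mapping and bounds below are illustrative assumptions, not the paper's function.

```python
def sdt_thresholds(cel_counts, tau_min=0.5, tau_max=0.95):
    """cel_counts: cumulative effective labels per class.
    Classes far below the best-learned class get a lower threshold."""
    peak = max(cel_counts)
    return [tau_min + (tau_max - tau_min) * (c / peak) for c in cel_counts]

# Three classes: well-learned, moderate, hard (e.g. a minority class).
taus = sdt_thresholds([100, 50, 10])
```

A pseudo-label for class `i` would then be accepted only when its confidence exceeds `taus[i]`.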

21 pages, 4128 KiB  
Article
GBVSSL: Contrastive Semi-Supervised Learning Based on Generalized Bias-Variance Decomposition
by Shu Li, Lixin Han, Yang Wang and Jun Zhu
Symmetry 2024, 16(6), 724; https://doi.org/10.3390/sym16060724 - 11 Jun 2024
Viewed by 1120
Abstract
Mainstream semi-supervised learning (SSL) techniques, such as pseudo-labeling and contrastive learning, exhibit strong generalization abilities but lack theoretical understanding. Furthermore, pseudo-labeling lacks the label enhancement available from high-quality neighbors, while contrastive learning ignores the supervisory guidance provided by genuine labels. To this end, we first introduce a generalized bias-variance decomposition framework to investigate both. This analysis inspires two new techniques to refine them: neighbor-enhanced pseudo-labeling, which enhances confidence-based pseudo-labels by incorporating aggregated predictions from high-quality neighbors, and label-enhanced contrastive learning, which enhances feature representation by combining enhanced pseudo-labels and ground-truth labels to construct a reliable and complete symmetric adjacency graph. Finally, we combine these two techniques into a new SSL method called GBVSSL. GBVSSL significantly surpasses previous state-of-the-art SSL approaches on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. On CIFAR-100 with 400, 2500, and 10,000 labeled samples, GBVSSL outperforms FlexMatch by 3.46%, 2.72%, and 2.89%, respectively. On the real-world dataset Semi-iNat 2021, GBVSSL improves Top-1 accuracy over CCSSL by 4.38%. Moreover, GBVSSL converges faster and improves unbalanced SSL. Extensive ablation and qualitative studies demonstrate the effectiveness and impact of each component.
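Neighbor-enhanced pseudo-labeling can be sketched as blending a sample's own softmax prediction with the mean prediction of its nearest neighbors in feature space. The bank contents, `k`, and the blend weight `alpha` below are assumptions for illustration.

```python
import numpy as np

def enhance_pseudo_label(feat, pred, bank_feats, bank_preds, k=2, alpha=0.5):
    """Refine `pred` using the k nearest neighbors of `feat` in the bank."""
    d = np.linalg.norm(bank_feats - feat, axis=1)
    nn = np.argsort(d)[:k]
    neighbor_mean = bank_preds[nn].mean(axis=0)
    enhanced = alpha * pred + (1 - alpha) * neighbor_mean
    return enhanced / enhanced.sum()   # renormalize to a distribution

bank_feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
bank_preds = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
enhanced = enhance_pseudo_label(np.array([0.05, 0.0]),
                                np.array([0.6, 0.4]),
                                bank_feats, bank_preds)
```

A borderline prediction (0.6/0.4) is pulled toward its confident neighbors, sharpening the pseudo-label.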

29 pages, 6574 KiB  
Article
Semi-TSGAN: Semi-Supervised Learning for Highlight Removal Based on Teacher-Student Generative Adversarial Network
by Yuanfeng Zheng, Yuchen Yan and Hao Jiang
Sensors 2024, 24(10), 3090; https://doi.org/10.3390/s24103090 - 13 May 2024
Cited by 1 | Viewed by 1260
Abstract
Despite recent notable advances in highlight image restoration, the dearth of annotated data and the need for lightweight deployment of highlight removal networks pose significant impediments to further progress. In this paper, to the best of our knowledge, we are the first to propose a semi-supervised learning paradigm for highlight removal, merging a teacher–student model with a generative adversarial network in a lightweight architecture. First, we establish a dependable repository that houses optimal predictions as pseudo ground truth, guided by empirical analyses using the most reliable No-Reference Image Quality Assessment (NR-IQA) method, which rigorously assesses the quality of model predictions. Next, to address confirmation bias, we integrate contrastive regularization into the framework to curtail the risk of overfitting on inaccurate labels. Finally, we introduce a comprehensive feature aggregation module and an extensive attention mechanism within the generative network, balancing network performance against computational efficiency. Our experimental evaluation covers both full-reference and no-reference highlight benchmarks, and the results demonstrate substantial quantitative and qualitative improvements over state-of-the-art methods.
(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 2nd Edition)

19 pages, 11008 KiB  
Article
SAM-Induced Pseudo Fully Supervised Learning for Weakly Supervised Object Detection in Remote Sensing Images
by Xiaoliang Qian, Chenyang Lin, Zhiwu Chen and Wei Wang
Remote Sens. 2024, 16(9), 1532; https://doi.org/10.3390/rs16091532 - 26 Apr 2024
Cited by 6 | Viewed by 2162
Abstract
Weakly supervised object detection (WSOD) in remote sensing images (RSIs) aims to detect high-value targets using only image-level category labels; however, two problems have not been well addressed by existing methods. First, seed instances (SIs) are mined solely from the category score (CS) of each proposal, which tends to concentrate on the most salient parts of the object; furthermore, the SIs are unreliable because the CS lacks robustness, as inter-category similarity and intra-category diversity are more pronounced in RSIs. Second, localization accuracy is limited by the proposals generated by the selective search or edge box algorithm. To address the first problem, a segment anything model (SAM)-induced seed instance-mining (SSIM) module is proposed, which mines SIs according to an object quality score that jointly reflects category confidence and object completeness. To handle the second problem, a SAM-based pseudo-ground truth-mining (SPGTM) module is proposed to mine pseudo-ground truth (PGT) instances, whose localization is more accurate than that of traditional proposals because it fully exploits the advantages of SAM; the object-detection heads are then trained on the PGT instances in a fully supervised manner. Ablation studies show the effectiveness of the SSIM and SPGTM modules, and comprehensive comparisons with 15 WSOD methods demonstrate the superiority of our method on two RSI datasets.
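One hedged way to picture an object quality score of this kind is category confidence multiplied by a completeness term, here taken as the proposal's best IoU with a SAM mask box. The combination rule and all values below are assumptions; the exact SSIM formulation is not reproduced here.

```python
def box_iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def object_quality(proposal, category_score, sam_boxes):
    # Completeness: how well the proposal covers a whole SAM-segmented object.
    completeness = max(box_iou(proposal, s) for s in sam_boxes)
    return category_score * completeness

sam_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
full_obj = object_quality((0, 0, 10, 10), 0.8, sam_boxes)  # covers the whole object
part_obj = object_quality((0, 0, 5, 10), 0.9, sam_boxes)   # only the salient half
```

Despite its higher raw category score, the partial proposal ranks below the complete one.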

39 pages, 61918 KiB  
Article
Learning Ground Displacement Signals Directly from InSAR-Wrapped Interferograms
by Lama Moualla, Alessio Rucci, Giampiero Naletto and Nantheera Anantrasirichai
Sensors 2024, 24(8), 2637; https://doi.org/10.3390/s24082637 - 20 Apr 2024
Cited by 2 | Viewed by 1845
Abstract
Monitoring ground displacements identifies potential geohazard risks early, before they cause critical damage. Interferometric synthetic aperture radar (InSAR) is one technique that can monitor these displacements with sub-millimeter accuracy. However, using InSAR is challenging because of the expertise required, the large data volumes, and other complexities. Accordingly, an automated system that indicates ground displacements directly from wrapped interferograms and coherence maps would be highly advantageous. Here, we compare different machine learning algorithms to evaluate the feasibility of this objective. The inputs to the implemented models were pixels selected from the filtered, wrapped Sentinel-1 interferograms using a coherence threshold; the outputs were the same pixels labeled as fast positive, positive, fast negative, negative, or undefined movement. These labels were assigned based on the velocity values of the measurement points located within the pixels. We used the Parallel Small Baseline Subset service of the European Space Agency's GeoHazards Exploitation Platform to create the necessary interferograms, coherence maps, and deformation velocity maps, and applied a high-pass filter to the wrapped interferograms to separate the displacement signal from atmospheric errors. We successfully identified the patterns associated with slow and fast movements by discerning the distinct distributions within the matrices representing each movement class. The experiments covered three case studies (in Italy, Portugal, and the United States) noted for their high sensitivity to landslides. The Cosine K-nearest neighbor model achieved the best test accuracy; notably, the test sets were not merely held-out parts of the training region but also included adjacent areas. We further improved performance with pseudo-labeling, which also served to evaluate the generalizability and robustness of the trained model beyond its immediate training environment. The lowest test accuracy achieved by the implemented algorithms was 80.1%. Finally, we used ArcGIS Pro 3.3 to compare the ground truth with the predictions and to explore indications of displacements affecting the main roads in the studied areas.
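The velocity-based labeling of pixels can be sketched as simple thresholding into the five movement classes named above; the numeric class boundaries below are illustrative, not those used in the study.

```python
def velocity_class(v_mm_per_year):
    """Map a deformation velocity (mm/yr, positive = uplift) to a movement class.
    Thresholds are invented for illustration."""
    if v_mm_per_year is None:
        return "undefined"        # no measurement point in the pixel
    if v_mm_per_year >= 10:
        return "fast positive"
    if v_mm_per_year > 2:
        return "positive"
    if v_mm_per_year <= -10:
        return "fast negative"
    if v_mm_per_year < -2:
        return "negative"
    return "undefined"            # within noise band

labels = [velocity_class(v) for v in (12.0, 4.0, -1.0, -15.0, None)]
```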
(This article belongs to the Special Issue Intelligent SAR Target Detection and Recognition)

16 pages, 808 KiB  
Article
GDUI: Guided Diffusion Model for Unlabeled Images
by Xuanyuan Xie and Jieyu Zhao
Algorithms 2024, 17(3), 125; https://doi.org/10.3390/a17030125 - 18 Mar 2024
Viewed by 3069
Abstract
The diffusion model has made progress in image synthesis, especially conditional image synthesis, but this progress depends heavily on large annotated datasets. To tackle this challenge, we present the Guided Diffusion model for Unlabeled Images (GDUI) framework. It exploits the inherent feature similarity and semantic differences in the data, as well as the downstream transferability of Contrastive Language-Image Pretraining (CLIP), to guide the diffusion model in generating high-quality images. We design two semantic-aware algorithms, a pseudo-label-matching algorithm and a label-matching refinement algorithm, to match clustering results with the true semantic information and provide more accurate guidance for the diffusion model. GDUI first encodes each image into a semantically meaningful latent vector through clustering; pseudo-label matching then attaches the true semantic information to the image; and finally the label-matching refinement algorithm adjusts irrelevant semantic information in the data, improving the quality of guided image generation. Our experiments on labeled datasets show that GDUI outperforms diffusion models without any guidance and significantly narrows the gap to models guided by ground-truth labels.
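The cluster-to-class matching step can be solved exactly for small problems by maximizing the total cluster/class overlap over all assignments. The brute-force matcher below is a sketch of that idea, not GDUI's actual algorithm, and the overlap table is invented.

```python
from itertools import permutations

def match_clusters(overlap):
    """overlap[c][y] = number of samples in cluster c whose true class is y.
    Returns the cluster -> class assignment maximizing total overlap."""
    n = len(overlap)
    best, best_score = None, -1
    for perm in permutations(range(n)):
        score = sum(overlap[c][perm[c]] for c in range(n))
        if score > best_score:
            best, best_score = perm, score
    return {c: y for c, y in enumerate(best)}

overlap = [[8, 1, 1],
           [0, 9, 2],
           [2, 0, 7]]
mapping = match_clusters(overlap)
```

For larger numbers of clusters one would use the Hungarian algorithm instead of enumerating permutations.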
(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)

12 pages, 3883 KiB  
Technical Note
Exploring Semantic Prompts in the Segment Anything Model for Domain Adaptation
by Ziquan Wang, Yongsheng Zhang, Zhenchao Zhang, Zhipeng Jiang, Ying Yu, Li Li and Lei Li
Remote Sens. 2024, 16(5), 758; https://doi.org/10.3390/rs16050758 - 21 Feb 2024
Cited by 11 | Viewed by 3861
Abstract
Robust segmentation in adverse weather conditions is crucial for autonomous driving. However, such scenes are difficult to recognize and expensive to annotate, resulting in poor performance. The recently proposed Segment Anything Model (SAM) can finely segment the spatial structure of scenes and provide powerful prior spatial information, showing great promise for these problems. However, SAM cannot be applied directly, owing to mismatched geographic scales and its non-semantic outputs. To address these issues, we propose SAM-EDA, which integrates SAM into an unsupervised domain adaptation mean-teacher segmentation framework. In this method, a “teacher-assistant” model provides semantic pseudo-labels that fill the holes in the fine spatial structure given by SAM, generating pseudo-labels close to the ground truth, which then guide the student model. The “teacher-assistant” model thus helps to distill knowledge. During testing, only the student model is used, greatly improving efficiency. We tested SAM-EDA on mainstream segmentation benchmarks under adverse weather conditions and obtained a more robust segmentation model.
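The hole-filling idea, giving each non-semantic SAM segment the majority class from the teacher-assistant's pseudo-labels, can be sketched over a flattened pixel list as below; the segment ids and class names are invented for the example.

```python
from collections import Counter

def semantize(sam_segments, teacher_labels):
    """sam_segments / teacher_labels: equal-length per-pixel lists.
    Each SAM segment adopts the majority teacher class, overriding
    noisy per-pixel teacher predictions inside the segment."""
    majority = {}
    for seg in set(sam_segments):
        votes = [t for s, t in zip(sam_segments, teacher_labels) if s == seg]
        majority[seg] = Counter(votes).most_common(1)[0][0]
    return [majority[s] for s in sam_segments]

sam_segments   = [0, 0, 0, 1, 1]
teacher_labels = ["road", "road", "sky", "car", "car"]
pseudo = semantize(sam_segments, teacher_labels)
```

The stray "sky" pixel inside segment 0 is corrected to "road", matching the segment's spatial structure.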
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)

17 pages, 3683 KiB  
Article
A Weakly Supervised Semantic Segmentation Model of Maize Seedlings and Weed Images Based on Scrawl Labels
by Lulu Zhao, Yanan Zhao, Ting Liu and Hanbing Deng
Sensors 2023, 23(24), 9846; https://doi.org/10.3390/s23249846 - 15 Dec 2023
Cited by 4 | Viewed by 1784
Abstract
Semantic segmentation of maize and weed images with fully supervised deep learning models requires a large number of pixel-level mask labels, and the complex morphology of the maize and weeds themselves further increases the cost of annotation. To solve this problem, we propose a Scrawl Label-based Weakly Supervised Semantic Segmentation Network (SL-Net). SL-Net consists of a pseudo label generation module, an encoder, and a decoder. The pseudo label generation module converts scrawl labels into pseudo labels that replace manual labels in network training; the backbone for feature extraction is improved on the basis of the DeepLab-V3+ model, and a transfer learning strategy is used to optimize training. The results show that the intersection over union (IoU) between the pseudo labels generated by the module and the ground truth is 83.32%, with a cosine similarity of 93.55%. In semantic segmentation tests on images of maize seedlings and weeds, SL-Net reached a mean IoU of 87.30% and an average precision of 94.06%, higher than the segmentation accuracy of DeepLab-V3+ and PSPNet under both weakly and fully supervised learning conditions.
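The two pseudo-label quality metrics reported above, IoU and cosine similarity against the ground truth, can be computed on binary masks as follows (the toy masks are for illustration only):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

def cosine_sim(a, b):
    """Cosine similarity of two masks viewed as flat vectors."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pseudo = np.array([[1, 1], [0, 0]])
gt     = np.array([[1, 0], [0, 0]])
```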

17 pages, 1221 KiB  
Article
ProMatch: Semi-Supervised Learning with Prototype Consistency
by Ziyu Cheng, Xianmin Wang and Jing Li
Mathematics 2023, 11(16), 3537; https://doi.org/10.3390/math11163537 - 16 Aug 2023
Cited by 2 | Viewed by 2315
Abstract
Recent state-of-the-art semi-supervised learning (SSL) methods have made significant advances by combining consistency regularization and pseudo-labeling in a joint learning paradigm. The core idea of these methods is to identify consistency targets (pseudo-labels) by selecting high-confidence predicted distributions from weakly augmented unlabeled samples. However, they often suffer from erroneous yet highly confident pseudo-labels, which lead to noisy training. This issue arises for two main reasons: (1) when the model is poorly calibrated, the prediction for a single sample may be overconfident and incorrect, and (2) propagating pseudo-labels from unlabeled samples can accumulate errors due to the margin between the pseudo-label and the ground-truth label. To address this problem, we propose a novel consistency criterion called Prototype Consistency (PC), which improves the reliability of pseudo-labeling by leveraging prototype similarities between labeled and unlabeled samples. First, we instantiate semantic prototypes (centers of embeddings) and prediction prototypes (centers of predictions) for each category using memory buffers that store the features of labeled examples. Second, for a given unlabeled sample, we determine the most similar semantic prototype and prediction prototype by assessing the similarities between the sample's features and the prototypes of the labeled samples. Finally, instead of using the sample's own prediction as the pseudo-label, we select the most similar prediction prototype as the consistency target, provided that the predicted category of the most similar prediction prototype, the ground-truth category of the most similar semantic prototype, and the ground-truth category of the most similar prediction prototype all agree. Combining the PC approach with the techniques developed by the MixMatch family, our proposed ProMatch framework demonstrates significant performance improvements over previous algorithms on datasets such as CIFAR-10, CIFAR-100, SVHN, and Mini-ImageNet.
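The Prototype Consistency check can be sketched as follows: the nearest prediction-prototype becomes the consistency target only when the nearest semantic-prototype, the nearest prediction-prototype, and the predicted class all agree. The two-class buffers and values below are illustrative, not real ProMatch statistics.

```python
import numpy as np

sem_protos  = np.array([[1.0, 0.0], [0.0, 1.0]])   # per-class embedding centers
pred_protos = np.array([[0.9, 0.1], [0.2, 0.8]])   # per-class prediction centers

def pc_target(feature, prediction):
    """Return the consistency target for an unlabeled sample, or None."""
    sem_cls  = int(np.argmin(np.linalg.norm(sem_protos - feature, axis=1)))
    pred_cls = int(np.argmin(np.linalg.norm(pred_protos - prediction, axis=1)))
    if sem_cls == pred_cls == int(np.argmax(prediction)):
        return pred_protos[pred_cls]   # consistent: prototype replaces the raw prediction
    return None                        # inconsistent: skip this sample

target = pc_target(np.array([0.95, 0.05]), np.array([0.7, 0.3]))
```

A sample whose feature says class 0 but whose prediction leans to class 1 would be rejected rather than pseudo-labeled.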
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)

12 pages, 1117 KiB  
Article
Alleviating Long-Tailed Image Classification via Dynamical Classwise Splitting
by Ye Yuan, Jiaqi Wang, Xin Xu, Ruoshi Li, Yongtong Zhu, Lihong Wan, Qingdu Li and Na Liu
Mathematics 2023, 11(13), 2996; https://doi.org/10.3390/math11132996 - 5 Jul 2023
Viewed by 1809
Abstract
With the rapid increase in data scale, real-world datasets tend to exhibit long-tailed class distributions (i.e., a few classes account for most of the data, while most classes contain only a few data points). General solutions typically exploit class rebalancing strategies involving resampling and reweighting based on the sample count of each class. In this work, we explore an orthogonal direction, category splitting, motivated by the empirical observation that naively splitting majority classes can alleviate the heavy imbalance between majority and minority classes. To this end, we propose a novel classwise splitting (CWS) method built upon dynamic clustering, where classwise prototypes are updated using a moving-average technique. CWS generates intra-class pseudo labels for splitting intra-class samples based on point-to-prototype distance, and a group mapping module recovers the ground truth of the training samples. CWS can be plugged into any existing method as a complement. Comprehensive experiments were conducted on artificially induced long-tailed image classification datasets, such as CIFAR-10-LT, CIFAR-100-LT, and OCTMNIST. Our results show that, when trained with the proposed class-balanced loss, the network achieves significant performance gains on long-tailed datasets.
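The moving-average prototype update and the distance-based intra-class split can be sketched as below; the momentum, split radius, and features are illustrative assumptions, not CWS's exact rules.

```python
import numpy as np

def update_prototype(proto, batch_feats, momentum=0.9):
    """One exponential moving-average step toward the batch mean."""
    return momentum * proto + (1 - momentum) * batch_feats.mean(axis=0)

def split_class(feats, proto, radius):
    """Intra-class pseudo label: 0 for samples near the prototype, 1 for far ones."""
    d = np.linalg.norm(feats - proto, axis=1)
    return np.where(d <= radius, 0, 1)

feats = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0]])
proto = update_prototype(np.zeros(2), feats)          # EMA step from a zero-initialized prototype
sub_labels = split_class(feats, proto, radius=1.0)
```

A majority class is thereby split into sub-classes, which the group mapping module later merges back to the original label.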
(This article belongs to the Special Issue Advanced Artificial Intelligence Models and Its Applications)

26 pages, 10301 KiB  
Article
Semi-Supervised Person Detection in Aerial Images with Instance Segmentation and Maximum Mean Discrepancy Distance
by Xiangqing Zhang, Yan Feng, Shun Zhang, Nan Wang, Shaohui Mei and Mingyi He
Remote Sens. 2023, 15(11), 2928; https://doi.org/10.3390/rs15112928 - 4 Jun 2023
Cited by 14 | Viewed by 2929
Abstract
Detecting sparse, small, lost persons that cover only a few pixels in high-resolution aerial images was, is, and remains an important and difficult mission, in which accurate monitoring and intelligent co-rescue play a vital role for the search and rescue (SaR) system. However, many problems remain unsolved in existing remote-vision-based SaR systems, such as the shortage of person samples in SaR scenarios and the low tolerance of small objects to bounding-box error. To address these issues, we propose a copy-paste mechanism (ISCP) with semi-supervised object detection (SSOD) via instance segmentation and a maximum mean discrepancy (MMD) distance, providing highly robust, multi-task, and efficient aerial person detection for the prototype SaR system. Specifically, numerous pseudo-labels are obtained by accurately segmenting the instances of synthetic ISCP samples to obtain their boundaries. The SSOD trainer then uses soft weights to balance the prediction entropy of the loss function between the ground truth and unreliable labels. Moreover, a novel MMD-based evaluation metric for anchor-based detectors is proposed to compute the IoU of bounding boxes more gracefully. Extensive experiments and ablation studies on Heridal and optimized public datasets demonstrate that our approach is effective and achieves state-of-the-art person detection performance in aerial images.
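A generic (biased) RBF-kernel MMD between two sample sets illustrates the maximum mean discrepancy idea behind the proposed metric; the paper's bounding-box-specific formulation is not reproduced here, and the data below are invented.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    # Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    kxx = np.mean([rbf(a, b, sigma) for a in X for b in X])
    kyy = np.mean([rbf(a, b, sigma) for a in Y for b in Y])
    kxy = np.mean([rbf(a, b, sigma) for a in X for b in Y])
    return kxx + kyy - 2 * kxy

same = mmd2(np.array([[0.0], [0.1]]), np.array([[0.0], [0.1]]))  # identical sets
diff = mmd2(np.array([[0.0], [0.1]]), np.array([[5.0], [5.1]]))  # distant sets
```

MMD vanishes for identical distributions and grows as the two sets separate, which is what makes it usable as a distance.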
(This article belongs to the Special Issue Active Learning Methods for Remote Sensing Data Processing)
