
Search Results (13)

Search Parameters:
Keywords = BYOL

17 pages, 3498 KB  
Article
Self-Supervised Learning and Multi-Sensor Fusion for Alpine Wetland Vegetation Mapping: Bayinbuluke, China
by Muhammad Murtaza Zaka, Alim Samat, Jilili Abuduwaili, Enzhao Zhu, Arslan Akhtar and Wenbo Li
Plants 2025, 14(20), 3153; https://doi.org/10.3390/plants14203153 - 13 Oct 2025
Viewed by 818
Abstract
Accurate mapping of wetland vegetation is essential for ecological monitoring and conservation, yet it remains challenging due to the spatial heterogeneity of wetlands, the scarcity of ground-truth data, and the spread of invasive species. Invasive plants alter native vegetation patterns, making their early detection critical for preserving ecosystem integrity. This study proposes a novel framework that integrates self-supervised learning (SSL), supervised segmentation, and multi-sensor data fusion to enhance vegetation classification in the Bayinbuluke Alpine Wetland, China. High-resolution satellite imagery from PlanetScope-3 and Jilin-1 was fused, and SSL methods (including BYOL, DINO, and MoCo v3) were employed to learn transferable feature representations without extensive labeled data. The results show that SSL methods exhibit consistent variations in classification performance, while multi-sensor fusion significantly improves the detection of rare and fragmented vegetation patches and enables the early identification of invasive species. Overall, the proposed SSL-fusion strategy reduces reliance on labor-intensive field data collection and provides a scalable, high-precision solution for wetland monitoring and invasive species management.
(This article belongs to the Special Issue Computer Vision Techniques for Plant Phenomics Applications)
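The abstract does not spell out the fusion mechanism, so the following is only a minimal sketch of one common approach: pixel-level fusion by resampling both sensors to a shared grid and stacking bands. The tensor shapes and the bilinear resampling choice are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fuse_sensors(planetscope: torch.Tensor, jilin: torch.Tensor) -> torch.Tensor:
    """Channel-stacking fusion of two co-registered image tensors.

    planetscope: (B, C1, H1, W1); jilin: (B, C2, H2, W2). Both are
    resampled to the finer of the two grids, then concatenated along
    the band dimension so the SSL encoder sees all bands at once.
    """
    h = max(planetscope.shape[-2], jilin.shape[-2])
    w = max(planetscope.shape[-1], jilin.shape[-1])
    ps = F.interpolate(planetscope, size=(h, w), mode="bilinear", align_corners=False)
    jl = F.interpolate(jilin, size=(h, w), mode="bilinear", align_corners=False)
    return torch.cat([ps, jl], dim=1)  # (B, C1 + C2, H, W)

# Example: a 4-band PlanetScope patch fused with a 3-band Jilin-1 patch.
fused = fuse_sensors(torch.rand(2, 4, 64, 64), torch.rand(2, 3, 96, 96))
```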

17 pages, 3628 KB  
Article
A Unified Self-Supervised Framework for Plant Disease Detection on Laboratory and In-Field Images
by Xiaoli Huan, Bernard Chen and Hong Zhou
Electronics 2025, 14(17), 3410; https://doi.org/10.3390/electronics14173410 - 27 Aug 2025
Cited by 1 | Viewed by 1284
Abstract
Early and accurate detection of plant diseases is essential for ensuring food security and maintaining sustainable agricultural productivity. However, most deep learning models for plant disease classification rely heavily on large-scale annotated datasets, which are expensive, labor-intensive, and often impractical to obtain in real-world farming environments. To address this limitation, we propose a unified self-supervised learning (SSL) framework that leverages unlabeled plant imagery to learn meaningful and transferable visual representations. Our method integrates three complementary objectives, namely Bootstrap Your Own Latent (BYOL), Masked Image Modeling (MIM), and contrastive learning, within a ResNet101 backbone, optimized through a hybrid loss function that captures global alignment, local structure, and instance-level distinction. GPU-based data augmentations are used to introduce stochasticity and enhance generalization during pretraining. Experimental results on the challenging PlantDoc dataset demonstrate that our model achieves an accuracy of 77.82%, with macro-averaged precision, recall, and F1-score of 80.00%, 78.24%, and 77.48%, respectively, on par with or exceeding most state-of-the-art supervised and self-supervised approaches. Furthermore, when fine-tuned on the PlantVillage dataset, the pretrained model attains 99.85% accuracy, highlighting its strong cross-domain generalization and practical transferability. These findings underscore the potential of self-supervised learning as a scalable, annotation-efficient, and robust solution for plant disease detection in real-world agricultural settings, especially where labeled data is scarce or unavailable.
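To make the hybrid objective concrete, here is a minimal sketch of how the three terms could be combined. The loss weights, temperature, and head shapes are assumptions; the abstract does not give them.

```python
import torch
import torch.nn.functional as F

def byol_loss(p_online: torch.Tensor, z_target: torch.Tensor) -> torch.Tensor:
    """BYOL term: negative cosine similarity between the online
    predictor output and the (stop-gradient) target projection."""
    p = F.normalize(p_online, dim=-1)
    z = F.normalize(z_target.detach(), dim=-1)
    return 2 - 2 * (p * z).sum(dim=-1).mean()

def mim_loss(pred_patches, true_patches, mask):
    """MIM term: L2 reconstruction error, counted on masked patches only.
    pred/true: (B, N, D) patch tensors; mask: (B, N) with 1 = masked."""
    diff = (pred_patches - true_patches) ** 2
    return (diff.mean(dim=-1) * mask).sum() / mask.sum().clamp(min=1)

def info_nce(z1, z2, tau: float = 0.2):
    """Contrastive term (InfoNCE) over in-batch negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                      # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def hybrid_loss(p, z, pred, true, mask, z1, z2, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives; the weights are hypothetical."""
    return w[0] * byol_loss(p, z) + w[1] * mim_loss(pred, true, mask) + w[2] * info_nce(z1, z2)
```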

20 pages, 6052 KB  
Article
Representation Learning for Vision-Based Autonomous Driving via Probabilistic World Modeling
by Haoqiang Chen, Yadong Liu and Dewen Hu
Machines 2025, 13(3), 231; https://doi.org/10.3390/machines13030231 - 12 Mar 2025
Viewed by 1647
Abstract
Representation learning plays a vital role in autonomous driving by extracting meaningful features from raw sensory inputs. World models have emerged as an effective approach to representation learning by capturing predictive features that can anticipate multiple possible futures, which is particularly suited to driving scenarios. However, existing world model approaches face two critical limitations. First, conventional methods rely heavily on computationally expensive variational inference that requires decoding back to the high-dimensional observation space. Second, current end-to-end autonomous driving systems demand extensive labeled data for training, resulting in prohibitive annotation costs. To address these challenges, we present BYOL-Drive, a novel method that first introduces the self-supervised representation-learning paradigm BYOL (Bootstrap Your Own Latent) to implement world modeling. Our method eliminates the computational burden of observation-space decoding while requiring substantially less labeled data than mainstream approaches. Additionally, our model relies only on monocular camera images as input, making it easy to deploy and generalize. Based on this learned representation, experiments on the standard closed-loop CARLA benchmark demonstrate that BYOL-Drive achieves competitive performance with improved computational efficiency and significantly reduced annotation requirements compared to state-of-the-art methods. Our work contributes to the development of end-to-end autonomous driving.
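Since BYOL is central to several results on this page, a compact sketch of its online/target mechanics may help. This follows the published BYOL recipe (a predictor on the online branch, an EMA-updated target, a negative-cosine loss); the projector sizes and momentum are arbitrary choices here.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BYOL(nn.Module):
    """Minimal BYOL: an online network (encoder+projector+predictor)
    regresses the projection of a momentum-averaged target network."""
    def __init__(self, encoder: nn.Module, feat_dim: int, proj_dim: int = 256, tau: float = 0.996):
        super().__init__()
        self.online_encoder = encoder
        self.online_projector = nn.Sequential(
            nn.Linear(feat_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        # Target network: a frozen copy, updated only by moving average.
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in list(self.target_encoder.parameters()) + list(self.target_projector.parameters()):
            p.requires_grad = False
        self.tau = tau

    @torch.no_grad()
    def update_target(self):
        """Exponential moving average of online parameters into the target."""
        pairs = list(zip(self.online_encoder.parameters(), self.target_encoder.parameters())) + \
                list(zip(self.online_projector.parameters(), self.target_projector.parameters()))
        for on, tg in pairs:
            tg.mul_(self.tau).add_((1 - self.tau) * on)

    def forward(self, view1: torch.Tensor, view2: torch.Tensor) -> torch.Tensor:
        p1 = self.predictor(self.online_projector(self.online_encoder(view1)))
        with torch.no_grad():
            z2 = self.target_projector(self.target_encoder(view2))
        p1, z2 = F.normalize(p1, dim=-1), F.normalize(z2, dim=-1)
        return 2 - 2 * (p1 * z2).sum(dim=-1).mean()  # symmetrize over both views in practice
```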

17 pages, 916 KB  
Article
A Multi-Scale Self-Supervision Approach for Bearing Anomaly Detection Using Sensor Data Under Multiple Operating Conditions
by Zhuoheng Dai, Lei Jiang, Feifan Li and Yingna Chen
Sensors 2025, 25(4), 1185; https://doi.org/10.3390/s25041185 - 15 Feb 2025
Cited by 3 | Viewed by 1303
Abstract
Early fault detection technologies play a decisive role in preventing equipment failures in industrial production. The primary challenges of early fault detection in industrial applications include the severe imbalance of time-series data, where normal operating data vastly outnumber anomalous data and, in some cases, anomalies may be virtually absent. Additionally, frequent changes in operational modes during machinery operation further complicate detection, making it difficult to identify faults effectively across varying conditions. This study proposes an early bearing anomaly detection method based on contrastive learning and reconstruction approaches to address these issues. The raw time-domain vibration data, collected from sensors mounted on the bearings of the machinery, are first preprocessed using the Ricker wavelet transform to remove noise and extract useful signal components. These processed signals are then fed into a BYOL-based contrastive learning network to learn more discriminative global feature representations. In addition, we design a reconstruction loss to complement contrastive learning: by reconstructing the masked original data, the reconstruction loss forces the model to learn detailed information, emphasizing the preservation and restoration of local details. Our model not only eliminates the reliance on negative samples found in mainstream unsupervised methods but also captures data features more comprehensively, achieving superior fault detection accuracy under different operating conditions compared to related methods. Experiments on the widely used CWRU multi-condition bearing fault dataset demonstrate that our method achieves an average fault detection accuracy of 96.97%. Moreover, on the full-cycle IMS dataset, our method detects early faults at least 2.3 h earlier than other unsupervised methods, and validation on the full-cycle XJTU-SY dataset further demonstrates its excellent generalization ability.
(This article belongs to the Section Fault Diagnosis & Sensors)
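As an illustration of the Ricker-wavelet preprocessing step, the sketch below builds the wavelet from its closed form and applies it by convolution; the kernel length and width `a` are hypothetical tuning choices that would depend on the sampling rate.

```python
import numpy as np

def ricker(points: int, a: float) -> np.ndarray:
    """Ricker ("Mexican hat") wavelet, the negative-normalized second
    derivative of a Gaussian, often used to emphasize impulsive
    fault signatures in vibration signals."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def ricker_filter(signal: np.ndarray, points: int = 64, a: float = 4.0) -> np.ndarray:
    """Convolve a raw vibration signal with a Ricker kernel; the wavelet's
    band-pass character suppresses broadband noise."""
    return np.convolve(signal, ricker(points, a), mode="same")

# Example: denoise one second of a synthetic 12 kHz vibration trace.
x = np.random.randn(12000)
x_filtered = ricker_filter(x)
```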

18 pages, 6146 KB  
Article
A Near-Infrared Imaging System for Robotic Venous Blood Collection
by Zhikang Yang, Mao Shi, Yassine Gharbi, Qian Qi, Huan Shen, Gaojian Tao, Wu Xu, Wenqi Lyu and Aihong Ji
Sensors 2024, 24(22), 7413; https://doi.org/10.3390/s24227413 - 20 Nov 2024
Cited by 1 | Viewed by 3391
Abstract
Venous blood collection is a widely used medical diagnostic technique, and with rapid advancements in robotics, robotic venous blood collection has the potential to replace traditional manual methods. The success of this robotic approach depends heavily on the quality of vein imaging. In this paper, we develop a vein imaging device based on a simulation analysis of vein imaging parameters and propose a U-Net+ResNet18 neural network for vein image segmentation. The U-Net+ResNet18 network integrates the residual blocks of ResNet18 into the encoder of the U-Net. ResNet18 is pre-trained using the Bootstrap Your Own Latent (BYOL) framework, and its encoder parameters are transferred to the U-Net+ResNet18 network, enhancing segmentation performance on vein images with limited labelled data. Furthermore, we optimize the AD-Census stereo matching algorithm by developing a variable-weight version, which improves its adaptability to image variations across different regions. Results show that, compared to U-Net, the BYOL+U-Net+ResNet18 method achieves an 8.31% reduction in Binary Cross-Entropy (BCE), a 5.50% reduction in Hausdorff Distance (HD), a 15.95% increase in Intersection over Union (IoU), and a 9.20% increase in the Dice coefficient (Dice), indicating improved segmentation quality. The average error of the optimized AD-Census stereo matching algorithm is reduced by 25.69%, a clear improvement in stereo matching performance. Future research will explore the application of the vein imaging system in robotic venous blood collection to enable real-time puncture guidance.
(This article belongs to the Section Sensors and Robotics)
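A minimal sketch of the transfer step: load BYOL-pretrained weights into a torchvision ResNet18 and expose its stages as U-Net encoder blocks. The checkpoint path is a hypothetical placeholder, and the exact stage split is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_unet_encoder(byol_ckpt=None):
    """Assemble U-Net encoder stages from a ResNet18 backbone, optionally
    initialized from a BYOL-pretrained state_dict (keys must match
    torchvision's resnet18; the classification head is ignored)."""
    backbone = resnet18(weights=None)
    if byol_ckpt is not None:
        state = torch.load(byol_ckpt, map_location="cpu")
        backbone.load_state_dict(state, strict=False)  # skip fc head, extras
    stages = nn.ModuleList([
        nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu),  # stride /2
        nn.Sequential(backbone.maxpool, backbone.layer1),            # stride /4
        backbone.layer2,                                             # stride /8
        backbone.layer3,                                             # stride /16
        backbone.layer4,                                             # stride /32
    ])
    return stages  # the U-Net decoder takes a skip connection from each stage

encoder_stages = build_unet_encoder()  # pass e.g. "byol_resnet18.pth" when available
```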

14 pages, 5869 KB  
Technical Note
Intelligent Recognition of Coastal Outfall Drainage Based on Sentinel-2/MSI Imagery
by Hongzhe Li, Xianqiang He, Yan Bai, Fang Gong, Teng Li and Difeng Wang
Remote Sens. 2024, 16(2), 423; https://doi.org/10.3390/rs16020423 - 22 Jan 2024
Cited by 1 | Viewed by 2367
Abstract
In this study, we developed an innovative self-supervised pretraining approach using Sentinel-2/MSI satellite imagery, specifically designed for the intelligent identification of drainage at sea discharge outlets. By integrating the geographical information of remote sensing images into our methodology, we surpassed the classification accuracy of conventional models such as MoCo (momentum contrast) and BYOL (bootstrap your own latent). Using Sentinel-2/MSI remote sensing imagery, we developed our model on an unsupervised dataset comprising 25,600 images and refined it on a supervised dataset of 1100 images. After supervised fine-tuning, the resulting framework yielded a model capable of classifying outfall drainage with an accuracy of 90.54%, facilitating extensive outfall monitoring. A series of ablation experiments confirmed the effectiveness of our training framework, showing a 10.81% improvement in accuracy over traditional models, and the learned features were further validated using visualization techniques. This study contributes an efficient approach to large-scale monitoring of coastal outfalls, with implications for strengthening environmental protection measures and reducing manual inspection effort.

14 pages, 933 KB  
Article
Self-Supervised Clustering Models Based on BYOL Network Structure
by Xuehao Chen, Jin Zhou, Yuehui Chen, Shiyuan Han, Yingxu Wang, Tao Du, Cheng Yang and Bowen Liu
Electronics 2023, 12(23), 4723; https://doi.org/10.3390/electronics12234723 - 21 Nov 2023
Cited by 4 | Viewed by 2406
Abstract
Contrastive clustering models usually rely on a large number of negative pairs to capture uniform representations, which requires a large batch size and incurs high computational complexity. In contrast, some self-supervised methods perform non-contrastive learning to capture discriminative representations with positive pairs only, but suffer from clustering collapse. To solve these issues, a novel end-to-end self-supervised clustering model is proposed in this paper. The basic self-supervised learning network is first modified, followed by the incorporation of a Softmax layer to obtain cluster assignments as the data representation. Then, adversarial learning on the cluster assignments is integrated into the method to further enhance discrimination across different clusters and mitigate collapse between clusters. To further encourage clustering-oriented guidance, a new cluster-level discrimination term is introduced to promote clustering performance by measuring the self-correlation between the learned cluster assignments. Experimental results on real-world datasets show that the proposed model outperforms existing deep clustering methods.
(This article belongs to the Special Issue Deep Learning for Data Mining: Theory, Methods, and Applications)
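A rough sketch of the two ingredients named in the abstract: a Softmax head that turns features into soft cluster assignments, and a self-correlation-based cluster-level discrimination term. The decorrelation form below is our reading of "self-correlation between the learned cluster assignments", not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def cluster_assignments(features: torch.Tensor, head: torch.nn.Module,
                        temperature: float = 1.0) -> torch.Tensor:
    """Softmax layer on top of the backbone: features (B, D) -> soft
    cluster assignments (B, K), used directly as the representation."""
    return F.softmax(head(features) / temperature, dim=-1)

def cluster_correlation_loss(q1: torch.Tensor, q2: torch.Tensor) -> torch.Tensor:
    """Cluster-level discrimination sketch: the K x K correlation matrix
    between two views' assignments should approach the identity, so each
    cluster correlates only with itself (discouraging cluster collapse)."""
    q1 = (q1 - q1.mean(0)) / (q1.std(0) + 1e-6)     # standardize per cluster
    q2 = (q2 - q2.mean(0)) / (q2.std(0) + 1e-6)
    c = (q1.t() @ q2) / q1.size(0)                  # (K, K) correlation matrix
    eye = torch.eye(c.size(0), device=c.device)
    return ((c - eye) ** 2).sum()
```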

19 pages, 1137 KB  
Article
Unlocking the Potential of Data Augmentation in Contrastive Learning for Hyperspectral Image Classification
by Jinhui Li, Xiaorun Li and Yunfeng Yan
Remote Sens. 2023, 15(12), 3123; https://doi.org/10.3390/rs15123123 - 15 Jun 2023
Cited by 13 | Viewed by 3653
Abstract
Despite the rapid development of deep learning in hyperspectral image classification (HSIC), most models require a large amount of labeled data, which are both time-consuming and laborious to obtain. However, contrastive learning can extract spatial–spectral features from samples without labels, which helps to solve the above problem. Our focus is on optimizing the contrastive learning process and improving feature extraction from all samples. In this study, we propose the Unlocking-the-Potential-of-Data-Augmentation (UPDA) strategy, which involves adding superior data augmentation methods to enhance the representation of features extracted by contrastive learning. Specifically, we introduce three augmentation methods, namely band erasure, gradient mask, and random occlusion, to the Bootstrap-Your-Own-Latent (BYOL) structure. Our experimental results demonstrate that our method can effectively improve feature representation and thus improve classification accuracy. Additionally, we conduct ablation experiments to explore the effectiveness of different data augmentation methods.
(This article belongs to the Special Issue Advances in Hyperspectral Data Exploitation II)
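The three augmentations lend themselves to short implementations. The sketches below are plausible readings of "band erasure", "gradient mask", and "random occlusion" for a (bands, H, W) hyperspectral cube, not the paper's exact definitions.

```python
import torch

def band_erasure(x: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Zero out a random subset of spectral bands. x: (C, H, W)."""
    keep = (torch.rand(x.size(0)) > p).float().view(-1, 1, 1)
    return x * keep

def gradient_mask(x: torch.Tensor) -> torch.Tensor:
    """Modulate the cube with a linear spatial ramp (one plausible
    reading of 'gradient mask'; the paper's definition may differ)."""
    h, w = x.shape[-2:]
    ramp = torch.linspace(0.5, 1.0, w).expand(h, w)
    return x * ramp

def random_occlusion(x: torch.Tensor, size: int = 5) -> torch.Tensor:
    """Zero a random size x size spatial patch across all bands."""
    h, w = x.shape[-2:]
    top = int(torch.randint(0, max(h - size, 1), (1,)))
    left = int(torch.randint(0, max(w - size, 1), (1,)))
    out = x.clone()
    out[:, top:top + size, left:left + size] = 0
    return out

# Example on a 103-band cube, e.g. a Pavia University patch.
aug = random_occlusion(gradient_mask(band_erasure(torch.rand(103, 32, 32))))
```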

27 pages, 1846 KB  
Article
Nearest Neighboring Self-Supervised Learning for Hyperspectral Image Classification
by Yao Qin, Yuanxin Ye, Yue Zhao, Junzheng Wu, Han Zhang, Kenan Cheng and Kun Li
Remote Sens. 2023, 15(6), 1713; https://doi.org/10.3390/rs15061713 - 22 Mar 2023
Cited by 15 | Viewed by 3639
Abstract
Recently, self-supervised learning (S2L) has achieved state-of-the-art classification performance on natural images, as it can generate latent features by learning between different views of the same images. However, the latent semantic information of similar images has hardly been exploited by these S2L-based methods. Consequently, to explore the potential of S2L between similar samples in hyperspectral image classification (HSIC), we propose the nearest neighboring self-supervised learning (N2SSL) method, which interacts between different augmentations of reliable nearest neighboring pairs (RN2Ps) of HSI samples in the framework of bootstrap your own latent (BYOL). Specifically, there are four main steps: pretraining of spectral-spatial residual network (SSRN)-based BYOL, generation of nearest neighboring pairs (N2Ps), training of BYOL based on RN2Ps, and final classification. Experimental results on three benchmark HSIs validated that S2L on similar samples can facilitate subsequent classification. Moreover, we found that BYOL trained on an unrelated HSI can be fine-tuned for classification of other HSIs with less computational cost and higher accuracy than training from scratch. Beyond the methodology, we present a comprehensive review of HSI-related data augmentation (DA), which is meaningful to future research on S2L for HSIs.
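The core N2P step, pairing each sample with its nearest neighbor in embedding space, can be sketched in a few lines; the reliability filtering that turns N2Ps into RN2Ps is a further step omitted here.

```python
import torch
import torch.nn.functional as F

def nearest_neighbor_pairs(embeddings: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Mine nearest-neighboring pairs: for each sample, find its k closest
    other samples by cosine similarity. embeddings: (N, D)."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t()                                   # (N, N) cosine similarities
    sim.fill_diagonal_(-float("inf"))                 # exclude self-matches
    nn_idx = sim.topk(k, dim=-1).indices              # (N, k) neighbor indices
    anchors = torch.arange(z.size(0)).repeat_interleave(k)
    return torch.stack([anchors, nn_idx.reshape(-1)], dim=1)  # (N*k, 2) index pairs

# Each pair's two samples are then augmented and fed to the two BYOL branches.
pairs = nearest_neighbor_pairs(torch.randn(128, 64))
```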

26 pages, 27891 KB  
Article
How Well Do Self-Supervised Models Transfer to Medical Imaging?
by Jonah Anton, Liam Castelli, Mun Fai Chan, Mathilde Outters, Wan Hee Tang, Venus Cheung, Pancham Shukla, Rahee Walambe and Ketan Kotecha
J. Imaging 2022, 8(12), 320; https://doi.org/10.3390/jimaging8120320 - 1 Dec 2022
Cited by 8 | Viewed by 6526
Abstract
Self-supervised learning approaches have seen success transferring between similar medical imaging datasets; however, there has been no large-scale attempt to compare the transferability of self-supervised models against each other on medical images. In this study, we compare the generalisability of seven self-supervised models, two of which were trained in-domain, against supervised baselines across eight different medical datasets. We find that ImageNet-pretrained self-supervised models are more generalisable than their supervised counterparts, scoring up to 10% better on medical classification tasks. The two in-domain pretrained models outperformed other models by over 20% on in-domain tasks; however, they suffered a significant loss of accuracy on all other tasks. Our investigation of the feature representations suggests that this trend may be due to the models learning to focus too heavily on specific areas.
(This article belongs to the Topic Medical Image Analysis)
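Transferability comparisons of this kind typically rely on a frozen-encoder linear probe (alongside full fine-tuning). The sketch below shows that standard protocol; the loader, feature dimension, and optimizer settings are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(encoder: nn.Module, feat_dim: int, num_classes: int,
                 loader, epochs: int = 10) -> nn.Linear:
    """Freeze the pretrained encoder and train only a linear classifier
    on the target dataset; probe accuracy indicates feature quality."""
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                feats = encoder(x)                 # frozen features
            loss = F.cross_entropy(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```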

14 pages, 3012 KB  
Article
Specific Emitter Identification Model Based on Improved BYOL Self-Supervised Learning
by Dongxing Zhao, Junan Yang, Hui Liu and Keju Huang
Electronics 2022, 11(21), 3485; https://doi.org/10.3390/electronics11213485 - 27 Oct 2022
Cited by 13 | Viewed by 2913
Abstract
Specific emitter identification (SEI) extracts features from received radio signals to determine the individual emitters that generated them. Although deep learning-based methods have been applied effectively to SEI, their performance declines dramatically with smaller numbers of labeled training samples and in the presence of significant noise. To address this issue, we propose an improved Bootstrap Your Own Latent (BYOL) self-supervised learning scheme to fully exploit unlabeled samples; it comprises a pretext task that adopts the contrastive learning concept and a downstream task. We designed three optimized data augmentation methods for communication signals in the pretext task to serve the contrastive concept. We built two neural networks, an online network and a target network, which interact and learn from each other. The proposed scheme generalizes across both small and sufficient sample cases, with 10 to 400 labeled samples per group. The experiments also show promising accuracy and robustness, with recognition results improving by 3-8% at signal-to-noise ratios (SNRs) from 3 to 7. Our scheme can accurately identify individual emitters in a complicated electromagnetic environment.
(This article belongs to the Special Issue New Advances in Visual Computing and Virtual Reality)
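The abstract does not name its three signal augmentations, so the examples below are generic I/Q augmentations (additive noise, time shift, phase rotation) of the kind commonly used in SEI pipelines, purely for illustration.

```python
import math
import torch

def add_awgn(iq: torch.Tensor, snr_db: float = 10.0) -> torch.Tensor:
    """Add white Gaussian noise at a given SNR. iq: (2, N) I/Q tensor.
    (Illustrative only; not one of the paper's named augmentations.)"""
    sig_power = iq.pow(2).mean()
    noise_power = sig_power / (10 ** (snr_db / 10))
    return iq + noise_power.sqrt() * torch.randn_like(iq)

def random_time_shift(iq: torch.Tensor, max_shift: int = 32) -> torch.Tensor:
    """Circularly shift the signal in time."""
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return torch.roll(iq, shifts=shift, dims=-1)

def random_phase_rotation(iq: torch.Tensor) -> torch.Tensor:
    """Rotate the I/Q constellation by a random phase."""
    theta = float(torch.rand(1)) * 2 * math.pi
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s], [s, c]])
    return rot @ iq

# Two independently augmented views of one signal feed the BYOL branches.
signal = torch.randn(2, 1024)
view1, view2 = add_awgn(random_time_shift(signal)), random_phase_rotation(signal)
```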

29 pages, 4775 KB  
Article
Self-Learning for Few-Shot Remote Sensing Image Captioning
by Haonan Zhou, Xiaoping Du, Lurui Xia and Sen Li
Remote Sens. 2022, 14(18), 4606; https://doi.org/10.3390/rs14184606 - 15 Sep 2022
Cited by 14 | Viewed by 4177
Abstract
Large-scale caption-labeled remote sensing image samples are expensive to acquire, and the training samples available in practical application scenarios are generally limited. Remote sensing image caption generation therefore inevitably falls into a few-shot dilemma, resulting in poor quality of the generated text descriptions. In this study, we propose a self-learning method named SFRC for few-shot remote sensing image captioning. Without relying on additional labeled samples or external knowledge, SFRC improves few-shot performance by improving how, and how efficiently, the model learns from limited data. We first train an encoder for semantic feature extraction using a modified BYOL self-supervised learning method on a small number of unlabeled remote sensing samples, where the unlabeled samples are derived from the caption-labeled ones. When training the caption generation model on a small number of caption-labeled samples, self-ensembling yields a parameter-averaged teacher model by integrating intermediate states of the model over a training time horizon. Self-distillation then uses this teacher model to generate pseudo labels that guide the next-generation student model toward better performance. Additionally, when optimizing the model by parameter back-propagation, we design a baseline incorporating a self-critical self-ensemble to reduce variance during gradient computation and weaken the effect of overfitting. In a range of experiments using only limited caption-labeled samples, the performance evaluation metric scores of SFRC exceed those of recent methods. We conduct percentage-sampling few-shot experiments to test SFRC with even fewer samples, as well as ablation experiments on its key designs. The ablation results confirm that each of these self-learning designs contributes to the performance of the SFRC method in sparse-sample remote sensing captioning.
(This article belongs to the Section Remote Sensing Image Processing)
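The self-ensemble and self-distillation steps can be sketched generically: a parameter-averaged teacher plus a soft-target distillation loss. The decay rate and temperature are assumptions, and the paper applies this to caption tokens rather than generic logits.

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
    """Self-ensemble: the teacher starts as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999):
    """The teacher's parameters become a running average of the student's
    intermediate states over the training horizon."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_((1 - decay) * s)

def self_distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """Self-distillation: the student matches the averaged teacher's
    softened output distribution (pseudo labels) via KL divergence."""
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
```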

22 pages, 4466 KB  
Article
Heuristic Attention Representation Learning for Self-Supervised Pretraining
by Van Nhiem Tran, Shen-Hsuan Liu, Yung-Hui Li and Jia-Ching Wang
Sensors 2022, 22(14), 5169; https://doi.org/10.3390/s22145169 - 10 Jul 2022
Cited by 7 | Viewed by 3634
Abstract
Recently, self-supervised learning methods have been shown to be very powerful and efficient at yielding robust representation learning by maximizing the similarity across different augmented views in embedding vector space. However, the main challenge lies in generating different views by random cropping: semantic content may differ across views, making it inappropriate to blindly maximize a similarity objective. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on a joint embedding architecture in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. HARL adopts prior object-level visual attention by generating a heuristic mask proposal for each training image and maximizes similarity between object-level embeddings in vector space, rather than between whole-image representations as in previous works. As a result, HARL extracts quality semantic representations from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural image datasets. HARL achieves a +1.3% improvement on the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP50 on the COCO object detection task over the previous state-of-the-art method, BYOL. Our code implementation is available for both TensorFlow and PyTorch frameworks.
(This article belongs to the Special Issue Computer Vision and Machine Learning for Intelligent Sensing Systems)
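The object-level pooling step can be sketched as masked average pooling of a convolutional feature map under the heuristic mask; the shapes and nearest-neighbor mask downsampling are assumptions, not the authors' exact pipeline.

```python
import torch
import torch.nn.functional as F

def object_level_embedding(feature_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Pool a conv feature map over a heuristic object mask instead of the
    whole image. feature_map: (B, C, h, w); mask: (B, 1, H, W), binary.
    The mask is downsampled to the feature resolution, then features are
    averaged over foreground locations only."""
    m = F.interpolate(mask.float(), size=feature_map.shape[-2:], mode="nearest")
    fg = (feature_map * m).sum(dim=(-2, -1))          # sum over masked pixels
    area = m.sum(dim=(-2, -1)).clamp(min=1.0)         # avoid divide-by-zero
    return fg / area                                  # (B, C) object-level embedding

# Example: BYOL-style similarity is then computed on these (B, C) vectors.
emb = object_level_embedding(torch.randn(4, 512, 7, 7), torch.randint(0, 2, (4, 1, 224, 224)))
```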
