Search Results (2,743)

Search Parameters:
Keywords = annotated dataset

22 pages, 5692 KiB  
Article
RiceStageSeg: A Multimodal Benchmark Dataset for Semantic Segmentation of Rice Growth Stages
by Jianping Zhang, Tailai Chen, Yizhe Li, Qi Meng, Yanying Chen, Jie Deng and Enhong Sun
Remote Sens. 2025, 17(16), 2858; https://doi.org/10.3390/rs17162858 - 16 Aug 2025
Abstract
The accurate identification of rice growth stages is critical for precision agriculture, crop management, and yield estimation. Remote sensing technologies, particularly multimodal approaches that integrate high-spatial-resolution and hyperspectral imagery, have demonstrated great potential in large-scale crop monitoring. Multimodal data fusion offers complementary and enriched spectral–spatial information, providing novel pathways for crop growth stage recognition in complex agricultural scenarios. However, the lack of publicly available multimodal datasets specifically designed for rice growth stage identification remains a significant bottleneck that limits the development and evaluation of relevant methods. To address this gap, we present RiceStageSeg, a multimodal benchmark dataset captured by unmanned aerial vehicles (UAVs), designed to support the development and assessment of segmentation models for rice growth monitoring. RiceStageSeg contains paired centimeter-level RGB and 10-band multispectral (MS) images acquired during several critical rice growth stages, including jointing and heading. Each image is accompanied by fine-grained, pixel-level annotations that distinguish between the different growth stages. We establish baseline experiments using several state-of-the-art semantic segmentation models under both unimodal (RGB-only, MS-only) and multimodal (RGB + MS fusion) settings. The experimental results demonstrate that multimodal feature-level fusion outperforms unimodal approaches in segmentation accuracy. RiceStageSeg offers a standardized benchmark to advance future research in multimodal semantic segmentation for agricultural remote sensing. The dataset will be made publicly available on GitHub (v0.11.0, accessed on 1 August 2025).
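
The abstract reports that feature-level RGB + MS fusion beats either modality alone. As a rough illustration of what such a baseline can look like (the paper's actual architectures are not given here, so the layer sizes and the fusion-by-concatenation choice below are assumptions), a minimal two-stream PyTorch sketch:

```python
import torch
import torch.nn as nn

class DualStreamFusionSeg(nn.Module):
    """Minimal two-stream encoder with feature-level fusion for
    RGB (3-band) + multispectral (10-band) semantic segmentation."""
    def __init__(self, num_classes: int, feat: int = 64):
        super().__init__()
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            )
        self.rgb_enc = encoder(3)    # RGB stream
        self.ms_enc = encoder(10)    # 10-band multispectral stream
        # Fuse by channel concatenation, then predict per-pixel classes.
        self.head = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, num_classes, 1),
        )

    def forward(self, rgb, ms):
        fused = torch.cat([self.rgb_enc(rgb), self.ms_enc(ms)], dim=1)
        return self.head(fused)  # (B, num_classes, H, W) logits

logits = DualStreamFusionSeg(num_classes=5)(torch.rand(1, 3, 128, 128),
                                            torch.rand(1, 10, 128, 128))
```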

16 pages, 1418 KiB  
Article
Prototype-Guided Promptable Retinal Lesion Segmentation from Coarse Annotations
by Qinji Yu and Xiaowei Ding
Electronics 2025, 14(16), 3252; https://doi.org/10.3390/electronics14163252 - 15 Aug 2025
Abstract
Accurate segmentation of retinal lesions is critical for the diagnosis and management of ophthalmic diseases, but pixel-level annotation is labor-intensive and demanding in clinical scenarios. To address this, we introduce a promptable segmentation approach based on prototype learning that enables precise retinal lesion segmentation from low-cost, coarse annotations. Our framework treats clinician-provided coarse masks (such as ellipses) as prompts to guide the extraction and refinement of lesion and background feature prototypes. A lightweight U-Net backbone fuses image content with spatial priors, while a superpixel-guided prototype weighting module is employed to mitigate background interference within coarse prompts. We simulate coarse prompts from fine-grained masks to train the model, and extensively validate our method across three datasets (IDRiD, DDR, and a private clinical set) with a range of annotation coarseness levels. Experimental results demonstrate that our prototype-based model significantly outperforms fully supervised and non-prototypical promptable baselines, achieving more accurate and robust segmentation, particularly for challenging and variable lesions. The approach exhibits excellent adaptability to unseen data distributions and lesion types, maintaining stable performance even under highly coarse prompts. This work highlights the potential of prompt-driven, prototype-based solutions for efficient and reliable medical image segmentation in practical clinical settings.
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
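
A central ingredient described above is turning a coarse clinician prompt into lesion and background prototypes and then classifying pixels by similarity. A minimal sketch of masked average pooling plus cosine-similarity matching, assuming a generic feature backbone (the superpixel weighting module and U-Net details are omitted):

```python
import torch
import torch.nn.functional as F

def prototype_segment(feats, coarse_mask, tau=0.1):
    """Masked-average-pooling prototypes from a coarse prompt, then
    per-pixel cosine-similarity classification (lesion vs. background).
    feats: (C, H, W) backbone features; coarse_mask: (H, W) in {0, 1}."""
    C, H, W = feats.shape
    f = feats.view(C, -1)                      # (C, H*W)
    m = coarse_mask.view(1, -1).float()        # (1, H*W) prompt mask
    proto_fg = (f * m).sum(1) / m.sum().clamp(min=1)          # lesion prototype
    proto_bg = (f * (1 - m)).sum(1) / (1 - m).sum().clamp(min=1)
    protos = F.normalize(torch.stack([proto_bg, proto_fg]), dim=1)  # (2, C)
    sims = protos @ F.normalize(f, dim=0)      # (2, H*W) cosine similarities
    return (sims / tau).softmax(0)[1].view(H, W)  # lesion probability map
```
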
47 pages, 12839 KiB  
Article
Tree Type Classification from ALS Data: A Comparative Analysis of 1D, 2D, and 3D Representations Using ML and DL Models
by Sead Mustafić, Mathias Schardt and Roland Perko
Remote Sens. 2025, 17(16), 2847; https://doi.org/10.3390/rs17162847 - 15 Aug 2025
Abstract
Accurate classification of individual tree types is a key component in forest inventory, biodiversity monitoring, and ecological modeling. This study evaluates and compares multiple Machine Learning (ML) and Deep Learning (DL) approaches for tree type classification based on Airborne Laser Scanning (ALS) data. A mixed-species forest in southeastern Austria, Europe, served as the test site, with spruce, pine, and a grouped class of broadleaf species as target categories. To examine the impact of data representation, ALS point clouds were transformed into four distinct structures: 1D feature vectors, 2D raster profiles, 3D voxel grids, and unstructured 3D point clouds. A comprehensive dataset, combining field measurements and manually annotated aerial data, was used to train and validate 45 ML and DL models. Results show that DL models based on 3D point clouds achieved the highest overall accuracy (up to 88.1%), followed by multi-view 2D raster and voxel-based methods. Traditional ML models performed well on 1D data but struggled with high-dimensional inputs. Spruce trees were classified most reliably, while confusion between pine and broadleaf species remained challenging across methods. The study highlights the importance of selecting suitable data structures and model types for operational tree classification and outlines potential directions for improving accuracy through multimodal and temporal data fusion.
(This article belongs to the Section Forest Remote Sensing)
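
Of the four data representations compared, the 3D voxel grid is the simplest to reproduce. A small sketch of converting a single-tree ALS point cloud into a binary occupancy grid for a 3D CNN (the grid size of 32 is an assumption, not the paper's setting):

```python
import numpy as np

def voxelize(points: np.ndarray, grid: int = 32) -> np.ndarray:
    """Convert one tree's ALS point cloud (N, 3) into a binary
    occupancy grid (grid, grid, grid) usable as 3D CNN input."""
    p = points - points.min(axis=0)           # shift to the origin
    p = p / (p.max() + 1e-9) * (grid - 1)     # isotropic scale into the grid
    idx = p.astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox
```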

24 pages, 2703 KiB  
Article
Unsupervised Person Re-Identification via Deep Attribute Learning
by Shun Zhang, Yaohui Xu, Xuebin Zhang, Boyang Cheng and Ke Wang
Future Internet 2025, 17(8), 371; https://doi.org/10.3390/fi17080371 - 15 Aug 2025
Abstract
Driven by growing public security demands and the advancement of intelligent surveillance systems, person re-identification (ReID) has emerged as a prominent research focus in the field of computer vision. The primary objective of person ReID is to retrieve individuals with the same identity across different camera views. However, this task presents challenges due to its high sensitivity to variations in visual appearance caused by factors such as body pose and camera parameters. Although deep learning-based methods have achieved marked progress in ReID, the high cost of annotation remains a challenge that cannot be overlooked. To address this, we propose an unsupervised attribute learning framework that eliminates the need for costly manual annotations while maintaining high accuracy. The framework learns mid-level human attributes (such as clothing type and gender) that are robust to substantial visual appearance variations and can hence boost the accuracy of attribute prediction from a small amount of labeled data. To implement our framework, we present a part-based convolutional neural network (CNN) architecture that consists of two components: global-level learning of whole-body images and attributes, and local-level learning of upper- and lower-body images and attributes. The proposed architecture is trained to learn attribute-semantic and identity-discriminative feature representations simultaneously. For model learning, we first train our part-based network using a supervised approach on a labeled attribute dataset. Then, we apply an unsupervised clustering method to assign pseudo-labels to unlabeled images in a target dataset using our trained network. To improve feature compatibility, we introduce an attribute consistency scheme for unsupervised domain adaptation on this unlabeled target data. During training on the target dataset, we alternately perform three steps: extracting features with the updated model, assigning pseudo-labels to unlabeled images, and fine-tuning the model. Through a unified framework that fuses complementary attribute-label and identity-label information, our approach achieves considerable improvements of 10.6% and 3.91% mAP on the Market-1501→DukeMTMC-ReID and DukeMTMC-ReID→Market-1501 unsupervised domain adaptation tasks, respectively.
(This article belongs to the Special Issue Advances in Deep Learning and Next-Generation Internet Technologies)
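
The alternating extract–cluster–fine-tune loop described above is a standard pattern in unsupervised ReID. A sketch of one pseudo-labeling round using DBSCAN over normalized embeddings (the clustering algorithm and its eps/min_samples values are assumptions; the abstract does not specify them):

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

def pseudo_label_round(model, loader, eps=0.6):
    """One round of the alternating scheme: extract features on the
    unlabeled target set, cluster them, and return pseudo-labels.
    `loader` is assumed to yield batches of target-domain images."""
    model.eval()
    feats = []
    with torch.no_grad():
        for images in loader:
            feats.append(F.normalize(model(images), dim=1))  # (B, D)
    feats = torch.cat(feats).cpu().numpy()
    labels = DBSCAN(eps=eps, min_samples=4, metric='cosine').fit_predict(feats)
    return labels   # -1 marks outliers, typically excluded from fine-tuning
```
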
22 pages, 2050 KiB  
Article
A Trustworthy Dataset for APT Intelligence with an Auto-Annotation Framework
by Rui Qi, Ga Xiang, Yangsen Zhang, Qunsheng Yang, Mingyue Cheng, Haoyang Zhang, Mingming Ma, Lu Sun and Zhixing Ma
Electronics 2025, 14(16), 3251; https://doi.org/10.3390/electronics14163251 - 15 Aug 2025
Abstract
Advanced Persistent Threats (APTs) pose significant cybersecurity challenges due to their multi-stage complexity. Knowledge graphs (KGs) effectively model APT attack processes through node-link architectures; however, the scarcity of high-quality, annotated datasets limits research progress. The primary challenge lies in balancing annotation cost and quality, particularly due to the lack of quality assessment methods for graph annotation data. This study addresses these issues by extending existing APT ontology definitions and developing a dynamic, trustworthy annotation framework for APT knowledge graphs. The framework introduces a self-verification mechanism utilizing large language model (LLM) annotation consistency and establishes a comprehensive graph data metric system for problem localization in annotated data. This metric system, based on structural properties, logical consistency, and APT attack chain characteristics, comprehensively evaluates annotation quality across representation, syntax semantics, and topological structure. Experimental results show that this framework significantly reduces annotation costs while maintaining quality. Using this framework, we constructed LAPTKG, a reliable dataset containing over 10,000 entities and relations. Baseline evaluations show substantial improvements in entity and relation extraction performance after metric correction, validating the framework’s effectiveness in reliable APT knowledge graph dataset construction.
(This article belongs to the Special Issue Advances in Information Processing and Network Security)
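
The self-verification mechanism is described only as exploiting LLM annotation consistency. One plausible reading, sketched with a hypothetical annotate() wrapper around the LLM call (the run count k, the voting threshold, and the triple format are all assumptions, not the paper's design):

```python
from collections import Counter

def consistent_triples(annotate, text, k=3, min_agree=2):
    """Self-verification sketch: query the LLM annotator k times and keep
    only (head, relation, tail) triples that at least `min_agree` runs
    agree on. `annotate(text) -> set[tuple]` is a hypothetical wrapper
    around the LLM extraction call."""
    votes = Counter()
    for _ in range(k):
        votes.update(annotate(text))   # each run votes once per triple
    return {triple for triple, n in votes.items() if n >= min_agree}
```
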
19 pages, 939 KiB  
Article
From Convolution to Spikes for Mental Health: A CNN-to-SNN Approach Using the DAIC-WOZ Dataset
by Victor Triohin, Monica Leba and Andreea Cristina Ionica
Appl. Sci. 2025, 15(16), 9032; https://doi.org/10.3390/app15169032 - 15 Aug 2025
Abstract
Depression remains a leading cause of global disability, yet scalable and objective diagnostic tools are still lacking. Speech has emerged as a promising non-invasive modality for automated depression detection, due to its strong correlation with emotional state and ease of acquisition. While convolutional neural networks (CNNs) have achieved state-of-the-art performance in this domain, their high computational demands limit deployment in low-resource or real-time settings. Spiking neural networks (SNNs), by contrast, offer energy-efficient, event-driven computation inspired by biological neurons, but they are difficult to train directly and often exhibit degraded performance on complex tasks. This study investigates whether CNNs trained on audio data from the clinically annotated DAIC-WOZ dataset can be effectively converted into SNNs while preserving diagnostic accuracy. We evaluate multiple conversion thresholds using the SpikingJelly framework and find that the 99.9% mode yields an SNN that matches the original CNN in both accuracy (82.5%) and macro F1 score (0.8254). Lower threshold settings offer increased sensitivity to depressive speech at the cost of overall accuracy, while naïve conversion strategies result in significant performance loss. These findings support the feasibility of CNN-to-SNN conversion for real-world mental health applications and underscore the importance of precise calibration in achieving clinically meaningful results.
(This article belongs to the Special Issue eHealth Innovative Approaches and Applications: 2nd Edition)
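
The conversion step itself maps onto SpikingJelly, which the abstract names; the snippet below assumes the activation_based ann2snn API, with cnn, calib_loader, and the timestep count T as placeholders rather than the paper's exact configuration:

```python
import torch
from spikingjelly.activation_based import ann2snn, functional

# cnn: the trained audio CNN; calib_loader: a DataLoader of training
# inputs used to calibrate activation scales. The '99.9%' mode clips
# activations at their 99.9th percentile, the threshold setting the
# abstract reports as matching the CNN's accuracy after conversion.
converter = ann2snn.Converter(mode='99.9%', dataloader=calib_loader)
snn = converter(cnn)

def snn_predict(snn, x, T=50):
    """Run the converted SNN for T timesteps and average spike outputs."""
    functional.reset_net(snn)        # clear membrane state between inputs
    out = 0.0
    for _ in range(T):
        out = out + snn(x)
    return (out / T).argmax(dim=1)
```
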
16 pages, 7955 KiB  
Article
Development and Validation of a Computer Vision Dataset for Object Detection and Instance Segmentation in Earthwork Construction Sites
by JongHo Na, JaeKang Lee, HyuSoung Shin and IlDong Yun
Appl. Sci. 2025, 15(16), 9000; https://doi.org/10.3390/app15169000 - 14 Aug 2025
Abstract
Construction sites report the highest rate of industrial accidents, prompting the active development of smart safety management systems based on deep learning-based computer vision technology. To support the digital transformation of construction sites, securing site-specific datasets is essential. In this study, raw data were collected from an actual earthwork site. Key construction equipment and terrain objects primarily operated at the site were identified, and 89,766 images were processed to build a site-specific training dataset. This dataset includes annotated bounding boxes for object detection and polygon masks for instance segmentation. The performance of the dataset was validated using representative models—YOLO v7 for object detection and Mask R-CNN for instance segmentation. Quantitative metrics and visual assessments confirmed the validity and practical applicability of the dataset. The dataset used in this study has been made publicly available for use by researchers in related fields. This dataset is expected to serve as a foundational resource for advancing object detection applications in construction safety.
(This article belongs to the Section Civil Engineering)
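
Both validation models named above are standard off-the-shelf architectures. For instance, a quick instance-segmentation smoke test with torchvision's COCO-pretrained Mask R-CNN (a random tensor standing in for a site photo; fine-tuning on the earthwork classes would follow) might look like:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights='DEFAULT').eval()  # COCO weights
image = torch.rand(3, 480, 640)          # stand-in for a site image
with torch.no_grad():
    pred = model([image])[0]             # dict: boxes, labels, scores, masks
keep = pred['scores'] > 0.5              # confidence threshold (assumed)
print(pred['boxes'][keep].shape, pred['masks'][keep].shape)
```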

40 pages, 6883 KiB  
Article
SYNTHUA-DT: A Methodological Framework for Synthetic Dataset Generation and Automatic Annotation from Digital Twins in Urban Accessibility Applications
by Santiago Felipe Luna Romero, Mauren Abreu de Souza and Luis Serpa Andrade
Technologies 2025, 13(8), 359; https://doi.org/10.3390/technologies13080359 - 14 Aug 2025
Abstract
Urban scene understanding for inclusive smart cities remains challenged by the scarcity of training data capturing people with mobility impairments. We propose SYNTHUA-DT, a novel methodological framework that integrates unmanned aerial vehicle (UAV) photogrammetry, 3D digital twin modeling, and high-fidelity simulation in Unreal Engine to generate annotated synthetic datasets for urban accessibility applications. This framework produces photo-realistic images with automatic pixel-perfect segmentation labels, dramatically reducing the need for manual annotation. Focusing on the detection of individuals using mobility aids (e.g., wheelchairs) in complex urban environments, SYNTHUA-DT is designed as a generalized, replicable pipeline adaptable to different cities and scenarios. The novelty lies in combining real-city digital twins with procedurally placed virtual agents, enabling diverse viewpoints and scenarios that are impractical to capture in real life. The computational efficiency and scale of this synthetic data generation offer significant advantages over conventional datasets (such as Cityscapes or KITTI), which are limited in accessibility-related content and costly to annotate. A case study using a digital twin of Curitiba, Brazil, validates the framework’s real-world applicability: 22,412 labeled images were synthesized to train and evaluate vision models for mobility-aid user detection. The results demonstrate improved recognition performance and robustness, highlighting SYNTHUA-DT’s potential to advance urban accessibility by providing abundant, bias-mitigating training data. This work paves the way for inclusive computer vision systems in smart cities through a rigorously engineered synthetic data pipeline.
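
Automatic pixel-perfect labels from an engine render typically arrive as a color-coded segmentation pass; decoding it into integer class masks is a one-step lookup. The palette below is purely illustrative, not the framework's actual color scheme:

```python
import numpy as np

# Illustrative class palette: each class rendered as a unique RGB color.
PALETTE = {
    (0, 0, 0): 0,        # background
    (255, 0, 0): 1,      # pedestrian
    (0, 0, 255): 2,      # wheelchair user
}

def rgb_to_labels(seg_rgb: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) engine segmentation render into an (H, W)
    integer label mask by exact color matching."""
    labels = np.zeros(seg_rgb.shape[:2], dtype=np.uint8)
    for color, cls in PALETTE.items():
        labels[np.all(seg_rgb == color, axis=-1)] = cls
    return labels
```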

18 pages, 3931 KiB  
Article
Dual-Generator and Dynamically Fused Discriminators Adversarial Network to Create Synthetic Coronary Optical Coherence Tomography Images for Coronary Artery Disease Classification
by Junaid Zafar, Faisal Sharif and Haroon Zafar
Optics 2025, 6(3), 38; https://doi.org/10.3390/opt6030038 - 14 Aug 2025
Abstract
Deep neural networks have led to a substantial increase in multifaceted classification tasks by making use of large-scale and diverse annotated datasets. However, assembling diverse optical coherence tomography (OCT) datasets in cardiovascular imaging remains an uphill task. This research focuses on improving the diversity and generalization ability of augmentation architectures while maintaining the baseline classification accuracy for coronary arterial plaques using a novel dual-generator and dynamically fused discriminator conditional generative adversarial network (DGDFGAN). Our method is demonstrated on an augmented OCT dataset with 6900 images. With dual generators, our network provides diverse outputs for the same input condition, as each generator acts as a regulator for the other. In our model, this mutual regularization enhances the ability of both generators to generalize better across different features. The fusion discriminators use one discriminator for classification purposes, hence avoiding the need for a separate deep architecture. A loss function including the SSIM loss and FID scores confirms that high-fidelity synthetic OCT images are created. We optimize our model via the gray wolf optimizer during model training. Furthermore, an inter-comparison with a recorded SSIM score of 0.9542 ± 0.008 and an FID score of 7 is suggestive of better diversity and generation characteristics that outperform leading GAN architectures. We trust that our approach is practically viable and thus assists professionals in informed decision making in clinical settings.
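
SSIM and FID are the two fidelity measures quoted above, and SSIM at least is straightforward to compute for a batch of synthetic frames. A small evaluation helper using scikit-image (the batch format and unit data range are assumptions):

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_ssim(real_batch, fake_batch):
    """Average SSIM between paired real and synthetic OCT frames,
    each a 2D float array scaled to [0, 1]. Values near 1 indicate
    close structural agreement (the abstract reports ~0.954)."""
    scores = [ssim(r, f, data_range=1.0)
              for r, f in zip(real_batch, fake_batch)]
    return float(np.mean(scores))
```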

26 pages, 4742 KiB  
Article
Design and Evaluation of LLDPE/Epoxy Composite Tiles with YOLOv8-Based Defect Detection for Flooring Applications
by I. Infanta Mary Priya, Siddharth Anand, Aravindan Bishwakarma, M. Uma, Sethuramalingam Prabhu and M. M. Reddy
Processes 2025, 13(8), 2568; https://doi.org/10.3390/pr13082568 - 14 Aug 2025
Abstract
With the increasing demand for sustainable and cost-effective alternatives in the construction industry, polymer composites have emerged as a promising solution. This study focuses on the development of innovative composite tiles using Linear Low-Density Polyethylene (LLDPE) powder blended with epoxy resin and a hardener as a green substitute for conventional ceramic and cement tiles. LLDPE is recognized for its flexibility, durability, and chemical resistance, making it an effective filler within the epoxy matrix. To optimize the composite’s material properties, samples were fabricated using three different LLDPE-to-epoxy ratios: 30:70, 40:60, and 50:50. Flexural strength testing revealed that while the 50:50 blend achieved the highest maximum value (29.887 MPa), it also exhibited significant variability, reducing its reliability for practical applications. In contrast, the 40:60 ratio demonstrated more consistent and repeatable flexural strength, ranging from 16 to 20 MPa, which is ideal for flooring applications where mechanical performance under repeated loading is critical. Scanning Electron Microscopy (SEM) images confirmed uniform filler dispersion in the 40:60 mix, further supporting its mechanical consistency. The 30:70 composition showed irregular and erratic behaviour, with values ranging from 11.596 to 25.765 MPa, indicating poor dispersion and increased brittleness. To complement the development of the materials, deep learning techniques were employed for real-time defect detection in the manufactured tiles. Utilizing the YOLOv8 (You Only Look Once version 8) algorithm, this study implemented an automated, vision-based surface monitoring system capable of identifying surface deterioration and defects. A dataset comprising over 100 annotated images was prepared, featuring various surface defects such as cracks, craters, glaze detachment, and tile lacunae, alongside defect-free samples. The integration of machine learning not only enhances quality control in the production process but also offers a scalable solution for defect detection in large-scale manufacturing environments. This research demonstrates a dual approach to material innovation and intelligent defect detection to improve the performance and quality assurance of composite tiles, contributing to sustainable construction practices.
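
YOLOv8 defect detection of this kind is normally run through the ultralytics package. A sketch of fine-tuning and inference, where tile_defects.yaml is a hypothetical dataset config listing the image paths and the defect classes (crack, crater, glaze detachment, lacuna), and the training settings are assumptions:

```python
from ultralytics import YOLO

# Fine-tune a small pretrained YOLOv8 model on the annotated tile images.
model = YOLO('yolov8n.pt')
model.train(data='tile_defects.yaml', epochs=100, imgsz=640)

# Inference on a new tile image, then visualize the detected defects.
results = model('tile_sample.jpg')
results[0].show()
```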

23 pages, 18349 KiB  
Article
Estimating Radicle Length of Germinating Elm Seeds via Deep Learning
by Dantong Li, Yang Luo, Hua Xue and Guodong Sun
Sensors 2025, 25(16), 5024; https://doi.org/10.3390/s25165024 - 13 Aug 2025
Abstract
Accurate measurement of seedling traits is essential for plant phenotyping, particularly in understanding growth dynamics and stress responses. Elm trees (Ulmus spp.), ecologically and economically significant, pose unique challenges due to their curved seedling morphology. Traditional manual measurement methods are time-consuming, prone to human error, and often lack consistency. Moreover, automated approaches remain limited and often fail to accurately process seedlings with nonlinear or curved morphologies. In this study, we introduce GLEN, a deep learning-based model for detecting germinating elm seeds and accurately estimating the lengths of their germinating structures. It leverages a dual-path architecture that combines pixel-level spatial features with instance-level semantic information, enabling robust measurement of curved radicles. To support training, we construct GermElmData, a curated dataset of annotated elm seedling images, and introduce a novel synthetic data generation pipeline that produces high-fidelity, morphologically diverse germination images. This reduces the dependence on extensive manual annotations and improves model generalization. Experimental results demonstrate that GLEN achieves an estimation error on the order of millimeters, outperforming existing models. Beyond quantifying germinating elm seeds, the architectural design and data augmentation strategies in GLEN offer a scalable framework for morphological quantification in both plant phenotyping and broader biomedical imaging domains.
(This article belongs to the Section Intelligent Sensors)
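
For context on why curved morphology complicates length measurement: a naive geometric baseline reduces the radicle mask to a one-pixel skeleton and sums the link lengths along it. The sketch below (scikit-image; border wrap-around from np.roll assumed negligible because masks carry a margin) is such a baseline, not GLEN itself:

```python
import numpy as np
from skimage.morphology import skeletonize

def radicle_length_mm(mask: np.ndarray, mm_per_pixel: float) -> float:
    """Approximate the length of a curved radicle from its binary mask:
    skeletonize, then sum links between 8-connected skeleton pixels
    (weight 1 for axial neighbors, sqrt(2) for diagonal)."""
    skel = skeletonize(mask.astype(bool))
    length = 0.0
    # Four shift directions cover every undirected neighbor pair once.
    for dy, dx, w in [(0, 1, 1.0), (1, 0, 1.0),
                      (1, 1, 2 ** 0.5), (1, -1, 2 ** 0.5)]:
        shifted = np.roll(np.roll(skel, dy, axis=0), dx, axis=1)
        length += w * np.logical_and(skel, shifted).sum()
    return length * mm_per_pixel
```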

25 pages, 9564 KiB  
Article
Semantic-Aware Cross-Modal Transfer for UAV-LiDAR Individual Tree Segmentation
by Fuyang Zhou, Haiqing He, Ting Chen, Tao Zhang, Minglu Yang, Ye Yuan and Jiahao Liu
Remote Sens. 2025, 17(16), 2805; https://doi.org/10.3390/rs17162805 - 13 Aug 2025
Abstract
Cross-modal semantic segmentation of individual tree LiDAR point clouds is critical for accurately characterizing tree attributes, quantifying ecological interactions, and estimating carbon storage. However, in forest environments, this task faces key challenges such as high annotation costs and poor cross-domain generalization. To address these issues, this study proposes a cross-modal semantic transfer framework tailored for individual tree point cloud segmentation in forested scenes. Leveraging co-registered UAV-acquired RGB imagery and LiDAR data, we construct a technical pipeline of “2D semantic inference—3D spatial mapping—cross-modal fusion” to enable annotation-free semantic parsing of 3D individual trees. Specifically, we first introduce a novel Multi-Source Feature Fusion Network (MSFFNet) to achieve accurate instance-level segmentation of individual trees in the 2D image domain. Subsequently, we develop a hierarchical two-stage registration strategy to effectively align dense matched point clouds (MPC) generated from UAV imagery with LiDAR point clouds. On this basis, we propose a probabilistic cross-modal semantic transfer model that builds a semantic probability field through multi-view projection and the expectation–maximization algorithm. By integrating geometric features and semantic confidence, the model establishes semantic correspondences between 2D pixels and 3D points, thereby achieving spatially consistent semantic label mapping. This facilitates the transfer of semantic annotations from the 2D image domain to the 3D point cloud domain. The proposed method is evaluated on two forest datasets. The results demonstrate that the proposed individual tree instance segmentation approach achieves the highest performance, with an IoU of 87.60%, compared to state-of-the-art methods such as Mask R-CNN, SOLOV2, and Mask2Former. Furthermore, the cross-modal semantic label transfer framework significantly outperforms existing mainstream methods in individual tree point cloud semantic segmentation across complex forest scenarios.
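
The core 2D-to-3D step is projecting LiDAR points into a labeled image and reading labels back per point. A single-view pinhole-model sketch (the paper instead fuses many views through an EM-built semantic probability field, so this is only the geometric kernel of that idea):

```python
import numpy as np

def transfer_labels(points, label_img, K, R, t):
    """Project world-frame LiDAR points (N, 3) into a labeled image via
    a pinhole camera (intrinsics K, rotation R, translation t) and read
    back per-point semantic labels. Returns -1 for unobserved points."""
    cam = R @ points.T + t.reshape(3, 1)          # world -> camera frame
    uv = K @ cam
    uv = (uv[:2] / uv[2]).round().astype(int)     # perspective divide
    H, W = label_img.shape
    valid = (cam[2] > 0) & (uv[0] >= 0) & (uv[0] < W) \
            & (uv[1] >= 0) & (uv[1] < H)          # in front of camera, in frame
    labels = np.full(points.shape[0], -1, dtype=int)
    labels[valid] = label_img[uv[1, valid], uv[0, valid]]
    return labels
```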

24 pages, 3617 KiB  
Article
A Comparison Between Unimodal and Multimodal Segmentation Models for Deep Brain Structures from T1- and T2-Weighted MRI
by Nicola Altini, Erica Lasaracina, Francesca Galeone, Michela Prunella, Vladimiro Suglia, Leonarda Carnimeo, Vito Triggiani, Daniele Ranieri, Gioacchino Brunetti and Vitoantonio Bevilacqua
Mach. Learn. Knowl. Extr. 2025, 7(3), 84; https://doi.org/10.3390/make7030084 - 13 Aug 2025
Abstract
Accurate segmentation of deep brain structures is critical for preoperative planning in such neurosurgical procedures as Deep Brain Stimulation (DBS). Previous research has showcased successful pipelines for segmentation from T1-weighted (T1w) Magnetic Resonance Imaging (MRI) data. Nevertheless, the role of T2-weighted (T2w) MRI data has been underexploited so far. This study proposes and evaluates a fully automated deep learning pipeline based on nnU-Net for the segmentation of eight clinically relevant deep brain structures. A heterogeneous dataset has been prepared by gathering 325 paired T1w and T2w MRI scans from eight publicly available sources, which have been annotated by means of an atlas-based registration approach. Three 3D nnU-Net models—unimodal T1w, unimodal T2w, and multimodal (encompassing both T1w and T2w)—have been trained and compared by using 5-fold cross-validation and a separate test set. The outcomes prove that the multimodal model consistently outperforms the T2w unimodal model and achieves comparable performance with the T1w unimodal model. On our dataset, all proposed models significantly exceed the performance of the state-of-the-art DBSegment tool. These findings underscore the value of multimodal MRI in enhancing deep brain segmentation and offer a robust framework for accurate delineation of subcortical targets in both research and clinical settings.
(This article belongs to the Special Issue Deep Learning in Image Analysis and Pattern Recognition, 2nd Edition)
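
With nnU-Net, the unimodal/multimodal distinction is expressed in the dataset description: the multimodal model simply declares two co-registered input channels. A sketch of a v2-style dataset.json (the dataset ID, label names, and paths are invented for illustration; the paper segments eight structures):

```python
import json
import os

# nnU-Net v2 dataset description: channel _0000 = T1w, _0001 = T2w.
dataset = {
    "channel_names": {"0": "T1w", "1": "T2w"},
    # Illustrative subset; the actual label map covers eight structures.
    "labels": {"background": 0, "STN_left": 1, "STN_right": 2},
    "numTraining": 325,
    "file_ending": ".nii.gz",
}
out_dir = "nnUNet_raw/Dataset501_DeepBrain"   # hypothetical dataset folder
os.makedirs(out_dir, exist_ok=True)
with open(os.path.join(out_dir, "dataset.json"), "w") as f:
    json.dump(dataset, f, indent=2)
```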

25 pages, 28917 KiB  
Article
Synthetic Data-Driven Methods to Accelerate the Deployment of Deep Learning Models: A Case Study on Pest and Disease Detection in Precision Viticulture
by Telmo Adão, Agnieszka Chojka, David Pascoal, Nuno Silva, Raul Morais and Emanuel Peres
Computers 2025, 14(8), 327; https://doi.org/10.3390/computers14080327 - 13 Aug 2025
Abstract
The development of reliable visual inference models is often constrained by the burdensome and time-consuming processes involved in collecting and annotating high-quality datasets. This challenge becomes more acute in domains where key phenomena are time-dependent or event-driven, narrowing the opportunity window to capture representative observations. Yet, accelerating the deployment of deep learning (DL) models is crucial to support timely, data-driven decision-making in operational settings. To tackle such an issue, this paper explores the use of 2D synthetic data grounded in real-world patterns to train initial DL models in contexts where annotated datasets are scarce or can only be acquired within restrictive time windows. Two complementary approaches to synthetic data generation are investigated: rule-based digital image processing and advanced text-to-image generative diffusion models. These methods can operate independently or be combined to enhance flexibility and coverage. A proof-of-concept is presented through a couple of case studies in precision viticulture, a domain often constrained by seasonal dependencies and environmental variability. Specifically, the detection of Lobesia botrana in sticky traps and the classification of grapevine foliar symptoms associated with black rot, ESCA, and leaf blight are addressed. The results suggest that the proposed approach potentially accelerates the deployment of preliminary DL models by comprehensively automating the production of context-aware datasets roughly inspired by specific challenge-driven operational settings, thereby mitigating the need for time-consuming and labor-intensive processes, from image acquisition to annotation. Although models trained on such synthetic datasets require further refinement—for example, through active learning—the approach offers a scalable and functional solution that reduces human involvement, even in scenarios of data scarcity, and supports the effective transition of laboratory-developed AI to real-world deployment environments.
(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications 2025)
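
Of the two synthetic-data routes described, the text-to-image one maps directly onto off-the-shelf diffusion tooling. A sketch with Hugging Face diffusers (the checkpoint, prompt, and output handling are illustrative, and generated images would still need curation and annotation before training a detector):

```python
import torch
from diffusers import StableDiffusionPipeline

# Requires a CUDA GPU; fp16 weights keep memory use modest.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate candidate training images for one symptom class.
images = pipe(
    "close-up photo of a grapevine leaf with black rot lesions, "
    "vineyard background, natural light",
    num_images_per_prompt=4,
).images
for i, im in enumerate(images):
    im.save(f"synthetic_blackrot_{i}.png")
```
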

20 pages, 4191 KiB  
Article
A Deep Transfer Contrastive Learning Network for Few-Shot Hyperspectral Image Classification
by Gan Yang and Zhaohui Wang
Remote Sens. 2025, 17(16), 2800; https://doi.org/10.3390/rs17162800 - 13 Aug 2025
Abstract
Over recent decades, the hyperspectral image (HSI) classification landscape has undergone significant transformations driven by advances in deep learning (DL). Despite substantial progress, few-shot scenarios remain a significant challenge, primarily due to the high cost of manual annotation and the unreliability of visual interpretation. Traditional DL models require massive datasets to learn sophisticated feature representations, hindering their full potential in data-scarce contexts. To tackle this issue, a deep transfer contrastive learning network is proposed. A spectral data augmentation module is incorporated to expand limited sample pairs. Subsequently, a spatial–spectral feature extraction module is designed to fuse the learned feature information. The weights of the spatial feature extraction network are initialized with knowledge transferred from source-domain pretraining, while the spectral residual network acquires rich spectral information. Furthermore, contrastive learning is integrated to enhance discriminative representation learning from scarce samples, effectively mitigating obstacles arising from the high inter-class similarity and large intra-class variance inherent in HSIs. Experiments on four public HSI datasets demonstrate that our method achieves competitive performance against state-of-the-art approaches.
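
The contrastive component can be illustrated with a generic SimCLR-style NT-Xent loss over two augmented views of each spectral sample (the temperature and batch layout are assumptions; the paper's exact loss is not given in the abstract):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5):
    """NT-Xent contrastive loss: z1[i] and z2[i] are embeddings of two
    augmented views of the same spectrum; matching views attract, all
    other samples in the batch repel."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)      # (2B, D)
    sim = z @ z.T / tau                              # cosine similarities
    sim.fill_diagonal_(float('-inf'))                # exclude self-pairs
    B = z1.shape[0]
    # Positive for row i is i+B (first half) or i-B (second half).
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)
```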
