Search Results (404)

Search Parameters:
Keywords = multi-class feature fusion

15 pages, 1264 KB  
Article
ES2-LeafSeg: Lightweight State Space Modeling-Driven Agricultural Leaf Segmentation
by Hao Wang, Zhiyang Li, Pengsen Zhao and Jinlong Yu
Appl. Sci. 2026, 16(8), 3745; https://doi.org/10.3390/app16083745 - 10 Apr 2026
Abstract
Agricultural robots and unmanned farmland management require real-time and precise parsing of crop leaves at the edge to support variable application of pesticides, seedling condition monitoring, and phenotypic analysis. However, the field environment features drastic changes in light, leaf occlusion, and interference from background weeds, which can cause semantic fragmentation and boundary artifacts in lightweight models. This paper presents ES2-LeafSeg, a lightweight framework for leaf semantic segmentation tailored for edge deployment. The method employs EfficientNetV2 as the backbone encoder and introduces the State Space Semantic Enhancement Module (S2FEM) on skip connection features, modeling long-range dependencies and suppressing local texture noise through SSM pooling in row and column directions. Meanwhile, a cross-scale decoder (CSD) and a global context transformation (GCT) are designed to achieve multi-scale semantic fusion and boundary refinement. On the three-class segmentation task of the SoyCotton dataset, ES2-LeafSeg achieved mIoU of 0.817, mDice of 0.869, Fβw of 0.925, and MAE of 0.011, outperforming multiple classic and recent baselines while maintaining 23.67 M parameters and 49.62 FPS. Ablation experiments further verified the complementary contributions of S2FEM and GCT to regional consistency and boundary quality. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
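The row- and column-direction pooling that S2FEM builds on can be illustrated with a plain NumPy sketch. The actual module runs state space models along each direction; the function name and the add-back fusion here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def directional_context(feat):
    """Sketch of row/column-direction context pooling on an (H, W, C) feature map.

    The real S2FEM applies SSM scans along each direction; plain averaging is
    used here only to show the row/column decomposition of global context.
    """
    row_ctx = feat.mean(axis=1, keepdims=True)  # (H, 1, C): one context vector per row
    col_ctx = feat.mean(axis=0, keepdims=True)  # (1, W, C): one context vector per column
    # Broadcast the directional context back over the map (illustrative fusion)
    return feat + row_ctx + col_ctx
```

Because each pixel receives a summary of its entire row and column, local texture noise is averaged out while long-range dependencies are injected at linear cost.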
35 pages, 3452 KB  
Article
LUMINA-Net: Acute Lymphocytic Leukemia Subtype Classification via Interpretable Convolution Neural Network Based on Wavelet and Attention Mechanisms
by Omneya Attallah
Algorithms 2026, 19(4), 298; https://doi.org/10.3390/a19040298 - 10 Apr 2026
Abstract
Acute Lymphoblastic Leukemia (ALL) is a highly prevalent hematological malignancy, especially in children, for whom precise and prompt subtype identification is essential to establish suitable treatment protocols. Current deep learning-based computer-aided diagnosis (CAD) methods for identifying ALL are hindered by numerous drawbacks, such as a dependence on solely spatial feature depictions, elevated feature dimensions, computationally extensive deep learning architectures, inadequate multi-layer feature utilization, and poor interpretability. This paper introduces LUMINA-Net, a custom, lightweight, and interpretable deep learning CAD for the automated identification and subtype diagnosis of ALL using microscopic blood smear images. LUMINA-Net makes four principal contributions: first, it integrates a self-attention module within a lightweight custom Convolution Neural Network (CNN) to effectively capture long-range spatial relationships across clinically pertinent cytological patterns while preserving a compact design. Second, it employs a Discrete Wavelet Transform (DWT)-based wavelet pooling layer that decreases feature dimensions by up to 96.875% while enhancing the obtained depictions with spatial-spectral information. Third, it utilizes a multi-layer feature fusion strategy that combines wavelet-pooled features from two deep layers with a third fully connected layer to create a discriminating multi-scale feature vector. Fourth, it incorporates Gradient-weighted Class Activation Mapping as a dedicated explainability process to furnish clinicians with apparent visual explanations for each classification decision. Without the need for image enhancement or segmentation preprocessing, LUMINA-Net outperforms competing state-of-the-art methods on the same dataset, achieving a peak accuracy of 99.51%, specificity of 99.84%, and sensitivity of 99.51% on the publicly available Kaggle ALL dataset. This demonstrates that LUMINA-Net has the potential to be a dependable, effective, and clinically interpretable CAD tool for ALL diagnosis. Full article
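The dimensionality reduction behind wavelet pooling can be sketched with a single-level 2D Haar transform that keeps only the approximation (LL) band. This is a hedged sketch, not LUMINA-Net's DWT layer; each level shrinks the map 4x, and stacking levels compounds toward reductions of the order the abstract reports:

```python
import numpy as np

def haar_ll(x):
    """Single-level 2D Haar transform keeping only the LL (approximation) band.

    Averages adjacent row pairs, then adjacent column pairs, so an (H, W) map
    becomes (H/2, W/2): a 75% reduction per level; repeated levels compound.
    """
    rows = (x[0::2, :] + x[1::2, :]) / 2.0          # average adjacent row pairs
    return (rows[:, 0::2] + rows[:, 1::2]) / 2.0    # then adjacent column pairs
```

Discarding the three detail bands at each level trades fine spatial detail for a compact, low-frequency summary; the up-to-96.875% figure quoted above implies a deeper decomposition than this single level.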
34 pages, 10089 KB  
Article
GateProtoNet: A Compute-Aware Two-Stage Hybrid Framework with Prototype Evidence and Faithfulness-Verified Explainability for Wheat and Cotton Leaf Disease Classification
by Muhammad Irfan Sharif, Yong Zhong, Muhammad Zaheer Sajid and Francesco Marinello
AgriEngineering 2026, 8(4), 152; https://doi.org/10.3390/agriengineering8040152 - 10 Apr 2026
Abstract
Accurate diagnosis of wheat leaf diseases in real farming conditions requires models that are not only highly accurate but also computationally efficient and interpretable for practical deployment on edge devices. We propose GateProtoNet (GPN), a two-stage, compute-aware, and explainable framework for multi-class leaf disease recognition. Stage-1 performs ultra-light healthy-versus-diseased screening, enabling early exit for healthy samples and substantially reducing average expected inference cost. For diseased samples, Stage-2 applies a novel hybrid backbone featuring a frequency-factorized Discrete Wavelet Transform (DWT) stem, parallel micro-lesion convolutional encoding for fine texture patterns, and a linear token mixer for global context modeling. A cross-gated fusion module adaptively integrates local and global evidence with minimal computational overhead. To ensure trustworthy predictions, GPN introduces a prototype evidence head that performs classification via similarity to learned class prototypes, providing human-interpretable explanations, along with a faithfulness constraint that enforces explanation reliability by measuring confidence degradation under salient region removal. Rigorous evaluation on four publicly available wheat and cotton leaf disease datasets demonstrates that GateProtoNet achieves 99.2% classification accuracy, 99.1% macro-F1 score, and 99.3% AUC, significantly outperforming existing CNN, transformer, and hybrid baselines while requiring substantially fewer parameters and FLOPs. The two-stage inference strategy reduces average computational cost by avoiding full model execution on healthy leaves, enabling real-time, on-device diagnosis for resource-constrained agricultural environments. Full article
19 pages, 4608 KB  
Article
SGH-Net: An Efficient Hierarchical Fusion Network with Spectrally Guided Attention for Multi-Modal Landslide Segmentation
by Jing Wang, Haiyang Li, Shuguang Wu, Yukui Yu, Guigen Nie and Zhaoquan Fan
Remote Sens. 2026, 18(8), 1115; https://doi.org/10.3390/rs18081115 - 9 Apr 2026
Abstract
Accurate landslide segmentation from remote sensing imagery is important for geohazard assessment and emergency response, yet it remains challenging because landslide regions are often spectrally confused with bare soil, riverbeds, shadows, and disturbed surfaces while also suffering from severe foreground–background imbalance. To address these issues, we propose an Efficient Spectrally Guided Hierarchical Fusion Network (SGH-Net) for multi-modal landslide segmentation. Instead of directly concatenating heterogeneous inputs at the image level, SGH-Net adopts an asymmetric encoder–decoder design in which a pretrained EfficientNet-B4 extracts RGB features, while two lightweight guidance encoders capture complementary multispectral band and DEM-derived terrain cues. These guidance features are progressively injected into the RGB backbone through multi-stage Guided Attention Blocks, enabling selective feature recalibration and reducing cross-modal interference. In addition, a hybrid Dice–Focal loss is used to alleviate class imbalance. Experiments on the Landslide4Sense dataset show that SGH-Net achieves the best overall performance among the compared methods under the adopted evaluation protocol, reaching 81.15% IoU and a 77.86% F1-score. Compared with representative multi-modal baselines, the proposed method delivers more accurate boundary delineation and fewer false alarms while maintaining favorable model complexity. These results indicate that modality-guided hierarchical fusion is an effective and efficient strategy for multi-modal landslide segmentation. Full article
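A hybrid Dice–Focal loss of the kind used here to counter foreground–background imbalance can be sketched in NumPy. The `alpha` mixing weight and focusing parameter `gamma` below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def dice_focal_loss(p, y, gamma=2.0, alpha=0.5, eps=1e-7):
    """Hybrid of focal BCE and soft Dice for imbalanced binary segmentation.

    p: predicted foreground probabilities in [0, 1]; y: binary ground truth.
    """
    p = np.clip(p, eps, 1 - eps)
    # Focal term: (1 - p_t)^gamma down-weights easy, well-classified pixels
    focal = -np.mean(y * (1 - p) ** gamma * np.log(p)
                     + (1 - y) * p ** gamma * np.log(1 - p))
    # Soft Dice term: overlap-based, insensitive to the background majority
    dice = 1 - (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)
    return alpha * focal + (1 - alpha) * dice
```

The focal term handles per-pixel hardness while the Dice term optimizes region overlap directly, which is why the combination suits sparse landslide foregrounds.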
22 pages, 4792 KB  
Article
Distracted Driving Behavior Recognition Based on Improved YOLOv8n-Pose and Multi-Feature Fusion
by Zhuzhou Li, Dudu Guo, Zhenxun Wei, Guoliang Chen, Miao Sun and Yuhao Sun
Appl. Sci. 2026, 16(7), 3532; https://doi.org/10.3390/app16073532 - 3 Apr 2026
Viewed by 171
Abstract
Distracted driving is one of the primary causes of road traffic accidents. Behavior recognition technology based on machine vision has emerged as a research hotspot due to its non-contact and high-efficiency nature. To address the challenges of complex lighting conditions in the driver’s cabin, low detection accuracy for small-scale keypoints, and the difficulty in effectively characterizing behavioral features, this paper proposes a distracted driving behavior recognition method based on an improved YOLOv8n-Pose model and multi-feature fusion. First, the original YOLOv8n-Pose model is optimized. A P2 detection layer is added to enhance the feature extraction capabilities for small-scale human keypoints, and the SE attention module is incorporated to improve the model’s robustness under complex lighting conditions. In addition, the loss function is replaced with focal loss to tackle the class imbalance problem, thus forming the YOLOv8n-PSF-Pose keypoint detection network. Subsequently, based on the coordinates of 12 human keypoints extracted by this network, a multi-dimensional feature vector is constructed, centered on joint angles and augmented with the relative distances between keypoints and the number of valid keypoints. Finally, a BP neural network is adopted to classify the constructed feature vectors, enabling the accurate recognition of six typical distracted driving behaviors (normal driving, drinking or eating, making phone calls, using mobile phones, operating vehicle infotainment systems, and turning around to fetch items).
The experimental results show that the improved YOLOv8n-PSF-Pose model achieves an mAP50 of 93.8% in keypoint detection, which is 6.7 percentage points higher than the original model; the BP classification model based on multi-feature fusion achieves an F1-score of 97.7% in the behavior recognition task, which is significantly better than traditional classifiers such as SVM and random forest, and the image processing speed on the NVIDIA RTX 3090TI reaches a high throughput of 45 FPS. This proves that the proposed method achieves an excellent balance between accuracy and speed. This study provides an effective solution for the real-time and accurate recognition of distracted driving behaviors. Full article
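The joint-angle features at the core of the fused feature vector reduce to the angle at a keypoint between two limb segments. A minimal sketch (the function name and degree convention are our own, not the paper's):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by the segments b->a and b->c."""
    a, b, c = map(np.asarray, (a, b, c))
    v1, v2 = a - b, c - b
    # Cosine of the angle between the two segments; epsilon guards zero-length limbs
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Angles are invariant to translation and scale of the pose, which makes them a more stable behavioral descriptor than raw keypoint coordinates.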
39 pages, 96608 KB  
Article
Multi-Modal Feature Fusion and Hierarchical Classification for Automated Equine–Human Interaction Behavior Recognition
by Samierra Arora, Emily Kieson, Christine Rudd and Peter A. Gloor
Sensors 2026, 26(7), 2202; https://doi.org/10.3390/s26072202 - 2 Apr 2026
Viewed by 727
Abstract
Automated recognition of equine–human interaction behaviors from video represents a significant challenge in computational ethology, with critical applications spanning animal welfare assessment, equine-assisted services evaluation, and safety monitoring in equestrian environments. Existing approaches to animal behavior recognition typically focus on single species in isolation, rely solely on facial expression analysis while ignoring full-body posture, or employ flat classification architectures that fail under the severe class imbalances characteristic of naturalistic behavioral datasets. Furthermore, no prior framework integrates simultaneous analysis of both human and equine body language for cross-species interaction classification. This paper presents a novel hierarchical classification framework integrating multi-modal computer vision features to distinguish behavioral states during horse–human encounters. Our methodology employs three complementary feature extraction pipelines: YOLOv8 for spatial relationship modeling, MediaPipe for human postural analysis, and AP-10K for equine body language interpretation. From 28 annotated interaction videos comprising 50,270 temporal samples across five horse breeds, we extract 35 discriminative features capturing proximity dynamics, body orientation, and species-specific behavioral indicators. To address severe class imbalance (18.3:1 ratio between affiliative and avoidant categories), we implement cost-sensitive gradient boosting with automatic class weight optimization within a two-stage hierarchical architecture. 
The first stage classifies interactions into three parent categories (affiliative, neutral, avoidant) achieving 73.2% balanced accuracy, while stage two discriminates six fine-grained sub-behaviors achieving 88.5% balanced accuracy (under oracle parent-category routing; cascaded end-to-end performance is 62.9% balanced accuracy due to Stage 1 error propagation, identifying parent classification as the primary bottleneck). Notably, our system achieves 85.0% recall on safety-critical avoidant behaviors despite their representation of only 3.8% of the dataset. Extensive ablation studies demonstrate that equine pose features contribute most critically to classification performance, while comprehensive cross-validation analysis confirms model robustness across diverse interaction contexts. The proposed framework establishes the first systematic multimodal cross-species behavioral assessment pipeline in human–animal interaction research, with direct implications for improving equine welfare monitoring and rider safety protocols. Full article
(This article belongs to the Special Issue Innovative Sensing Methods for Motion and Behavior Analysis)
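Cost-sensitive learning for an 18.3:1 imbalance amounts to scaling each class's loss contribution by its inverse frequency. A minimal sketch; the paper's gradient-boosting pipeline optimizes class weights automatically, which this static scheme does not:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights, normalized so the majority class gets 1.0."""
    counts = Counter(labels)
    max_n = max(counts.values())
    return {c: max_n / n for c, n in counts.items()}
```

With such weights, a misclassified avoidant sample costs roughly 18x a misclassified affiliative one, which is how high recall on rare safety-critical behaviors becomes achievable.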
31 pages, 4842 KB  
Article
FDR-Net: Fine-Grained Lesion Detection Model for Tilapia in Aquaculture via Multi-Scale Feature Enhancement and Spatial Attention Fusion
by Chenhui Zhou and Vladimir Y. Mariano
Symmetry 2026, 18(4), 598; https://doi.org/10.3390/sym18040598 - 31 Mar 2026
Viewed by 290
Abstract
In disease control and precision management in aquaculture, rapid and accurate identification of common fish diseases is pivotal to mitigating economic losses and ensuring aquaculture profitability. However, fish diseases are characterized by subtle symptoms, polymorphic lesions, and high susceptibility to environmental perturbations such as water turbidity and illumination fluctuations. Existing detection models generally suffer from inadequate lightweight design, poor fine-grained lesion feature extraction, and deficient adaptability to class imbalance, failing to meet the stringent requirements of precise diagnosis in real-world aquaculture scenarios. To address these challenges, this study proposes FDR-Net: a fine-grained lesion detection model for tilapia via multi-scale feature enhancement and spatial attention fusion. Using image data of Nile tilapia (Oreochromis niloticus) covering 6 common diseases and healthy individuals (from the NTD-1 dataset), the model incorporates symmetry-aware design logic, leveraging the morphological and textural symmetry of healthy tilapia tissues to capture lesion-induced symmetry-breaking features, thereby improving fine-grained lesion detection accuracy. 
Through depth-width scaling coefficients, FDR-Net achieves lightweight optimization while integrating three core modules and a task-specific loss function for full-chain optimization: specifically, a Micro-lesion Feature Enhancement Module (MLFEM) is embedded in key feature layers of the backbone network to accurately extract edge and texture features of incipient fine-grained lesions via multi-scale frequency decomposition and residual fusion; subsequently, a Lightweight Multi-scale Position Attention Module (MS_PSA) and a Single-modal Intra-feature Contrastive Fusion Module (SMICFM) are collaboratively deployed—the former focusing on spatial localization of lesion features, and the latter enhancing lesion-background discriminability through channel-spatial feature recalibration and contrastive fusion; finally, a Class-Aware Weighted Hybrid Loss (CAWHL) function is combined with customized small-target anchor boxes to alleviate class imbalance and further improve localization and classification accuracy of fine-grained lesions. Empirical evaluations on the NTD-1 dataset demonstrate that compared with mainstream state-of-the-art baseline models, FDR-Net achieves a peak recognition accuracy of 90.1% with substantially enhanced mAP50-95 performance. Retaining lightweight characteristics, it exhibits superior performance in identifying incipient fine-grained lesions and strong adaptability to simulated complex aquaculture scenarios. Collectively, this study provides an efficient technical backbone for the rapid and precise detection of tilapia fine-grained lesions, offering a potential solution for precise disease management in tilapia farming. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision Under Extreme Environments)
29 pages, 6909 KB  
Article
MDE-UNet: A Physically Guided Asymmetric Fusion Network for Multi-Source Meteorological Data Lightning Identification
by Yihua Chen, Yuanpeng Han, Yujian Zhang, Yi Liu, Lin Song, Jialei Wang, Xinjue Wang and Qilin Zhang
Remote Sens. 2026, 18(7), 1027; https://doi.org/10.3390/rs18071027 - 29 Mar 2026
Viewed by 205
Abstract
Utilizing multi-source meteorological data for lightning identification is crucial for monitoring severe convective weather. However, several key challenges persist in this field: dimensional imbalance and modal competition among multi-source heterogeneous data, model training bias caused by the extreme sparsity of lightning samples, and an imbalance between false alarms and missed detections resulting from complex background noise. To address these challenges, this paper proposes a lightning identification network guided by physical priors and constrained by supervision. First, to tackle the issue of modal competition in fusing satellite (high-dimensional) and radar (low-dimensional) data, a physical prior-guided asymmetric radar information enhancement mechanism is introduced. This mechanism uses radar physical features as contextual guidance to selectively enhance the latent weak radar signatures. Second, at the architectural level, a multi-source multi-scale feature fusion module and a weighted sliding window–multilayer perceptron (MLP) enhanced decoding unit are constructed. The former achieves the coupling of multi-scale physical features at a 2 km grid scale through cross-level semantic alignment, building a highly consistent feature field that effectively improves the model’s ability to detect lightning signals. The latter leverages adaptive receptive fields and the nonlinear modeling capability of MLPs to effectively smooth spatially discrete noise, ensuring spatial continuity in the reconstructed results. Finally, to address the model bias caused by severe class imbalance between positive and negative samples—resulting from the extreme sparsity of lightning events—an asymmetrically weighted BCE-DICE loss function is designed. Its “asymmetric” characteristic is implemented by assigning different penalty weights to false-positive and false-negative predictions. 
This loss function balances pixel-level accuracy and inter-class equilibrium while imposing high-weight penalties on false-positive predictions, achieving synergistic optimization of feature enhancement and directional suppression. Experimental results show that the proposed method effectively increases the hit rate while substantially reducing the false alarm rate, enabling efficient utilization of multi-source data and high-precision identification of lightning strike areas. Full article
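The asymmetrically weighted BCE–Dice idea, penalizing false positives more heavily than false negatives, can be sketched as follows. The weights `w_fp` and `w_fn` are illustrative, not the paper's values:

```python
import numpy as np

def asym_bce_dice(p, y, w_fp=4.0, w_fn=1.0, eps=1e-7):
    """BCE with asymmetric FP/FN penalties, plus a soft Dice term.

    p: predicted lightning probabilities; y: binary lightning mask.
    w_fp > w_fn makes false alarms (p high where y == 0) the costlier error.
    """
    p = np.clip(p, eps, 1 - eps)
    bce = -np.mean(w_fn * y * np.log(p)            # false-negative penalty
                   + w_fp * (1 - y) * np.log(1 - p))  # heavier false-positive penalty
    dice = 1 - (2 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)
    return bce + dice
```

The Dice term supplies the inter-class equilibrium under sparse positives, while the asymmetric BCE weights steer the hit-rate/false-alarm trade-off in the desired direction.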
27 pages, 7912 KB  
Article
Hierarchical Wetland Mapping in the East China Sea Based on Integrated Multifaceted Source Features
by Jie Wang, Yixuan Zhou, Xin Fang, Shengqi Wang, Haiyang Zhang and Runbin Hu
Remote Sens. 2026, 18(7), 1023; https://doi.org/10.3390/rs18071023 - 29 Mar 2026
Viewed by 263
Abstract
The East China Sea represents a critical coastal wetland region, characterized by complex geomorphology, heterogeneous land-cover composition, and diverse wetland types. Accurate delineation of coastal wetland extent is essential for ecosystem service assessment and sustainable coastal management, directly contributing to wetland-related Sustainable Development Goals (SDGs), particularly SDG 15, on ecosystem conservation and biodiversity protection. However, pronounced spectral similarity and structural heterogeneity among wetland classes pose substantial challenges to reliable classification. To address these challenges, this study developed a hierarchical classification framework integrating Random Forest, K-means clustering, and a decision tree classifier based on multi-source Sentinel-1 and Sentinel-2 imagery. Spectral, polarimetric, texture, and morphological features were systematically constructed to enhance class separability. Using this framework, a 10 m resolution coastal wetland map of the East China Sea was generated for 2023. The proposed approach achieved an overall accuracy of 91.32% and improved the discrimination of spectrally similar wetland types. Feature fusion reduced confusion among water-related classes, while object-based clustering improved the extraction of linear riverine wetlands. The resulting 10 m wetland map provides updated spatial information for ecological assessment and coastal management in the East China Sea. Full article
(This article belongs to the Special Issue Big Earth Data in Support of the Sustainable Development Goals)
20 pages, 2114 KB  
Article
Cross-Project Software Defect Prediction Based on Domain Adaptation and Feature Fusion
by Guanhua Guo, Yinglei Song and Peng Zhang
Algorithms 2026, 19(4), 253; https://doi.org/10.3390/a19040253 - 26 Mar 2026
Viewed by 252
Abstract
With the advancement of computer science, software has become increasingly prevalent across all facets of society, making software quality issues a focal point of industry concern. The scarcity of sufficient defect data in the early stages of projects undermines prediction accuracy, driving research into cross-project software defect prediction. Traditional hand-crafted metric features face challenges from data distribution discrepancies between source and target projects, which hinder prediction effectiveness. Furthermore, single features fail to comprehensively characterize software information. This paper proposes a domain adaptation and feature fusion-based cross-project software defect prediction method (DAFF-CPDP). The model employs the TCA+ algorithm for domain adaptation and utilizes an encoder layer for progressive feature fusion. Multiple Java projects were selected for evaluation. Comparisons with various baseline models demonstrated that the proposed model outperforms both traditional machine learning-based feature models and diverse deep learning-based single-feature or multi-feature models. Concurrently, this paper analyzes the impact of different source projects on target projects, confirming that class-balanced datasets and datasets with smaller distribution differences are more conducive to project prediction. Full article
27 pages, 8177 KB  
Article
DINOv3-PEFT: A Dual-Branch Collaborative Network with Parameter-Efficient Fine-Tuning for Precise Road Segmentation in SAR Imagery
by Debao Chen, Wanlin Yang, Ye Yuan and Juntao Gu
Remote Sens. 2026, 18(7), 973; https://doi.org/10.3390/rs18070973 - 24 Mar 2026
Viewed by 226
Abstract
Extracting road networks from Synthetic Aperture Radar (SAR) data represents a core challenge in remote sensing scene analysis, particularly for applications in traffic monitoring and emergency management. The task is complicated by several inherent limitations: speckle noise degrades image quality, geometric distortions arise from the side-looking acquisition geometry, and roads often exhibit weak radiometric separation from surrounding terrain. Traditional processing pipelines and recent single-branch deep learning frameworks have shown insufficient performance when global contextual reasoning and fine-scale spatial detail must both be addressed. This work presents DINOv3-PEFT, a parameter-efficient dual-encoder network designed specifically for SAR road segmentation. The architecture employs two complementary processing streams tailored to SAR characteristics: one stream utilizes adapter-based fine-tuning applied to pre-trained DINOv3 weights (kept frozen), which captures long-distance spatial relationships crucial for maintaining network connectivity despite speckle corruption. The second stream, based on convolutional operations, focuses on extracting localized geometric features that preserve the narrow, elongated structure and sharp boundaries typical of road infrastructure. Feature fusion occurs through the Topological-Geometric Feature Integration (TGFI) Module, which synthesizes multi-scale representations hierarchically. This mechanism proves effective at bridging fragmented road segments and recovering geometric accuracy in scenarios with heavy shadow casting or signal interference. Performance evaluation on the GF-3 satellite dataset across four spatial resolutions (1 m, 3 m, 5 m, and 10 m) demonstrates the proposed method achieves an 82.61% F1-score, a 76.51% IoU, and a 98.08% overall accuracy, all averaged across the four resolutions. 
When benchmarked against six state-of-the-art methods, DINOv3-PEFT demonstrates substantial improvements in road class segmentation quality and topological connectivity preservation, supporting its robustness for operational SAR road mapping tasks. Full article
18 pages, 2903 KB  
Article
Infrasound Signal Classification Fusion Model Based on Double-Branch and Multi-Scale CNN and LSTM
by Hao Yin, Yu Lu, Yunhui Wu, Wei Cheng, Xinliang Pang and Peng Li
Acoustics 2026, 8(2), 21; https://doi.org/10.3390/acoustics8020021 - 24 Mar 2026
Viewed by 290
Abstract
The accurate classification of infrasound events is significant in natural disaster warning, verification of nuclear test bans and geophysical research. Current deep learning-based classification methods mostly focus on denoised and filtered signals. To simplify the process, avoid information loss, and address the issues of incomplete feature extraction by single-scale convolution kernels and the potential loss of physical information by single models, this paper directly utilizes raw infrasound signals and proposes two fusion classification models based on multi-scale Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experiments were conducted on a typical infrasound signal dataset (comprising four signal types: mountain-associated waves, auroral infrasound waves, volcanic eruptions, and microbaroms). The performances of the two models were compared in terms of accuracy, convergence speed, and stability. The results indicate that both models achieve classification accuracies exceeding 99% with optimal parameter combinations. The dual-branch multi-scale CNN-LSTM model generally outperforms the multi-scale CNN-LSTM model in classification accuracy, while also demonstrating faster convergence speed and better stability. Addressing the class imbalance in the dataset, evaluations using precision, recall, and F1-score further validated the effectiveness of the proposed models. This study demonstrates that the proposed methods can effectively achieve end-to-end classification of raw infrasound signals and are competitive with existing techniques. Full article

23 pages, 1109 KB  
Review
Strategies for Class-Imbalanced Learning in Multi-Sensor Medical Imaging
by Da Zhou, Song Gao and Xinrui Huang
Sensors 2026, 26(6), 1998; https://doi.org/10.3390/s26061998 - 23 Mar 2026
Abstract
This narrative critical review addresses class imbalance in medical imaging, which, particularly within multi-sensor and multi-modal environments, poses a critical challenge to developing reliable AI diagnostic systems. The integration of heterogeneous data from sources like CT, MRI, and PET presents a unique opportunity to address data scarcity for rare conditions through fusion techniques. This review provides a structured analysis of strategies to tackle class imbalance, categorizing them into data-centric (e.g., advanced resampling like SMOTE-ENC for mixed data types, GAN-based synthesis) and model-centric (e.g., loss function engineering, transfer learning, and ensemble methods) approaches. Crucially, we highlight how multi-sensor feature fusion and decision-level fusion paradigms can inherently enrich representations for minority classes, offering a powerful frontier beyond single-modality learning. We evaluate each method’s merits, clinical viability, and compliance considerations (e.g., FDA). Finally, we identify emerging trends where imbalance-aware learning synergizes with multi-sensor fusion frameworks, federated learning, and explainable AI, charting a roadmap toward robust, equitable, and clinically deployable diagnostic tools. Our quantitative synthesis shows that data-centric strategies can improve minority class recall by 12–35% in datasets with imbalance ratios (majority:minority) ≥10:1, while model-centric strategies achieve an average AUC improvement of 0.08–0.21 in multi-sensor medical imaging tasks with sample sizes ranging from 50 to 50,000. Full article
(This article belongs to the Special Issue Multi-sensor Fusion in Medical Imaging, Diagnosis and Therapy)
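Among the model-centric strategies the review surveys, loss-function engineering is the most compact to illustrate. Below is a minimal NumPy sketch of binary focal loss, assuming the common formulation where `(1 - pt)**gamma` down-weights easy examples and `alpha` up-weights the minority (positive) class; the `gamma` and `alpha` values are illustrative defaults, not recommendations drawn from any reviewed study.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss: easy examples are down-weighted by (1 - pt)**gamma,
    and the minority class (y = 1) is up-weighted by alpha."""
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    y = np.asarray(y)
    pt = np.where(y == 1, p, 1 - p)          # probability assigned to the true class
    w = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(-(w * (1 - pt) ** gamma * np.log(pt)).mean())

# An easy, confidently correct example contributes far less than a hard minority case:
easy = focal_loss([0.95], [1])  # model is confident and right
hard = focal_loss([0.30], [1])  # minority example the model nearly misses
```

The effect is that gradient signal concentrates on the hard, typically minority-class, examples instead of being drowned out by the abundant easy majority.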

19 pages, 1890 KB  
Article
PolSAR Forest Height Inversion Based on Multi-Class Feature Fusion
by Bing Zhang, Jinze Li, Jichao Zhang, Dongfeng Ren, Weidong Song, Jianjun Zhu and Cui Zhou
Remote Sens. 2026, 18(6), 946; https://doi.org/10.3390/rs18060946 - 20 Mar 2026
Abstract
Forest height is a key structural parameter for characterizing forest architecture and estimating carbon storage. However, under complex terrain and heterogeneous forest conditions, polarimetric synthetic aperture radar (PolSAR)-based forest height inversion using multi-category features still faces several challenges, including feature redundancy, insufficient characterization of the nonlinear couplings among high-dimensional features by deep learning models, and the difficulty of jointly achieving model stability and interpretability. To address these issues, this paper proposes a SHapley Additive exPlanations (SHAP) interpretability-driven method for PolSAR forest height inversion based on deep learning and multi-category feature fusion. Firstly, a deep neural network (DNN) is constructed, and SHAP is introduced to interpret the model decision process, enabling the identification of key feature interactions with clear physical significance and guiding the iterative model optimization in an explainability-driven manner. Furthermore, a SHAP-guided feature attention DNN is developed, in which the feature contribution scores are incorporated as prior knowledge for attention weight initialization, thereby establishing a closed-loop modeling framework from “interpretation” to “optimization”. Experiments were conducted at the Huangfengqiao forest farm, Youxian County, Hunan Province, China, using ALOS-2 L-band fully polarimetric SAR imagery. The experimental results demonstrated that the proposed method significantly outperforms conventional machine learning approaches and various deep learning architectures for forest height inversion. The final model achieved a coefficient of determination (R²) of 0.75 and a root-mean-square error (RMSE) of 1.35 m on the test dataset. These findings indicate that the combination of SHAP-driven multi-category feature fusion and deep learning can effectively enhance both the inversion accuracy and physical interpretability, providing a reliable solution for PolSAR-based forest structural parameter retrieval at the Huangfengqiao study site, with potential applicability to complex terrain conditions. Full article
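The closed loop from “interpretation” to “optimization” can be sketched as a softmax over per-feature mean |SHAP| scores that initializes feature-attention weights. The scores below are hypothetical placeholders; in the paper's pipeline they would come from running SHAP (e.g., the `shap` package) on a trained DNN over the PolSAR feature set.

```python
import numpy as np

def shap_to_attention_init(mean_abs_shap, temperature=1.0):
    """Turn mean |SHAP| contribution scores into softmax attention weights,
    so features the explainer found important start with larger gates."""
    z = np.asarray(mean_abs_shap, dtype=float) / temperature
    z = z - z.max()                    # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Hypothetical mean |SHAP| per feature (e.g., four polarimetric features):
scores = np.array([0.40, 0.05, 0.25, 0.10])
w0 = shap_to_attention_init(scores)    # initial attention weights, sum to 1

x = np.ones(4)                         # toy standardized feature vector
gated = w0 * x                         # attention-gated features fed to the DNN
```

During training, the attention weights would remain learnable; the SHAP-derived values only serve as an informed initialization rather than a fixed mask.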

41 pages, 14137 KB  
Article
Hierarchical Extraction and Multi-Feature Optimization of Complex Crop Planting Structures in the Hetao Irrigation District Based on Multi-Source Remote Sensing Data
by Shan Yu, Rong Li, Wala Du, Lide Su, Buqi Na and Liangliang Yu
Remote Sens. 2026, 18(6), 937; https://doi.org/10.3390/rs18060937 - 19 Mar 2026
Abstract
Accurate extraction of crop planting structures is important for crop area and yield estimation, but complex and fragmented cropping patterns with overlapping phenology in the Hetao Irrigation District hinder reliable crop discrimination. This study proposes a hierarchical workflow that integrates vegetation masking with multi-source feature optimization for crop mapping. First, dual-temporal Sentinel-2 imagery (May and August) is used to generate a vegetation region-of-interest (ROI) mask via Otsu thresholding applied to the Normalized Difference Vegetation Index (NDVI), combined with pixel-wise maximum-value fusion to reduce phenology-driven omissions and background interference. Second, within the vegetation mask, Sentinel-2 spectral, vegetation-index, and texture features are combined with Sentinel-1 synthetic aperture radar (SAR) backscatter and SAR texture features to construct a multi-source feature set. Random Forest (RF) feature-importance ranking is used to select an effective feature subset, and four classifiers (RF, support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), and convolutional neural network (CNN)) are compared under the same training/validation setting. The vegetation extraction achieves an overall accuracy of 91% (Kappa = 0.80). Using Sentinel-2 features only, the optimized subset with CNN attains the best performance (overall accuracy = 95%, Kappa = 0.93). Adding Sentinel-1 SAR texture features provides an additional improvement (overall accuracy = 96%, Kappa = 0.94), particularly for classes prone to confusion in fragmented plots. Area proportions derived from the final map are consistent with statistical yearbook data (percentage errors: maize 3.45%, sunflower 2.66%, wheat 0.11%, tomato 0.92%) under the study conditions. This workflow supports practical crop-structure monitoring in complex irrigation districts. Full article
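The first stage of the workflow (NDVI, pixel-wise maximum-value fusion of the two dates, and Otsu thresholding) can be sketched in NumPy on synthetic reflectances. The band values and class proportions below are invented for illustration; a real run would use Sentinel-2 B4 (red) and B8 (NIR) surface reflectance rasters.

```python
import numpy as np

rng = np.random.default_rng(42)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red + 1e-9)

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)              # cumulative class-0 probability mass
    mu = np.cumsum(p * centers)    # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    return centers[np.argmax(np.nan_to_num(sigma_b))]

# Synthetic May reflectances for 10,000 pixels: 6,000 soil, 4,000 vegetation.
red = np.concatenate([rng.normal(0.20, 0.02, 6000), rng.normal(0.05, 0.01, 4000)])
nir = np.concatenate([rng.normal(0.25, 0.02, 6000), rng.normal(0.45, 0.03, 4000)])
ndvi_may = ndvi(nir, red)

# August NDVI modeled as May plus phenological greening noise.
ndvi_aug = np.clip(ndvi_may + rng.normal(0.05, 0.03, ndvi_may.size), -1.0, 1.0)

# Pixel-wise maximum-value fusion keeps whichever date shows stronger vegetation.
ndvi_fused = np.maximum(ndvi_may, ndvi_aug)
t = otsu_threshold(ndvi_fused)
veg_mask = ndvi_fused > t          # vegetation ROI for the later classification stage
```

The maximum-value fusion step is what reduces phenology-driven omissions: a crop that is still sparse in May but green in August (or vice versa) survives into the mask.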
