MDPI - Publisher of Open Access Journals

23 pages, 12407 KB

Open Access Article

ADS-MIR: A Machine Perception-Oriented Visible-Infrared Sensor Fusion Framework for Intelligent Transportation Perception Under Complex Illumination Conditions

by Jun Yang, Jianguo Wu, Xiaolan Zhang, Zenglong Yang, Hongfei Shen, Botao Shen and Chang Zeng

Sensors 2026, 26(12), 3675; https://doi.org/10.3390/s26123675 - 9 Jun 2026

Viewed by 243

Abstract

Multimodal sensor fusion in intelligent transportation systems faces severe challenges in maintaining reliable visual information acquisition under complex illumination conditions. Extreme low-light and intense glare significantly degrade visible-light sensor imaging quality, making it difficult for single-modal vision systems to maintain reliable target perception. Meanwhile, although infrared sensors provide a relatively stable saliency complement for target regions, modal discrepancies and spatial misalignment between heterogeneous visible and infrared sensors often degrade fusion performance, limiting the practical benefits of multimodal sensing for machine perception. To address these issues, this study proposes Aligned, Dual-Gated, and Saliency-Guided MIRNet (ADS-MIR), a machine perception-oriented visible-infrared sensor fusion framework that enhances the discriminability and structural representation of target regions for roadside perception sensors operating under complex conditions. Specifically, the framework employs a domain alignment layer to mitigate feature distribution discrepancies and spatial misalignment between heterogeneous sensor modalities. An illumination-guided adaptive gating mechanism dynamically modulates bimodal sensor feature contributions, while a saliency-guided frequency decoupling reinforcement strategy reinforces target-related high-frequency edge details. Experimental results on the LLVIP and M3FD datasets demonstrate that ADS-MIR improves the edge information transfer factor ($Q^{AB/F}$) by 49.6% to 111.6% compared with existing methods, highlighting its distinct advantage in preserving target contours and restoring edge information. Furthermore, the enhanced results provide more discriminative input features for downstream object detection, exhibiting more stable perception capabilities under complex illumination and challenging sensing scenarios.

(This article belongs to the Section Optical Sensors)

24 pages, 62342 KB

Open Access Article

DCAFuse: A Differential Cross-Attention Transformer Network for Infrared and Visible Image Fusion in UAV-Based Wilderness Search and Rescue

by Yu Jing, Yili Yan, Zhao Li, Fugui Qi, Tao Lei, Jianqi Wang and Guohua Lu

Drones 2026, 10(6), 449; https://doi.org/10.3390/drones10060449 - 9 Jun 2026

Viewed by 185

Abstract

Infrared and visible image fusion is critical for unmanned aerial vehicle (UAV) wilderness search and rescue. By integrating the thermal radiation of targets with the texture details of the scene, it enables accurate search for the wounded and comprehensive perception of disaster areas, thereby significantly improving emergency rescue efficiency. To alleviate data scarcity, we construct UAV-MSR, an infrared-visible dataset for casualty search, comprising 3889 paired images captured under diverse weather, illumination, and scene conditions. Existing Transformer-based fusion methods mainly focus on high-intensity pixels while inadequately modeling low-intensity complementary features, resulting in blurred details and degraded target contrast in fused images. To this end, we propose a novel differential cross-attention Transformer network to address the issue of complementary information loss. Specifically, the encoder integrates convolution operations for local detail extraction and self-attention mechanisms for global context modeling. Then, we design a differential cross-attention guided feature fusion module to enhance the representation and preservation of detailed complementary features. Furthermore, a pixel loss function with a segmentation strategy is employed to improve the saliency of the target, enabling the fused image to facilitate subsequent target detection tasks. Experimental results and ablation studies demonstrate that the proposed method achieves notable performance and generalization ability. In summary, this work delivers a multimodal dataset and an efficient infrared-visible image fusion network to enable comprehensive perception for UAVs in wilderness search and rescue scenarios.

39 pages, 82622 KB

Open Access Article

Small-Target Ship Detection with Joint Spatio-Temporal Features Across Multiple Frames

by Ye Qian, Zhen Hu, Bo Zhang, Wenguang Yang and Qian Chen

Sensors 2026, 26(11), 3588; https://doi.org/10.3390/s26113588 - 4 Jun 2026

Viewed by 281

Abstract

Detecting small ship targets in sea–sky background environments is challenging due to interference from clouds, islands, sea clutter, and the limited spatial information in long-range infrared imagery. To address these issues, this paper proposes a robust detection framework that integrates multi-scale spatial feature enhancement with temporal trajectory analysis. First, a candidate target extraction method based on a multi-scale differential histogram of oriented gradients is introduced. By exploiting gradient distribution differences between targets and surrounding backgrounds, the method enhances target responses while suppressing structured background edges. This response is further fused with a log-spectrum-based saliency map to improve target contrast and reduce clutter. Next, a candidate trajectory extraction algorithm based on inverse optical flow matching is developed to utilize temporal consistency. Optical flow-based grayscale compensation predicts target intensity changes between frames, while Kalman filtering estimates motion states and performs trajectory association. Finally, a multi-feature trajectory filtering strategy is designed, combining motion entropy stability, peak signal-to-noise ratio, and trajectory lifecycle to distinguish true targets from false alarms. Experimental results on eight infrared maritime sequences demonstrate superior performance. The proposed method achieves an average Background Suppression Factor (BSF) of 45.2 and an average Signal-to-Clutter Ratio Gain (SCRG) of 22.3 × 10³, representing a substantial improvement over all baseline algorithms. Receiver Operating Characteristic analysis further shows a mean detection rate exceeding 90% at a false-alarm rate of 10⁻³ across all sequences, confirming improved detection performance and robustness in complex maritime environments.

(This article belongs to the Special Issue Sensor Techniques for Signal, Image and Video Processing)
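
The abstract above reports a Background Suppression Factor (BSF) and a Signal-to-Clutter Ratio Gain (SCRG) without defining them. As a rough sketch, the Python below uses the definitions commonly seen in the infrared small-target literature, which are assumed here rather than taken from the paper: SCR = |mean(target) - mean(background)| / std(background), SCRG = SCR_out / SCR_in, and BSF = std(background_in) / std(background_out). The function names and the toy pixel values are hypothetical.

```python
from statistics import mean, pstdev


def scr(target, background):
    """Signal-to-clutter ratio of one image region pair.

    Assumed definition: |mean(target) - mean(background)| / std(background).
    Raises ZeroDivisionError if the background is perfectly flat.
    """
    return abs(mean(target) - mean(background)) / pstdev(background)


def scr_gain(target_in, background_in, target_out, background_out):
    """SCRG: how much the processing step raised the SCR."""
    return scr(target_out, background_out) / scr(target_in, background_in)


def bsf(background_in, background_out):
    """BSF: ratio of background standard deviation before and after processing."""
    return pstdev(background_in) / pstdev(background_out)


# Toy pixel intensities (hypothetical, not data from the paper).
t_in, b_in = [120, 130], [90, 110, 90, 110]
t_out, b_out = [200, 220], [0, 2, 0, 2]

print("SCR in :", scr(t_in, b_in))
print("SCRG   :", scr_gain(t_in, b_in, t_out, b_out))
print("BSF    :", bsf(b_in, b_out))
```

Both quantities grow when the output background becomes flatter, which is why very large values can appear when a method suppresses clutter almost completely.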

38 pages, 7245 KB

Open Access Article

A Hybrid Architecture of CNN–Swin-T Integrated with Attention Mechanism and Explainable AI for Alzheimer’s Disease Classification

by Saeed Mohsen, Saada Khadragy, Norah Alnaim, Noorah Albehaijan and Ahmed F. Ibrahim

Computers 2026, 15(6), 361; https://doi.org/10.3390/computers15060361 - 3 Jun 2026

Viewed by 243

Abstract

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that requires early and accurate diagnosis to improve patient outcomes. In this paper, an attention-enhanced hybrid deep learning (DL) framework is proposed that combines Convolutional Neural Network (CNN) and Swin Transformer (Swin-T) architectures for multi-class Alzheimer’s classification. The proposed model integrates an attention mechanism to enhance feature representation and improve classification performance. Experiments are conducted on a dataset containing three classes: Mild Demented, Very Mild Demented, and Non-Demented. Data augmentation techniques are applied to improve the model’s generalization. Additionally, three explainable artificial intelligence (XAI) techniques, namely Grad-CAM++, Integrated Gradients, and Saliency maps, are employed to interpret the model’s predictions and to provide visual insights into its decision-making process. The proposed attention-enhanced hybrid CNN–Swin-T model achieves a testing accuracy of 99.92% and reaches 99.71%, 99.73%, and 99.72% for precision, recall, and F1-score, respectively. The hybrid CNN–Swin-T with attention outperforms three implemented models: baseline CNN, standalone Swin-T, and hybrid CNN–Swin-T. The explainability results validate the proposed model’s focus on relevant regions, increasing trust in automated diagnosis systems. Finally, a comparative analysis with an ablation study is presented to demonstrate that integrating the attention mechanism with a hybrid CNN–Swin-T architecture leads to the highest performance and more reliable predictions than the other three models.

(This article belongs to the Section AI-Driven Innovations)
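
For readers comparing the accuracy, precision, recall, and F1 figures quoted above, the following minimal sketch shows how such scores are typically derived from a three-class confusion matrix. It assumes macro averaging (an unweighted mean over classes); the abstract does not say which averaging the authors used, and the matrix values are invented for illustration.

```python
def macro_scores(confusion):
    """Return (accuracy, macro precision, macro recall, macro F1).

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical counts for three classes (not results from the paper).
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
acc, prec, rec, f1 = macro_scores(toy)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Accuracy can exceed the macro scores, as in the reported numbers, when the few errors are concentrated in a smaller class.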

21 pages, 2539 KB

Open Access Article

CG-IRNet: Structure–Confidence Hybrid Learning for Low-False-Alarm Infrared Small Target Detection

by Ziwen Zhu and Mengmeng Liao

Electronics 2026, 15(11), 2405; https://doi.org/10.3390/electronics15112405 - 1 Jun 2026

Viewed by 211

Abstract

Infrared small target detection (IRSTD) remains a challenging yet critical task in target detection and computer vision. Its difficulty stems from the inherent characteristics of this class of targets: most datasets exhibit extreme class imbalance, weak contrast, and complex noise clutter in the background. To address these issues, this work proposes CG-IRNet, a structure-aware detection framework that integrates multi-scale feature aggregation with a Structure–Confidence Hybrid (SCH) loss. The SCH loss combines an augmented variant of the confidence-aware Scale–Location Sensitive (SLS) loss with instance-wise structural supervision and a confidence-guided background suppression mechanism, all aimed at enhancing localization consistency while substantially reducing false alarms. In addition, a frequency-aware feature refinement module is incorporated to strengthen small target saliency in highly cluttered scenes. Extensive experiments on three SIRST benchmark datasets, namely IRSTD-1K, NUAA-SIRST, and NUDT-SIRST, demonstrate a superior trade-off between detection probability (Pd) and false alarm rate. On IRSTD-1K, CG-IRNet achieves 65.09 mIoU and reduces the false alarm rate to 30.992 × 10⁻⁶, which is significantly lower than SCTransNet (55.74 × 10⁻⁶) at the same detection probability (93.27%). On NUAA-SIRST and NUDT-SIRST, the proposed method achieves 96.95% and 98.62% detection probability, respectively, while maintaining competitive or lower false alarm rates under challenging background conditions. These outcomes demonstrate the effectiveness of the proposed confidence-guided suppression and structure-aware optimization. An ablation study on model hyperparameters and qualitative analyses further confirm the joint improvements contributed by the proposed structural supervision and confidence-aware design, particularly in regimes where a low false alarm rate is the optimization goal.
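
The detection probability (Pd) and false alarm rate quoted above are usually computed at different granularities: Pd per target and the false alarm rate per pixel. The sketch below illustrates that convention under stated assumptions; the centroid-distance matching rule, the 3-pixel threshold, and all names are illustrative choices, not details taken from the paper.

```python
import math


def pd_fa(pred_centroids, true_centroids, false_pixels, total_pixels, max_dist=3.0):
    """Return (Pd, Fa) for one evaluation set.

    Pd: fraction of ground-truth targets that have an unused predicted
        centroid closer than max_dist pixels (assumed matching rule).
    Fa: number of falsely predicted pixels divided by all pixels.
    """
    matched = set()
    detected = 0
    for tx, ty in true_centroids:
        for i, (px, py) in enumerate(pred_centroids):
            if i not in matched and math.hypot(px - tx, py - ty) < max_dist:
                matched.add(i)  # each prediction may explain only one target
                detected += 1
                break
    pd = detected / len(true_centroids)
    fa = false_pixels / total_pixels
    return pd, fa


# Hypothetical example: two targets in one 256 x 256 image, one of them found,
# plus four stray false-alarm pixels.
pd, fa = pd_fa(
    pred_centroids=[(11, 10), (80, 80)],
    true_centroids=[(10, 10), (50, 50)],
    false_pixels=4,
    total_pixels=256 * 256,
)
print(f"Pd = {pd:.2%}, Fa = {fa * 1e6:.3f} x 10^-6")
```

Because the denominator of the false alarm rate is the full pixel count, values on the order of 10⁻⁶ correspond to only a handful of wrong pixels per image.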

27 pages, 37256 KB

Open Access Article

CFP-DETR: Collaborative Feature Purification Network with Spatial Alignment for Aerial Small Object Detection

by Sihui Wang, Zhihang Guo, Zhenjie Yu and Zhangbing Zhou

Remote Sens. 2026, 18(11), 1750; https://doi.org/10.3390/rs18111750 - 30 May 2026

Viewed by 214

Abstract

Object detection in aerial imagery faces extreme target sparsity and high-intensity environmental interference, causing weak targets to be submerged in background clutter. To address this, we propose a Collaborative Feature Purification Detection Transformer (CFP-DETR), which reconstructs discriminative target representations through a collaborative feature purification mechanism. Specifically, the Global Context Denoising Module (GCDM) first suppresses environmental noise at the semantic level to enhance target saliency. The purified features are then fused across scales through an Adaptive Cross-scale Feature Alignment (ACFA) module, which resolves spatial misalignment that otherwise dilutes small-object features during multi-level interaction. Concurrently, a Fine-Grained Detail Injection Module (FGDIM) recovers shallow high-resolution details and injects them into the semantic flow, compensating for information loss caused by progressive downsampling. Together, these modules denoise, align, and recover features to counteract submergence at different stages. Additionally, an efficient lightweight variant, Efficient Lightweight CFP-DETR (EL-CFP-DETR), reconstructs the backbone with partial convolution and structural re-parameterization to improve efficiency while maintaining competitive detection accuracy. Extensive experiments across five datasets validate the effectiveness of this collaborative design. On the SeaDronesSee dataset, CFP-DETR increases $AP_{50}$ and $AP_{S}^{val}$ by 1.64% and 4.03% over the baseline, while EL-CFP-DETR reduces parameters by 18% to 16.4M and GFLOPs by 15% to 48.3, reaching 42.8 FPS. Notably, CFP-DETR achieves an inference speed of 37.72 FPS, a 31.2% improvement over the baseline Real-Time Detection Transformer (RT-DETR).

(This article belongs to the Section Remote Sensing Image Processing)

24 pages, 28629 KB

Open Access Article

TailBoost: Tail-Synthetic Learning for Boosting Long-Tailed Skin Cancer Image Classification

by Tianyunxi Wei, Yijin Huang, Li Lin, Pujin Cheng and Xiaoying Tang

Sensors 2026, 26(11), 3343; https://doi.org/10.3390/s26113343 - 25 May 2026

Viewed by 371

Abstract

Skin cancer image data often exhibit long-tailed distributions due to the inherent challenges in data collection and annotation. Specifically, a few predominant classes dominate a dataset of interest, while minority classes, referred to as tail classes, are underrepresented with only limited numbers of samples. Such imbalance is highly likely to adversely affect the performance of deep learning models. To address this issue, previous methods employ mixup techniques to synthesize tail-class images, thereby attempting to balance the training data. However, traditional mixup methods typically do not attend to regions of interest and blend two images without distinguishing objects of interest from background. Such disregard for important semantic features may result in synthetic samples with broken or distorted diagnostic features. In this work, we introduce a novel framework, the Tail-synthetic Learning for Boosting Long-tailed Skin Cancer Image Classification (TailBoost) framework. Our approach generates a new tail-class image by combining a tail-class image with a head-class image under the guidance of their corresponding saliency maps. This strategy, namely SPMix, preserves and enhances the discriminative features of the tail-class image with minimum interference from the head-class image. We further refine the learned representations by incorporating supervised contrastive learning with class-center rebalance. Extensive experiments on the ISIC2018, ISIC2019, and PAD-UFES-20 datasets demonstrate that TailBoost outperforms existing state-of-the-art long-tailed learning methods.

(This article belongs to the Special Issue Advanced Sensing Techniques in Biomedical Signal Processing)
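
To make the contrast with plain mixup concrete, here is a toy sketch of a saliency-weighted blend. It is not the authors' SPMix, whose details the abstract does not give; the per-pixel weighting rule, the function name, and the tiny grayscale arrays are all assumptions made for illustration. Plain mixup would use one global weight for every pixel, whereas here the tail image wins wherever its saliency dominates.

```python
def saliency_guided_mix(tail_img, head_img, tail_sal, head_sal, eps=1e-8):
    """Blend two equally sized grayscale images pixel by pixel.

    Assumed rule: the tail pixel's weight is tail_sal / (tail_sal + head_sal),
    so salient tail regions are kept and the head image mostly fills
    regions where the tail image is not salient.
    Images and saliency maps are lists of rows with values in [0, 1].
    """
    mixed = []
    for t_row, h_row, ts_row, hs_row in zip(tail_img, head_img, tail_sal, head_sal):
        row = []
        for t, h, ts, hs in zip(t_row, h_row, ts_row, hs_row):
            w = (ts + eps) / (ts + hs + 2 * eps)  # eps avoids 0/0 where both maps are zero
            row.append(w * t + (1 - w) * h)
        mixed.append(row)
    return mixed


# Hypothetical 1 x 2 images: the left pixel is salient in the tail image,
# the right pixel is salient in the head image.
tail = [[1.0, 1.0]]
head = [[0.0, 0.0]]
tail_saliency = [[1.0, 0.0]]
head_saliency = [[0.0, 1.0]]
print(saliency_guided_mix(tail, head, tail_saliency, head_saliency))
```

In a training pipeline the synthetic image would presumably keep the tail-class label, which is again an assumption rather than something the abstract states.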

29 pages, 2568 KB

Open Access Article

Crack Segmentation Model for Low-Quality Crack Images Based on Feature Integration and Triple Attention

by Yonghua Xie and Yuyang Wang

Appl. Sci. 2026, 16(11), 5185; https://doi.org/10.3390/app16115185 - 22 May 2026

Viewed by 136

Abstract

In road crack detection on low-quality pavement images, existing semantic segmentation methods still suffer from missed detections and inaccurate localization because of weak crack boundaries, low contrast, and complex pavement texture. To address these limitations, this study proposes a crack segmentation model based on feature integration and a triple attention mechanism. The model uses DeepLabv3+ as the backbone network and introduces the proposed three-dimensional interactive attention module after feature extraction. The attention module enhances the extraction of key features related to the spatial location and morphological details of cracks, thereby improving crack localization. A hierarchical feature integration branch is introduced in the cross-layer connection, and a dimension-aware selective fusion module is used to enhance the saliency of small cracks in complex backgrounds. In addition, the proposed multi-group dilation feature fusion module is introduced to improve the multi-scale modeling of small and slender cracks and reduce background interference. The experimental results on the Crack500 and GAPS384 datasets show that the proposed model achieves better overall segmentation performance than the compared models, especially in reducing the missed detection of weak, small, and discontinuous cracks in low-quality pavement images. Complexity analysis further shows that the proposed model maintains practical inference efficiency rather than relying on an overly large model size. These results show that the proposed method provides an effective solution for low-quality road crack segmentation, but it still needs to be further verified in actual detection scenarios.

(This article belongs to the Section Computing and Artificial Intelligence)

18 pages, 1802 KB

Open Access Article

User Requirements Analysis for Audiovisual Products Based on User Review Data

by Chuchu Liu, Xin Zhang, Mengsi Cai and Zheng Han

J. Theor. Appl. Electron. Commer. Res. 2026, 21(5), 157; https://doi.org/10.3390/jtaer21050157 - 20 May 2026

Viewed by 283

Abstract

This study analyzed online review data to examine user requirements for audiovisual products and to compare requirement salience and satisfaction across traditional and emerging product contexts. We collected 86,213 Chinese-language reviews of Skyworth TVs, Xiaomi TVs, and Xiaomi projectors from JD.com. LDA topic modeling was used to identify major user requirement areas, and Logistic Regression, Random Forest, and Support Vector Machine (SVM) models were compared for sentiment classification, with the tuned SVM model retained for downstream analysis. The results show that user discussions primarily concern audiovisual experience, cost performance, service quality, design aesthetics, and intelligent operation. Skyworth TVs receive particularly strong evaluations for picture and sound quality (97.89% positive sentiment), whereas Xiaomi TVs are more strongly associated with cost-effectiveness and smart features (94.05% positive sentiment). Xiaomi projectors attract attention for portability but receive lower satisfaction ratings on core audiovisual performance and intelligent operation. These findings suggest that traditional manufacturers should continue strengthening core performance while improving service responsiveness, whereas emerging brands should build on their technological advantages while further enhancing their product reliability and user experience.

31 pages, 29237 KB

Open Access Article

ARTEMIS: An Explainable AI Framework for Multi-Class COVID-19 Diagnosis with a Newly Curated Dataset

by Muhammet Emin Sahin, Hasan Ulutas, Mustafa Fatih Erkoc, Baris Karakaya, Recep Batuhan Günay and Enes Eren Suzgen

Bioengineering 2026, 13(5), 588; https://doi.org/10.3390/bioengineering13050588 - 20 May 2026

Viewed by 326

Abstract

In this work, we propose ARTEMIS, a novel and highly interpretable deep learning pipeline for the automatic classification of Chest X-ray (CXR) and Computed Tomography (CT) images into three clinically important categories: COVID-19 infection, Community-Acquired Pneumonia (CAP), and Normal. Unlike existing models that rely on a static feature enhancement step, ARTEMIS uses a learnable preprocessing component that dynamically adapts image contrast and sharpness during training, facilitating adaptive optimization. Our hybrid network combines an EfficientNet-B0 backbone with built-in SE attention and an optional lightweight Transformer encoder block to jointly learn local radiological features and global relationships between pixels. Comprehensive experiments have been conducted on five datasets covering X-ray and CT modalities: four publicly available ones and one novel CT dataset annotated by radiologists. Experimental results show strong robustness and generalization, with macro F1-scores greater than 96% on public datasets and 99.39% accuracy on our new CT dataset. To interpret the decision-making process, Grad-CAM++ is employed to generate class-discriminative saliency maps; the highlighted regions are systematically validated against established radiological criteria by a board-certified radiologist, confirming that model decisions are grounded in clinically meaningful pulmonary findings rather than imaging artifacts.

(This article belongs to the Special Issue Explainable Artificial Intelligence (XAI) in Medical Imaging)

17 pages, 1515 KB

Open Access Article

Attention-Based Multimodal Fusion for Salience-Aware Blended Emotion Recognition

by José Salas-Cáceres, Modesto Castrillón-Santana, Oliverio J. Santana, Daniel Hernández-Sosa and Javier Lorenzo-Navarro

Multimodal Technol. Interact. 2026, 10(5), 56; https://doi.org/10.3390/mti10050056 - 20 May 2026

Viewed by 473

Abstract

Blended emotion recognition introduces the challenge of identifying not only which emotions are present in an expressive display but also their relative salience. The proposed methodology builds upon the pre-extracted features provided with the dataset and enhances performance through a combination of temporal modeling and multimodal fusion strategies. Unimodal experiments revealed that visual encoders consistently outperformed audio ones, with the multimodal HiCMAE encoder achieving the strongest single-encoder results with 34% presence accuracy and 18.23% salience accuracy. Multimodal fusion further improved performance, with the best validation results obtained using a combination of simple concatenation and attention-based fusion, reaching 47.86% in presence accuracy and 27.92% in salience accuracy. Overall, the proposed methodology surpasses the chosen baseline introduced in the original paper across a k-fold experiment, confirming the effectiveness of multimodal attention-based fusion for the accurate prediction of both emotion presence and salience in blended affective behaviour. The experimental results further indicate that multimodal expression recognition consistently outperforms unimodal approaches, highlighting the complementary nature of cross-modal information.

20 pages, 5808 KB

Open Access Technical Note

LMRD: A Large-Scale Multi-Source Rotated Dataset for SAR Ship Detection

by Yujia Cheng, Zhaocheng Wang, Yu Chen, Yu Zhang, Yong Chen and Hongdong Zhao

Remote Sens. 2026, 18(10), 1639; https://doi.org/10.3390/rs18101639 - 20 May 2026

Viewed by 154

Abstract

The rapid development of synthetic aperture radar (SAR) imaging technology has significantly enhanced maritime monitoring capabilities; however, SAR ship detection remains constrained by the limited scale and representation capacity of existing rotated bounding box datasets. Most publicly available datasets rely on horizontal annotations, which introduce redundancy and localization ambiguity in densely distributed and nearshore scenarios. Although rotated bounding boxes provide more precise geometric representation, large-scale multi-source rotated SAR datasets are still insufficient to support robust model training. To address this limitation, we construct a large-scale multi-source rotated SAR ship dataset (LMRD) consisting of 13,024 high-resolution image chips with over 38,000 annotated ship instances, covering multiple satellite sources, polarization modes, and diverse maritime environments, including offshore, nearshore, complex coastal, and densely distributed port scenes, thereby enhancing scene diversity and annotation precision. Furthermore, independent of the dataset construction, we propose a multi-domain feature fusion (MDF) framework built upon Oriented RCNN, which integrates high-frequency information and visual saliency cues to improve feature representation under complex backgrounds. Experimental results on the LMRD demonstrate that, compared with the baseline Oriented RCNN, the proposed MDF framework achieves a 2.7% improvement in average precision. Additional analysis indicates that the dataset characteristics and the multi-domain fusion strategy contribute to performance enhancement at different stages of the detection pipeline, validating the effectiveness of the proposed dataset for rotated ship detection while demonstrating the complementary role of multi-domain feature enhancement.

(This article belongs to the Special Issue SAR Monitoring of Marine and Coastal Environments)
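
The abstract's point that horizontal annotations introduce redundancy can be illustrated with simple geometry. The sketch below, using a hypothetical (center, width, height, angle) box convention that is not taken from the dataset, computes how much of the enclosing horizontal box an elongated rotated box actually covers.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corner points of a w x h box centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


def enclosing_horizontal_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def fill_ratio(w, h, angle_deg):
    """Area of the rotated box divided by the area of its horizontal box."""
    x0, y0, x1, y1 = enclosing_horizontal_box(rotated_box_corners(0.0, 0.0, w, h, angle_deg))
    return (w * h) / ((x1 - x0) * (y1 - y0))


# A hypothetical 100 x 20 pixel ship at several headings.
for angle in (0, 15, 45):
    print(f"{angle:>2} deg: ship fills {fill_ratio(100, 20, angle):.1%} of the horizontal box")
```

At 45 degrees the elongated box covers well under a third of its horizontal box, so most of a horizontal annotation would be background or, in a crowded port, a neighbouring ship.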

37 pages, 4167 KB

Open Access Article

EGMamba-Net: Edge-Guided Global–Local Mamba Network with Region-Adaptive Routing for Salient Object Detection in Optical Remote Sensing Images

by Fubin Zhang, Zichi Zhang and Feihu Zhang

Remote Sens. 2026, 18(10), 1568; https://doi.org/10.3390/rs18101568 - 14 May 2026

Viewed by 387

Abstract

Salient object detection in optical remote sensing images remains challenging due to complex backgrounds, blurred boundaries, small objects, unstable foreground–background contrast, and dense object distributions. Existing convolution-based methods are effective at modeling local structures, but they are limited in capturing long-range dependencies, whereas Transformer-based approaches usually incur substantial computational cost when handling high-resolution remote sensing imagery. To address these issues, this paper proposes EGMamba-Net, an edge-guided global–local collaborative network for salient object detection in optical remote sensing images. Specifically, a hybrid global–local backbone is first constructed to preserve shallow texture, edge, and geometric details while introducing Mamba-based global modeling in deeper stages for efficient long-range dependency representation. An Edge Prior Enhancement Module (EPEM) is then designed to explicitly extract boundary priors from shallow features and refine feature representations through edge-guided modulation. To alleviate the representation conflict between global semantics and local details, a Global–Local Interaction Module (GLIM) is further developed, where convolutional local modeling and Mamba-based global modeling interact through cross-gating for complementary feature learning. Moreover, a Region-Adaptive Routing Decoder (RARD) is introduced to dynamically assign different refinement paths according to regional saliency response, boundary intensity, and contextual complexity, thereby improving the recovery of small, low-contrast, and densely distributed objects. In addition, a Difficulty-Aware Joint Loss (DAJL) is designed to enhance optimization on boundary regions and hard samples, improving robustness under challenging conditions. Extensive experiments on the ORSSD, EORSSD, and ORSI-4199 datasets demonstrate the superiority of the proposed method. In particular, on the more challenging EORSSD dataset, EGMamba-Net achieves an S-measure of 0.9389, a max F-measure of 0.8972, and an MAE of 0.0066. Compared with the representative remote-sensing method DAF-Net, it improves S-measure and max F-measure by 0.0223 and 0.0358, respectively, indicating stronger capability in background suppression, structural preservation, and boundary recovery. Full article
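
For readers unfamiliar with the MAE figure quoted above, the following is a minimal sketch of how mean absolute error is commonly computed for a saliency map. It assumes the prediction and the ground-truth mask are arrays of the same shape scaled to [0, 1]; the function name and the toy values are illustrative and are not taken from the paper.

```python
import numpy as np

def mean_absolute_error(pred, gt):
    """Mean absolute difference between a saliency map and its ground-truth mask.

    Both inputs are assumed to have the same shape and to be scaled to [0, 1].
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    return float(np.mean(np.abs(pred - gt)))

# Toy 2x2 example with made-up values (not data from the paper).
pred = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
gt = np.array([[1.0, 0.0],
               [0.0, 1.0]])
mae = mean_absolute_error(pred, gt)
print(f"MAE = {mae:.4f}")
```

Lower values are better: a perfect prediction gives an MAE of 0, so a dataset-level figure such as 0.0066 would correspond to a very small average per-pixel deviation.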

(This article belongs to the Section Remote Sensing Image Processing)

34 pages, 2306 KB

Open AccessReview

A Review of Explainable Machine Learning in Medical Thermography for Interpretable Thermal Feature Analysis and Biomarker Discovery

by Muhammad Sohail, Hikmat Yar and Heung Soo Kim

Mathematics 2026, 14(10), 1666; https://doi.org/10.3390/math14101666 - 13 May 2026

Viewed by 326

Abstract

Medical thermography is a noninvasive, contactless imaging technique that captures spatial temperature distributions across the human body, providing insights into vascular function, inflammation, metabolism, physiological regulation, and aging. Recently, machine learning has been increasingly utilized to analyze thermographic data for disease screening, functional assessment, and biomarker identification. However, the existing literature is fragmented, with varied clinical applications, feature-engineering strategies, and predictive modeling frameworks, often lacking a focus on interpretability and the reliable identification of clinically relevant thermal markers. This review offers a structured overview of explainable machine learning in medical thermography, emphasizing thermal feature representation, model interpretability, and biomarker discovery. It categorizes thermographic features into pixel-based representations, region-wise statistical descriptors, texture measures, and deep latent features. Additionally, it evaluates conventional machine learning and deep learning methods for classification, regression, and risk assessment tasks. The review pays special attention to interpretable learning strategies, such as feature importance analysis, surrogate explanation models, saliency-based visualization, and Shapley-value-based methods, which can enhance transparency and confidence in model outputs. Key challenges are critically discussed, including imaging variability, limited dataset sizes, weak protocol standardization, class imbalance, generalizability, and the gap between predictive performance and clinical trust. Overall, this review synthesizes current advancements, identifies major research gaps, and outlines future directions for developing trustworthy machine learning frameworks in medical thermography and enhancing interpretable thermal biomarker discovery. Full article
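
As an illustration of the Shapley-value-based methods mentioned above, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. The feature names (`mean_temp`, `asymmetry`) and the value function are hypothetical and chosen only to make the arithmetic easy to follow; practical tools approximate this computation because the number of orderings grows factorially.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_value(coalition):
    """Hypothetical model score for a subset of thermal features."""
    score = 0.0
    if "mean_temp" in coalition:
        score += 0.2
    if "asymmetry" in coalition:
        score += 0.3
    if {"mean_temp", "asymmetry"} <= coalition:
        score += 0.1  # interaction term, split evenly by the Shapley rule
    return score

phi = shapley_values(["mean_temp", "asymmetry"], toy_value)
print(phi)
```

In this toy case the attributions sum to the score of the full feature set, which is the efficiency property that makes Shapley values attractive for feature attribution.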

(This article belongs to the Special Issue Advances in Machine Learning and Intelligent Systems)

24 pages, 3506 KB

Open AccessArticle

RIF-Fuse: Invertible Frequency Decomposition with Residual Enhancement for Robust Multimodal Fusion

by Anke Yang, Bingqi Liu, Mingzhe Liu, Haihua Ding, Peijun Mo, Chengqiang Zhao, Xianghe Liu and Tao Ye

Remote Sens. 2026, 18(10), 1520; https://doi.org/10.3390/rs18101520 - 12 May 2026

Viewed by 320

Abstract

Infrared–visible image fusion (IVIF) seeks to combine the thermal saliency of infrared images with the rich textures of visible images in a single representation. This study proposes RIF-Fuse, a framework designed to enhance fusion stability and detail fidelity through a band-controllable structure–detail decoupling mechanism. We utilize a wavelet-based pipeline to explicitly separate low-frequency structural components from high-frequency textures. A Haar residual enhancement path is integrated into the high-frequency branch to provide low-loss compensation for weak textures, while a band-aware differential fusion strategy is designed to suppress structural conflicts and accentuate edges at the subband level. A two-stage training scheme is further applied to ensure optimization stability. Extensive experiments on the TNO and RoadScene datasets demonstrate that RIF-Fuse produces sharper details and more natural structures compared to state-of-the-art methods. The results indicate that RIF-Fuse achieves a superior balance across multiple objective metrics, offering a robust solution for high-fidelity multimodal image synthesis. Full article
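
The separation of low-frequency structure from high-frequency texture described above can be illustrated with a generic single-level 2D Haar transform. This is a minimal sketch of the standard orthonormal Haar decomposition, not the RIF-Fuse pipeline itself; the function names are illustrative and the abstract does not specify the authors' implementation.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar transform: one low-frequency and three detail sub-bands."""
    x = np.asarray(img, dtype=np.float64)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    low = (a + b + c + d) / 2.0        # local average: coarse structure
    d_cols = (a - b + c - d) / 2.0     # difference across columns
    d_rows = (a + b - c - d) / 2.0     # difference across rows
    d_diag = (a - b - c + d) / 2.0     # diagonal difference
    return low, d_cols, d_rows, d_diag

def haar_idwt2(low, d_cols, d_rows, d_diag):
    """Inverse of haar_dwt2; reconstructs the original image exactly."""
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (low + d_cols + d_rows + d_diag) / 2.0
    x[0::2, 1::2] = (low - d_cols + d_rows - d_diag) / 2.0
    x[1::2, 0::2] = (low + d_cols - d_rows - d_diag) / 2.0
    x[1::2, 1::2] = (low - d_cols - d_rows + d_diag) / 2.0
    return x

# Toy 4x4 image; the transform is invertible, so no information is lost.
img = np.arange(16, dtype=np.float64).reshape(4, 4)
bands = haar_dwt2(img)
rec = haar_idwt2(*bands)
print(np.allclose(rec, img))
```

Because the transform is exactly invertible, a fusion rule can in principle treat the low-frequency band and the detail bands differently and still reconstruct a full-resolution image afterwards.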

Search Results (375)
