Search Results (1,018)

Search Parameters:
Keywords = Attention-UNet

22 pages, 2388 KB  
Article
MAF-GAN: A Multi-Attention Fusion Generative Adversarial Network for Remote Sensing Image Super-Resolution
by Zhaohe Wang, Hai Tan, Zhongwu Wang, Jinlong Ci and Haoran Zhai
Remote Sens. 2025, 17(24), 3959; https://doi.org/10.3390/rs17243959 (registering DOI) - 7 Dec 2025
Abstract
Existing Generative Adversarial Networks (GANs) frequently yield remote sensing images with blurred fine details, distorted textures, and compromised spatial structures when applied to super-resolution (SR) tasks. This study proposes a Multi-Attention Fusion Generative Adversarial Network (MAF-GAN) to address these limitations. The generator of MAF-GAN is built on a U-Net backbone that incorporates Oriented Convolutions (OrientedConv) to enhance the extraction of directional features and textures. A novel co-calibration mechanism, incorporating channel, spatial, gating, and spectral attention, is embedded in the encoding path and skip connections and is supplemented by an adaptive weighting strategy to enable effective multi-scale feature fusion. A composite loss function integrates adversarial loss, perceptual loss, hybrid pixel loss, total variation loss, and feature consistency loss to optimize model performance. Extensive experiments on the GF7-SR4×-MSD dataset demonstrate that MAF-GAN achieves state-of-the-art performance, delivering a Peak Signal-to-Noise Ratio (PSNR) of 27.14 dB, a Structural Similarity Index (SSIM) of 0.7206, a Learned Perceptual Image Patch Similarity (LPIPS) of 0.1017, and a Spectral Angle Mapper (SAM) of 1.0871. These results significantly outperform mainstream models including SRGAN, ESRGAN, SwinIR, HAT, and ESatSR, and exceed traditional interpolation methods (e.g., Bicubic) by a substantial margin. Notably, MAF-GAN also maintains an excellent balance between reconstruction quality and inference efficiency. Ablation studies validate the individual contribution of each proposed component to the model’s overall performance. The method generates super-resolution remote sensing images with more natural visual perception, clearer spatial structures, and superior spectral fidelity, thus offering a reliable technical solution for high-precision remote sensing applications. Full article
(This article belongs to the Section Environmental Remote Sensing)
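For readers unfamiliar with the PSNR figure quoted in this abstract, the following is a minimal sketch of how PSNR is commonly computed from the mean squared error between a reference image and a reconstruction. It is not the authors' evaluation code; the function name `psnr`, the nested-list image format, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized images.

    Images are nested lists of pixel intensities (an assumption made to
    keep the sketch dependency-free); max_value is the assumed peak.
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2x2 example with two pixels off by 5 intensity levels.
reference = [[0, 255], [255, 0]]
estimate = [[0, 250], [255, 5]]
score = psnr(reference, estimate)
print(f"PSNR: {score:.2f} dB")
```

Higher values indicate a reconstruction closer to the reference; the metric says nothing about perceptual quality, which is why the abstract also reports SSIM and LPIPS.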
26 pages, 2661 KB  
Article
Dual-Attention EfficientNet Hybrid U-Net for Segmentation of Rheumatoid Arthritis Hand X-Rays
by Madallah Alruwaili, Mahmood A. Mahmood and Murtada K. Elbashir
Diagnostics 2025, 15(24), 3105; https://doi.org/10.3390/diagnostics15243105 (registering DOI) - 6 Dec 2025
Abstract
Background: Accurate segmentation in radiographic imaging remains difficult due to heterogeneous contrast, acquisition artifacts, and fine-scale anatomical boundaries. Objective: This paper presents a Hybrid Attention U-Net that pairs an EfficientNet-B3 encoder with a lightweight decoder featuring CBAM and SCSE modules, which provide complementary channel-wise and spatial recalibration for sharper boundary recovery. Methods: The preprocessing phase uses percentile windowing, N4 bias compensation, per-image normalization, and geometric standardization, as well as sparse geometric augmentations, to reduce domain shift and make the pipeline viable. Results: For hand X-ray segmentation, the model achieves Dice = 0.8426, IoU around 0.78, pixel accuracy = 0.9058, ROC-AUC = 0.9074, and PR-AUC = 0.8452; it converges quickly in the early stages and remains steady in late epochs. Controlled ablation shows that the EfficientNet-B3 encoder is the main factor in overlap quality and that smaller batches (bs = 16) consistently outperform larger batches with respect to gradient noise and implicit regularization. The qualitative overlays complement the quantitative gains, revealing more distinct cortical profiles and lower background leakage. Conclusions: The model is computationally moderate, end-to-end trainable, and can be easily extended to multi-class problems through a softmax head and class-balanced objectives, making it a strong, deployable option for musculoskeletal radiograph segmentation as well as an effective baseline for future clinical translation analyses. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
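The Dice and IoU scores reported in this abstract are overlap measures between a predicted mask and a ground-truth mask. The sketch below shows the standard per-image definitions for binary masks; it is an illustration only, and the function name `dice_and_iou`, the nested-list mask format, and the empty-mask convention are assumptions rather than details from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists."""
    p = [1 if v else 0 for row in pred for v in row]
    t = [1 if v else 0 for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a & b for a, b in zip(p, t))
    pred_sum, target_sum = sum(p), sum(t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention,
        # assumed here; other codebases return 0 or skip the sample).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two scores are tied by IoU = Dice / (2 - Dice). That identity does not generally hold for scores averaged over a dataset.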
27 pages, 12824 KB  
Article
Multiscale Attention-Enhanced Complex-Valued Graph U-Net for PolSAR Image Classification
by Wanying Song, Qian Liu, Kuncheng Pu, Yinyin Jiang and Yan Wu
Remote Sens. 2025, 17(24), 3943; https://doi.org/10.3390/rs17243943 - 5 Dec 2025
Abstract
The powerful graph convolutional network (GCN) for polarimetric synthetic aperture radar (PolSAR) image classification generally relies on real-valued features, ignoring the phase information and thus limiting the modeling of complex-valued (CV) polarization characteristics. To address this issue, this paper proposes a novel multiscale attention-enhanced CV graph U-Net model, abbreviated as MAE-CV-GUNet, by embedding CV-GCN into a graph U-Net framework augmented with multiscale attention mechanisms. First, a CV-GCN is constructed based on the real-valued GCN to effectively capture the intrinsic amplitude and phase information of the PolSAR data, along with the underlying correlations between them, which leads to an improved feature representation for PolSAR images. Based on CV-GCN, a CV graph U-Net (CV-GUNet) architecture is constructed by integrating multiple CV-GCN components, aiming to extract multi-scale features and further enhance the ability to extract discriminative features in the complex domain. Then, a multiscale attention (MSA) mechanism is designed, enabling the proposed MAE-CV-GUNet to adaptively learn the importance of features at various scales and thereby dynamically fuse the multiscale information among them. Comparison and ablation experiments on three PolSAR datasets show that MAE-CV-GUNet has excellent performance in PolSAR image classification. Full article
28 pages, 4643 KB  
Article
JM-Guided Sentinel 1/2 Fusion and Lightweight APM-UNet for High-Resolution Soybean Mapping
by Ruyi Wang, Jixian Zhang, Xiaoping Lu, Zhihe Fu, Guosheng Cai, Bing Liu and Junfeng Li
Remote Sens. 2025, 17(24), 3934; https://doi.org/10.3390/rs17243934 - 5 Dec 2025
Abstract
Accurate soybean mapping is critical for food–oil security and cropping assessment, yet spatiotemporal heterogeneity arising from fragmented parcels and phenological variability reduces class separability and robustness. This study aims to deliver a high-resolution, reusable pipeline and quantify the marginal benefits of feature selection and architecture design. We built a full-season multi-temporal Sentinel-1/2 stack and derived candidate optical/SAR features (raw bands, vegetation indices, textures, and polarimetric terms). Jeffries–Matusita (JM) distance was used for feature–phase joint selection, producing four comparable feature sets. We propose a lightweight APM-UNet: an Attention Sandglass Layer (ASL) in the shallow path to enhance texture/boundary details, and a Parallel Vision Mamba layer (PVML with Mamba-SSM) in the middle/bottleneck to model long-range/global context with near-linear complexity. Under a unified preprocessing and training/evaluation protocol, the four feature sets were paired with U-Net, SegFormer, Vision-Mamba, and APM-UNet, yielding 16 controlled configurations. Results showed consistent gains from JM-guided selection across architectures; given the same features, APM-UNet systematically outperformed all baselines. The best setup (JM-selected composite features + APM-UNet) achieved PA 92.81%, OA 97.95%, Kappa 0.9649, Recall 91.42%, IoU 0.7986, and F1 0.9324, improving PA and OA by ~7.5 and 6.2 percentage points over the corresponding full-feature counterpart. These findings demonstrate that JM-guided, phenology-aware features coupled with a lightweight local–global hybrid network effectively mitigate heterogeneity-induced uncertainty, improving boundary fidelity and overall consistency while maintaining efficiency, offering a potentially transferable framework for soybean mapping in complex agricultural landscapes. Full article
(This article belongs to the Special Issue Machine Learning of Remote Sensing Imagery for Land Cover Mapping)
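The Jeffries–Matusita (JM) distance used for feature selection in this abstract measures how separable two classes are under a given feature, on a scale from 0 (indistinguishable) to 2 (fully separable). The sketch below computes it for a single feature under the common assumption that each class is Gaussian. The abstract does not state which class model the authors used, so treat the Gaussian form, the function name `jm_distance`, and the feature names and statistics as hypothetical.

```python
import math

def jm_distance(mean_a, var_a, mean_b, var_b):
    """JM distance between two univariate Gaussian classes (range 0..2)."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    pooled = (var_a + var_b) / 2.0
    # Bhattacharyya distance for two univariate Gaussians.
    bhattacharyya = ((mean_a - mean_b) ** 2) / (8.0 * pooled) \
        + 0.5 * math.log(pooled / math.sqrt(var_a * var_b))
    return 2.0 * (1.0 - math.exp(-bhattacharyya))

# Hypothetical per-class statistics: (mean_soy, var_soy, mean_other, var_other).
features = {
    "ndvi_july": (0.80, 0.01, 0.50, 0.01),
    "vh_june": (-15.0, 4.0, -14.0, 4.0),
}
scores = {name: jm_distance(*stats) for name, stats in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Ranking candidate feature and date combinations by such a score, then keeping the top ones, is one simple way to realize "JM-guided" selection; the paper's actual selection rule may differ.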
24 pages, 3036 KB  
Article
MPG-SwinUMamba: High-Precision Segmentation and Automated Measurement of Eye Muscle Area in Live Sheep Based on Deep Learning
by Zhou Zhang, Yaojing Yue, Fuzhong Li, Leifeng Guo and Svitlana Pavlova
Animals 2025, 15(24), 3509; https://doi.org/10.3390/ani15243509 - 5 Dec 2025
Viewed by 26
Abstract
Accurate eye muscle area (EMA) assessment in live sheep is crucial for genetic breeding and production management within the meat sheep industry. However, the segmentation accuracy and reliability of existing automated methods are limited by challenges inherent to B-mode ultrasound images, such as low contrast and noise interference. To address these challenges, we present MPG-SwinUMamba, a novel deep learning-based segmentation network. This model uniquely combines the state-space model with a U-Net architecture. It also integrates an edge-enhancement multi-scale attention module (MSEE) and a pyramid attention refinement module (PARM) to improve the detection of indistinct boundaries and better capture global context. The global context aggregation decoder (GCAD) is employed to precisely reconstruct the segmentation mask, enabling automated measurement of the EMA. Compared to 12 other leading segmentation models, MPG-SwinUMamba achieved superior performance, with an intersection-over-union of 91.62% and a Dice similarity coefficient of 95.54%. Additionally, automated measurements show excellent agreement with expert manual assessments (correlation coefficient r = 0.9637), with a mean absolute percentage error of only 4.05%. This method offers non-invasive, efficient, and objective evaluation of carcass performance in live sheep, with the potential to reduce measurement costs and enhance breeding efficiency. Full article
(This article belongs to the Section Animal System and Management)
25 pages, 7527 KB  
Article
A Multifocal RSSeg Approach for Skeletal Age Estimation in an Indian Medicolegal Perspective
by Priyanka Manchegowda, Manohar Nageshmurthy, Suresha Raju and Dayananda Rudrappa
Algorithms 2025, 18(12), 765; https://doi.org/10.3390/a18120765 - 4 Dec 2025
Viewed by 151
Abstract
Estimating bone age is essential for accurate diagnoses, appropriate care based on biological age, and fairness in legal matters. In the Indian medicolegal context, determining age through a clinical approach involves analyzing multiple joints; however, the traditional method can be tedious and subjective, relying heavily on human expertise, which may lead to biased decisions in age-related legal disputes. Moreover, commonly used radiographs often exhibit pixel-level variations due to heterogeneous contrast, which complicate segmentation tasks and lead to inconsistencies and reduced model performance. The study presents a multifocal region-based symbolic segmentation technique to automatically retain the soft-tissue region that harbors a growth pattern of an ossification center. Experimental results demonstrate an 84.5% Jaccard similarity, an 81.4% Dice coefficient, an 88.3% precision, a 90.0% recall, and a 91.5% pixel accuracy on a novel multifocal dataset of Indian inhabitants. The proposed segmentation technique outperforms U-Net, Attention U-Net, TransU-Net, DeepLabV3+, Adaptive Otsu, and Watershed segmentation in terms of accuracy, indicating strong generalizability across joints and improving reliability. Compared with 86.4% without segmentation, the proposed integration of segmentation with VGG16 classification increases the overall accuracy to 93.8%, demonstrating that target-focused-region processing reduces unnecessary computations and improves feature discrimination without sacrificing accuracy. Full article
(This article belongs to the Special Issue Machine Learning in Medical Signal and Image Processing (4th Edition))
15 pages, 7833 KB  
Article
A Physics-Constrained Method for the Precise Spatiotemporal Prediction of Rock-Damage Evolution
by Shaohong Yan, Zikun Tian, Yanbo Zhang, Xulong Yao, Zhigang Tao and Shuai Wang
Appl. Sci. 2025, 15(23), 12801; https://doi.org/10.3390/app152312801 - 3 Dec 2025
Viewed by 171
Abstract
Accurately predicting the spatiotemporal evolution of rock-damage zones is vital for underground engineering safety. Using three-dimensional data obtained from uniaxial compression–acoustic emission tests, this study addresses the key limitations of existing data-driven methods, which struggle with spatial heterogeneity and often yield predictions that deviate from fundamental fracture-mechanics principles. To overcome these challenges, we propose a physics-constrained spatiotemporal STConvLSTM framework that integrates a density-adaptive point cloud–voxel conversion mechanism for improved 3D representation, a composite loss incorporating structural and physics-based constraints, and a multi-level encoder–processor–decoder architecture enhanced by 3D convolutions, attention modules, and residual connections. Experimental results demonstrate superior accuracy and physical consistency, achieving 92.6% accuracy and an F1-score of 0.947, outperforming ConvLSTM and UNet3D baselines. The physics-aware constraints effectively suppress non-physical divergence and yield damage morphologies that better align with expected fracture-mechanics behavior. These findings show that coupling data-driven learning with physics-based regularization substantially enhances model reliability and interpretability. Overall, the proposed framework offers a robust and practical paradigm for 3D damage-evolution modeling, supporting more-dependable early-warning, stability assessment, and intelligent support-design applications in underground engineering. Full article
(This article belongs to the Special Issue Progress and Challenges of Rock Engineering)
21 pages, 4950 KB  
Article
Enhanced UAV-Dot for UAV Crowd Localization: Adaptive Gaussian Heat Map and Attention Mechanism to Address Scale/Low-Light Challenges
by Min Zhang, Fei Zhao and Yan Zhang
Drones 2025, 9(12), 833; https://doi.org/10.3390/drones9120833 - 1 Dec 2025
Viewed by 94
Abstract
In public safety scenarios, such as large-scale event security and urban crowd management, unmanned aerial vehicles (UAVs) serve as a vital tool for crowd localization, offering high mobility and broad coverage. However, UAV-based overhead localization faces challenges, including significant target scale variations due to altitude changes and poor feature visibility in low-light conditions. To overcome these issues, this study enhances the UAV-Dot framework by introducing a scale prediction branch for adaptive Gaussian heatmap adjustment, embedding a CBAM attention module in the U-Net encoder to strengthen feature extraction in dim environments and optimizing post-processing via dynamic thresholding and DBSCAN clustering. Experiments on the DroneCrowd dataset show that the improved model increases parameters by only 0.36% during training and 0.29% during testing yet achieves 53.38% L-mAP—outperforming the original UAV-Dot by 2.38% and STNNet by 12.93%. The model also delivers consistent gains of approximately 2% in L-AP@10, L-AP@15, and L-AP@20. Full article
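The idea of an adaptive Gaussian heatmap can be illustrated as follows: each annotated head position is rendered as a Gaussian blob whose spread depends on the apparent target scale. This is a sketch under stated assumptions, not the UAV-Dot implementation. The function name `gaussian_heatmap`, the per-point `sigma` input (standing in for the output of the scale prediction branch), and the use of a pixel-wise maximum to merge overlapping blobs are all hypothetical choices.

```python
import math

def gaussian_heatmap(height, width, points):
    """Render (x, y, sigma) points as Gaussian blobs on a 2D grid.

    sigma is supplied per point; in a scale-adaptive setting it would be
    larger for targets seen at low altitude and smaller at high altitude.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy, sigma in points:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Merge overlapping blobs with a maximum (an assumption).
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap

# The same location rendered with a small and a large spread.
narrow = gaussian_heatmap(5, 5, [(2, 2, 1.0)])
wide = gaussian_heatmap(5, 5, [(2, 2, 2.0)])
print(round(narrow[2][3], 4), round(wide[2][3], 4))
```

A fixed sigma tends to over-smooth small, distant targets or under-cover large, near ones, which is the scale problem the abstract describes.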
28 pages, 21313 KB  
Article
Deep Learning-Based Gravity Inversion Integrating Physical Equations and Multiple Constraints
by Wenxuan Shi, Jiapei Wang, Chongyang Shen, Shuai Zhang, Minghui Zhang, Hongbo Tan and Guangliang Yang
Appl. Sci. 2025, 15(23), 12717; https://doi.org/10.3390/app152312717 - 1 Dec 2025
Viewed by 121
Abstract
Three-dimensional gravity inversion technology involves inferring the underground density structure based on observed gravity anomaly data. In addition to gravity inversion based on physics-driven methods, deep learning, as a purely data-driven technique, is increasingly gaining attention in geophysical inversion problems. However, purely data-driven methods rely on the implicit relationships within the data during the inversion process, which results in a lack of clear physical significance. This study proposes a three-dimensional gravity inversion method that integrates physical equations with deep learning. Based on the U-Net architecture, the gravity forward equation is incorporated as a physical constraint term, and a composite loss function—comprising three-dimensional mean squared error, a depth-weighting function, and three-dimensional intersection-over-union loss—is constructed to enhance inversion accuracy. Numerical experiments indicate that this method outperforms traditional algorithms in terms of density recovery accuracy and boundary clarity. When applied to gravity anomaly data from the Tangshan earthquake region in China, this method successfully inverted the three-dimensional subsurface density structure, revealing a high-density anomaly beneath the seismic source area, which provides important evidence for understanding the regional earthquake generation mechanism. Full article
24 pages, 12853 KB  
Article
Photovoltaic Power Station Identification Based on High-Resolution Network and Google Earth Engine: A Case Study of Qinghai Province, Northwest China
by Hongling Chen, Li Zhang, Yang Yu, Chuandong Wu, Ting Hua and Chunlian Gao
Remote Sens. 2025, 17(23), 3896; https://doi.org/10.3390/rs17233896 - 30 Nov 2025
Viewed by 197
Abstract
The precise identification of photovoltaic power stations is essential for advancing the assessment of energy infrastructure and for the efficient management of land resources. To address the need for spatially explicit data on photovoltaic (PV) development in arid and semi-arid regions amid green energy transitions, particularly in the context of identification challenges induced by the widespread distribution of bare ground, this study optimized a remote sensing-based identification method integrating Principal Component Analysis (PCA), automated sampling via Google Earth Engine (GEE), and deep learning models, and applied it to Qinghai Province, one of China’s largest PV regions. The results showed that HRNetv2 (validation Dice = 0.9463) outperformed UNet (0.9328), Attention UNet (0.9399), and HRNet + OCR (0.9184) in small-sample (1871 training samples) PV segmentation; the PV installed area during 2020–2024 accounted for 63.5% of the total pre-2024 area (~607 km²), exceeding the cumulative area before 2019, with projects predominantly distributed in areas with elevation less than 2500 m and slope less than 2°; bare land dominated PV land use (88.7%), followed by grassland (6.9%) and shrubland (3.9%), and PV construction contributed to desert greening by modifying microclimates. The study concludes that its optimized method effectively supports PV spatial identification, and the revealed PV distribution and land use patterns provide scientific guidance for synergistic PV development and ecological conservation in arid regions, while acknowledging limitations in generalizability to other regions due to Qinghai-specific data, suggesting future algorithm refinement and expanded research scales. Full article
(This article belongs to the Section Ecological Remote Sensing)
13 pages, 2180 KB  
Article
Radiologist-Validated Automatic Lumbar T1-Weighted Spinal MRI Segmentation Tool via an Attention U-Net Algorithm
by Aryan Kalluvila, Ethan Wang, Michael C Hurley, Colbey Freeman and Jason M. Johnson
Diagnostics 2025, 15(23), 3046; https://doi.org/10.3390/diagnostics15233046 - 28 Nov 2025
Viewed by 277
Abstract
Background/Objectives: Spinal MRI segmentation has become increasingly important with the prevalence of disc herniation and vertebral injuries. Artificial intelligence can help orthopedic surgeons and radiologists automate the process of segmentation. Currently, there are few tools for T1-weighted spinal MRI segmentation, with most focusing on T2-weighted imaging. This paper focuses on creating an automatic lumbar spinal MRI segmentation tool for T1-weighted images using deep learning. Methods: An Attention U-Net was employed as the main algorithm because the architecture has shown success in other segmentation applications. Segmentation loss functions were compared, focusing on the difference between BCE and MSE loss. Two board-certified radiologists scored the output of the Attention U-Net versus four other algorithms to assess clinical relevance and segmentation accuracy. Results: The Attention U-Net achieved superior results, with SSIM and DICE coefficients of 0.998 and 0.93, outperforming other architectures. Both radiologists agreed that the Attention U-Net segmented lumbar spinal images with the highest accuracy on the Likert Scale (3.7 ± 0.82). Cohen’s Kappa coefficient was measured at 0.31, indicating a fair level of agreement. MSE loss outperformed BCE with respect to both SSIM and DICE, serving as the loss function of choice. Conclusions: Qualitative observations showed that the Attention U-Net and U-Net++ were the top performing networks. However, the Attention U-Net minimized external noise and focused on internal spinal preservation, demonstrating strong segmentation performance for T1-weighted lumbar spinal MRI. Full article
(This article belongs to the Special Issue Recent Advances in Bone and Joint Imaging—3rd Edition)
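The inter-rater agreement figure in this abstract is Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The sketch below shows the standard unweighted form; the abstract does not say whether a weighted variant was used for the Likert scores, so the unweighted choice, the function name `cohens_kappa`, and the toy scores are assumptions.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Toy Likert scores from two hypothetical raters on four images.
scores_a = [4, 4, 3, 3]
scores_b = [4, 3, 3, 3]
kappa = cohens_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```

Under the widely used Landis and Koch bands, values between 0.21 and 0.40 are described as fair agreement, which matches the wording used for the reported 0.31.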
64 pages, 45605 KB  
Article
SegClarity: An Attribution-Based XAI Workflow for Evaluating Historical Document Layout Models
by Iheb Brini, Najoua Rahal, Maroua Mehri, Rolf Ingold and Najoua Essoukri Ben Amara
J. Imaging 2025, 11(12), 424; https://doi.org/10.3390/jimaging11120424 - 28 Nov 2025
Viewed by 153
Abstract
In recent years, deep learning networks have demonstrated remarkable progress in the semantic segmentation of historical documents. Nonetheless, their limited explainability remains a critical concern, as these models frequently operate as black boxes, thereby constraining confidence in the trustworthiness of their outputs. To enhance transparency and reliability in their deployment, increasing attention has been directed toward explainable artificial intelligence (XAI) techniques. These techniques typically produce fine-grained attribution maps in the form of heatmaps, illustrating feature contributions from different blocks and layers within a deep neural network (DNN). However, such maps often closely resemble the segmentation outputs themselves, and there is currently no consensus regarding appropriate explainability metrics for semantic segmentation. To overcome these challenges, we present SegClarity, a novel workflow designed to integrate explainability into the analysis of historical documents. The workflow combines visual and quantitative evaluations specifically tailored to segmentation-based applications. Furthermore, we introduce the Attribution Concordance Score (ACS), a new explainability metric that provides quantitative insights into the consistency and reliability of attribution maps. To evaluate the effectiveness of our approach, we conducted extensive qualitative and quantitative experiments using two datasets of historical document images, two U-Net model variants, and four attribution-based XAI methods. A qualitative assessment involved four XAI methods across multiple U-Net layers, including comparisons at the input level with state-of-the-art perturbation methods RISE and MiSuRe. Quantitatively, five XAI evaluation metrics were employed to benchmark these approaches comprehensively. 
Beyond historical document analysis, we further validated the workflow’s generalization by demonstrating its transferability to the Cityscapes dataset, a challenging benchmark for urban scene segmentation. The results demonstrate that the proposed workflow substantially improves the interpretability and reliability of deep learning models applied to the semantic segmentation of historical documents. To enhance reproducibility, we have released SegClarity’s source code along with interactive examples of the proposed workflow. Full article
(This article belongs to the Special Issue Explainable AI in Computer Vision)
29 pages, 13462 KB  
Article
Enhancing Polar Sea Ice Estimation: Deep SARU-Net for Spatiotemporal Super-Resolution Approach
by Jianxin He, Shuo Yang, Haoyu Wang, Wanshou Liu and Xiong Deng
Remote Sens. 2025, 17(23), 3839; https://doi.org/10.3390/rs17233839 - 27 Nov 2025
Viewed by 118
Abstract
Fine-scale detailed estimation of sea ice concentration (SIC) is pivotal for maritime safety, scientific exploration, and environmental surveillance. However, current datasets frequently present challenges due to their limited resolution, thereby hindering fine-scale analysis of sea ice conditions. This paper introduces a novel Deep Self-Attention Residual U-Net (Deep SARU-Net) architecture to address the limitations inherent in existing super-resolution estimation techniques. By harnessing distinctive multi-stage self-attention mechanisms, orthogonal rectangular convolutional kernels, and residual modules, this architecture significantly augments both the precision and generalizability of SIC super-resolution estimation tasks. Experimental results demonstrate that in the vicinity of the Chukchi Sea, the Deep SARU-Net method exhibits superior performance in terms of both RMSE and SSIM values compared to other models, showcasing its efficacy. Furthermore, generalization analyses across diverse sea regions confirm the model’s universality. Full article
27 pages, 5548 KB  
Article
Efficient and Accurate Pneumonia Detection Using a Novel Multi-Scale Transformer Approach
by Alireza Saber, Amirreza Fateh, Pouria Parhami, Alimohammad Siahkarzadeh, Mansoor Fateh and Saideh Ferdowsi
Sensors 2025, 25(23), 7233; https://doi.org/10.3390/s25237233 - 27 Nov 2025
Viewed by 250
Abstract
Pneumonia, a prevalent respiratory infection, remains a leading cause of morbidity and mortality worldwide, particularly among vulnerable populations. Chest X-rays serve as a primary tool for pneumonia detection; however, variations in imaging conditions and subtle visual indicators complicate consistent interpretation. Automated tools can enhance traditional methods by improving diagnostic reliability and supporting clinical decision-making. In this study, we propose a novel multi-scale transformer approach for pneumonia detection that integrates lung segmentation and classification into a unified framework. Our method introduces a lightweight transformer-enhanced TransUNet for precise lung segmentation, achieving a Dice score of 95.68% on the “Chest X-ray Masks and Labels” dataset with fewer parameters than traditional transformers. For classification, we employ pre-trained ResNet models (ResNet-50 and ResNet-101) to extract multi-scale feature maps, which are then processed through a convolutional Residual Attention Module and a modified transformer module to enhance pneumonia detection. This integration of multi-scale feature extraction and lightweight attention mechanisms ensures robust performance, making our method suitable for resource-constrained clinical environments. Our approach achieves 93.75% accuracy on the “Kermany” dataset and 96.04% accuracy on the “Cohen” dataset, outperforming existing methods while maintaining computational efficiency. Full article
(This article belongs to the Special Issue Biomedical Imaging, Sensing and Signal Processing)
18 pages, 12668 KB  
Article
Water-Body Detection from SAR Images Using Connectivity Refinement Network
by Zile Gao, Jinkai Sun, Puyan Xu, Lin Wu, Yabo Huang, Ning Li, Zhuang Zhu and Qianchao Pu
Earth 2025, 6(4), 148; https://doi.org/10.3390/earth6040148 - 27 Nov 2025
Viewed by 117
Abstract
Synthetic aperture radar (SAR) is an active microwave imaging system equipped with penetration capability, enabling all-time and all-weather Earth observation, and demonstrates significant advantages in large-scale surface water-body detection. Although SAR images can provide relatively clear water-body details, they are susceptible to interference from external factors such as complex terrain and background noise, resulting in fragmented detection outcomes and poor connectivity. Therefore, a Connectivity Refinement Network (ConRNet) is proposed in this study to address the issue of fragmented water-body regions in water-body detection results, combining HISEA-1 and Chaohu-1 SAR data. ConRNet is equipped with attention mechanisms and a connectivity prediction module, combined with dual supervision from segmentation and connectivity labels. Unlike conventional attention modules that only emphasize pixel-wise saliency, the proposed Dual Self-Attention Module (DSAM) jointly captures spatial and channel dependencies. Meanwhile, the Connectivity Prediction Module (CPM) reformulates water-body connectivity as a regression problem to directly optimize structural coherence without relying on post-processing. Leveraging dual supervision from segmentation and connectivity labels, ConRNet achieves simultaneous improvements in topological consistency and pixel-level accuracy. The performance of the proposed ConRNet is evaluated by conducting comparative experiments with five deep learning models: FCN, U-Net, DeepLabv3+, HRNet, and MAGNet. The experimental results demonstrate that the ConRNet achieves the highest accuracy in water-body detection, with an intersection over union (IoU) of 88.59% and an F1-score of 93.87%. Full article