Search Results (4,958)

Search Parameters:
Keywords = U-net

29 pages, 6898 KB  
Article
MDE-UNet: A Physically Guided Asymmetric Fusion Network for Multi-Source Meteorological Data Lightning Identification
by Yihua Chen, Yuanpeng Han, Yujian Zhang, Yi Liu, Lin Song, Jialei Wang, Xinjue Wang and Qilin Zhang
Remote Sens. 2026, 18(7), 1027; https://doi.org/10.3390/rs18071027 (registering DOI) - 29 Mar 2026
Abstract
Utilizing multi-source meteorological data for lightning identification is crucial for monitoring severe convective weather. However, several key challenges persist in this field: dimensional imbalance and modal competition among multi-source heterogeneous data, model training bias caused by the extreme sparsity of lightning samples, and an imbalance between false alarms and missed detections resulting from complex background noise. To address these challenges, this paper proposes a lightning identification network guided by physical priors and constrained by supervision. First, to tackle the issue of modal competition in fusing satellite (high-dimensional) and radar (low-dimensional) data, a physical prior-guided asymmetric radar information enhancement mechanism is introduced. This mechanism uses radar physical features as contextual guidance to selectively enhance the latent weak radar signatures. Second, at the architectural level, a multi-source multi-scale feature fusion module and a weighted sliding window–multilayer perceptron (MLP) enhanced decoding unit are constructed. The former achieves the coupling of multi-scale physical features at a 2 km grid scale through cross-level semantic alignment, building a highly consistent feature field that effectively improves the model’s ability to detect lightning signals. The latter leverages adaptive receptive fields and the nonlinear modeling capability of MLPs to effectively smooth spatially discrete noise, ensuring spatial continuity in the reconstructed results. Finally, to address the model bias caused by severe class imbalance between positive and negative samples—resulting from the extreme sparsity of lightning events—an asymmetrically weighted BCE-DICE loss function is designed. Its “asymmetric” characteristic is implemented by assigning different penalty weights to false-positive and false-negative predictions. 
This loss function balances pixel-level accuracy and inter-class equilibrium while imposing high-weight penalties on false-positive predictions, achieving synergistic optimization of feature enhancement and directional suppression. Experimental results show that the proposed method effectively increases the hit rate while substantially reducing the false alarm rate, enabling efficient utilization of multi-source data and high-precision identification of lightning strike areas. Full article
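The asymmetrically weighted BCE-DICE loss described in this abstract can be sketched as follows. This is a minimal illustration, assuming a pair of penalty weights `w_fp`/`w_fn` and a mixing coefficient `alpha`; the function name and all values are assumptions, not the authors' implementation.

```python
import numpy as np

def asymmetric_bce_dice(pred, target, w_fp=2.0, w_fn=1.0, alpha=0.5, eps=1e-7):
    """Sketch of an asymmetrically weighted BCE-Dice loss.

    pred:   predicted lightning probabilities in (0, 1)
    target: binary ground-truth mask
    w_fp / w_fn are illustrative penalty weights for false-positive /
    false-negative errors; the paper's actual values are not given here.
    """
    pred = np.clip(pred, eps, 1 - eps)
    # Per-pixel BCE with different weights on the two error directions:
    # the negative-class term penalises false positives, the
    # positive-class term penalises false negatives.
    bce = -(w_fn * target * np.log(pred)
            + w_fp * (1 - target) * np.log(1 - pred))
    # Soft Dice term counters the extreme sparsity of lightning pixels.
    inter = np.sum(pred * target)
    dice = 1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
    return alpha * bce.mean() + (1 - alpha) * dice
```

Raising `w_fp` above `w_fn` makes false alarms cost more than misses, which is the directional suppression the abstract describes.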
15 pages, 1771 KB  
Article
Deep Learning-Based Generation of Retinal Nerve Fibre Layer Thickness Maps from Fundus Photographs: A Comparative Analysis of U-Net Architectures for Accessible Glaucoma Assessment
by Kyoung Ohn, Harin Jun, Yong-Sik Kim and Woong-Joo Whang
Life 2026, 16(4), 559; https://doi.org/10.3390/life16040559 (registering DOI) - 29 Mar 2026
Abstract
Introduction: Optical coherence tomography (OCT) is the gold standard for retinal nerve fibre layer (RNFL) assessment; however, its high cost and limited accessibility hinder widespread use. This study aims to develop deep learning models that generate RNFL thickness maps from fundus images, providing a cost-effective alternative to OCT. Methods: A dataset of 5000 fundus-OCT image pairs from 5000 unique glaucoma patients was used to train and compare the following four U-Net-based deep learning models: ResU-Net, R2U-Net, Nested U-Net, and Dense U-Net. All models were trained for up to 1000 epochs with early stopping (patience = 50 epochs). Performance was evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Fréchet Inception Distance (FID). Results: ResU-Net demonstrated the best performance, achieving MSE = 0.00061, MAE = 0.01877, SSIM = 0.9163, PSNR = 32.19 dB, and FID = 30.08. These results represent a 108% improvement in SSIM and a 67% improvement in PSNR compared to the previously published benchmark for this task. Conclusions: This study demonstrates that deep learning models, particularly ResU-Net, can generate high-fidelity RNFL thickness maps from fundus photographs, substantially outperforming prior published benchmarks. This approach represents a potential contribution toward accessible glaucoma assessment, contingent upon prospective clinical validation and regulatory evaluation. Full article
(This article belongs to the Special Issue Vision Science and Optometry: 2nd Edition)
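PSNR, one of the five metrics reported above, is a fixed function of MSE. A minimal sketch, assuming pixel intensities normalised to [0, 1]:

```python
import numpy as np

def psnr(mse, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB from a mean-squared error.

    Assumes pixel intensities lie in [0, max_val].
    """
    return 10 * np.log10(max_val ** 2 / mse)
```

With the MSE reported above (0.00061), this gives roughly 32.1 dB, consistent with the ResU-Net PSNR to rounding.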
25 pages, 4776 KB  
Article
FireMambaNet: A Multi-Scale Mamba Network for Tiny Fire Segmentation in Satellite Imagery
by Bo Song, Bo Li, Hong Huang, Zhiyong Zhang, Zhili Chen, Tao Yue and Yun Chen
Remote Sens. 2026, 18(7), 1021; https://doi.org/10.3390/rs18071021 (registering DOI) - 29 Mar 2026
Abstract
Satellite remote sensing plays an essential role in wildfire monitoring due to its large-scale observation capability. However, fire targets in satellite imagery are typically extremely small, sparsely distributed, and embedded in complex backgrounds, making accurate segmentation highly challenging for existing methods. To address these challenges, this paper proposes a multi-scale Mamba-based network for tiny fire segmentation, named FireMambaNet. The network adopts a nested U-shaped encoder-decoder architecture, primarily consisting of three modules: the Cross-layer Gated Residual U-shaped module (CG-RSU), the Fire-aware Directional Context Modulation module (FDCM), and the Multi-scale Mamba Attention Module (M2AM). The CG-RSU, as the core building block, adaptively suppresses background redundancy and enhances weak fire responses by extracting multi-scale features through cross-layer gating. The FDCM explicitly enhances the network’s ability to perceive anisotropic expansion features of fire points, such as those along the wind direction and terrain orientation, by modeling multi-directional context. The M2AM model employs a Mamba state-space model to suppress background interference through global context modeling during cross-scale feature fusion, while enhancing consistency among sparsely distributed tiny fire targets. In addition, experimental validation is conducted using two subsets from the Active Fire dataset, which have significant pixel-level sparse features: Oceania and Asia4. The results show that the proposed method significantly outperforms various mainstream CNN, Transformer, and Mamba baseline models on both datasets. It achieves an IoU of 88.51% and F1 score of 93.76% on the Oceania dataset, and an IoU of 85.65% and F1 score of 92.26% on the Asia4 dataset. Compared to the best-performing CNN baseline model, the IoU is improved by 1.81% and 2.07%, respectively. 
Overall, FireMambaNet demonstrates significant advantages in detecting tiny fire points in complex backgrounds. Full article
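For binary segmentation masks, the IoU and F1 figures quoted above are tied by the identity F1 = 2·IoU/(1 + IoU); a minimal sketch:

```python
import numpy as np

def iou_f1(pred, target):
    """IoU and F1 (Dice) for binary masks; F1 = 2*IoU / (1 + IoU)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = inter / union
    f1 = 2 * inter / (pred.sum() + target.sum())
    return iou, f1
```

Dataset-level scores averaged per image need not satisfy the identity exactly, which is why reported IoU/F1 pairs can deviate slightly from it.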
16 pages, 13705 KB  
Article
PRefiner: Enhancing Overlapped Cervical Cell Segmentation Through Progressive Refinement
by Linlin Zhu, Jiaxun Li and Jiaxi Liu
Electronics 2026, 15(7), 1418; https://doi.org/10.3390/electronics15071418 (registering DOI) - 28 Mar 2026
Abstract
Cervical cancer is one of the most prevalent and easily contracted diseases among women, significantly impacting their daily lives. Computer vision-based cervical cell morphology diagnosis technology can offer robust support for cervical cell analysis at a lower cost. However, the presence of a substantial number of overlapping cells in cervical images renders existing cell segmentation methods less accurate, thereby complicating the guidance of medical diagnosis. In this paper, we introduce a tristage Progressive Refinement method (PRefiner) for overlapping cell segmentation that decouples the traditional end-to-end pipeline, with the final stage specifically correcting anomalous results to enhance precision. We achieve separable overlapping cervical cell segmentation results through a cell nucleus locator, a single-cell segmenter, and a Segmentation Result Mask Refiner. Specifically, we employ a hybrid U-Net as the primary network for the cell nucleus locator and single-cell segmenter, which determines the position of the cell nucleus and procures the initial coarse segmentation result. In the mask refiner, we incorporate a conditional generation framework to address the perception decision problem and design a local–global dual-scale discriminator to ensure that the segmentation result aligns with the prior of a single-cell mask. Experimental results on CCEDD and ISBI2015 demonstrate that PRefiner achieves optimal performance by effectively resolving abnormal segmentations. Notably, our method improves the Dice coefficient of abnormal results from five different models by an average of 2.62% (ranging from 1.0% to 5.1%). Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
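The decoupled tri-stage pipeline above can be summarised as function composition; the three callables below are placeholders for the paper's nucleus locator, single-cell segmenter, and mask refiner networks, so everything here is illustrative:

```python
def prefine(image, locate, segment, refine):
    """Sketch of a three-stage decoupled segmentation pipeline:
    nucleus localisation -> per-cell segmentation -> mask refinement.
    `locate`, `segment`, and `refine` stand in for trained networks."""
    seeds = locate(image)                        # nucleus positions
    coarse = [segment(image, s) for s in seeds]  # one coarse mask per cell
    return [refine(m) for m in coarse]           # corrected masks
```

The point of the decoupling is that the final stage can correct anomalous masks without retraining the earlier stages.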
50 pages, 10525 KB  
Article
Passable Area Evaluation of Tractor Road Based on Improved YOLOv5s and Multi-Factor Fusion
by Qian Zhang, Wenjie Xu, Wenfei Wu, Lizhang Xu, Zhenghui Zhao and Shaowei Liang
Agriculture 2026, 16(7), 752; https://doi.org/10.3390/agriculture16070752 (registering DOI) - 28 Mar 2026
Abstract
The tractor road, as the core scene for autonomous driving of grain transport vehicles, is unstructured, complex, and obstacle-rich, leading to poor real-time performance and accuracy of joint road and obstacle detection with existing YOLOv5s. Furthermore, the reliability of passable area evaluation is low solely based on environmental factors. Therefore, YOLOv5s-C2S is proposed, fusing multi-scale features, attention mechanism, and dynamic features for joint detection. Firstly, YOLOv5s-CC is proposed for road detection by fusing context and spatial details and introducing Criss-Cross attention. Secondly, YOLOv5s-SGA is proposed for obstacle detection by grouped and spatial convolution, parameter-free attention, and adaptive feature fusion. By reusing YOLOv5s-CC weights, YOLOv5s-C2S shares low-level features and decouples high-level specificity. Based on the tractor road and obstacle information, combined with vehicle factors, a weighted scoring–based comprehensive method for passable area evaluation is proposed. Finally, the method was verified through experiments with an intelligent tracked grain transport vehicle using self-constructed datasets, including VOC_Road (11,927 images) and VOC_Obstacle (21,779 images). Compared with existing YOLOv5s, Deeplabv3+, FCN, Unet and SegNet, the mAP50 of road detection by YOLOv5s-CC increased by over 1.2%. Compared with existing YOLOv5s, R-CNN, YOLOv7, SSD and YOLOv8n, the mAP50 of obstacle detection by YOLOv5s-SGA increased by over 2%. Compared with YOLOv5s-SD, the mAP50 of joint detection by YOLOv5s-C2S increased by 9.3%, and the frame rate increased by 7.0 FPS. The proposed passable area evaluation method exhibits strong robustness and reliability in complex environments, meeting the accuracy and real-time requirements in autonomous driving of grain transport vehicles. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
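The weighted scoring–based passable area evaluation can be sketched generically; the factor names and weights below are illustrative assumptions, since the abstract does not list them:

```python
def passable_score(factors, weights):
    """Sketch of a weighted-score evaluation: each environmental or
    vehicle factor is a normalised score in [0, 1], and the weights
    (summing to 1) express its importance. Names/values are assumed."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * factors[k] for k in weights)
```

For example, combining assumed road, obstacle, and vehicle factors with weights 0.4/0.4/0.2 yields a single passability score that can be thresholded.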
21 pages, 922 KB  
Article
DBCF-Net: A Dual-Branch Cross-Scale Fusion Network for Heterogeneous Satellite–UAV Change Detection
by Yan Ren, Ruiyong Li, Pengbo Zhai and Xinyu Chen
Remote Sens. 2026, 18(7), 1009; https://doi.org/10.3390/rs18071009 - 27 Mar 2026
Abstract
Heterogeneous change detection (HCD) using satellite and Unmanned Aerial Vehicle (UAV) imagery is a pivotal task in remote sensing and Earth observation. However, the effective utilization of such multi-source data is significantly hindered by extreme spatial resolution disparities and distinct radiometric characteristics. Existing deep learning methods, often based on weight-sharing Siamese architectures, struggle to bridge these domain gaps, leading to spectral pseudo-changes and blurred detection boundaries. To address these challenges, we propose a novel Dual-Branch Cross-Scale Fusion Network (DBCF-Net) specifically tailored for heterogeneous satellite–UAV change detection. We introduce a Difference-Aware Attention Module (DAAM) to explicitly align cross-modal feature spaces and suppress domain-related noise through a hybrid local–global attention mechanism. Furthermore, an Adaptive Gated Fusion Module (AGFM) is designed to dynamically weight multi-scale interactions, ensuring the preservation of high-frequency spatial details from UAV imagery while maintaining the semantic consistency of satellite data. Extensive experiments on the Heterogeneous Satellite–UAV Dataset (HSUD) demonstrate that DBCF-Net achieves state-of-the-art performance, reaching an F1-score of 88.75% and an IoU of 80.58%. This study provides a robust technical framework for heterogeneous sensor fusion and high-precision monitoring in complex remote sensing scenarios. Full article
(This article belongs to the Section Remote Sensing Image Processing)
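A gated fusion module of the kind AGFM describes typically blends two feature maps through a sigmoid gate; a minimal element-wise sketch, where the scalars `w` and `b` stand in for learned parameters (an assumption, not the paper's architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_uav, feat_sat, w, b):
    """Sketch of gated fusion: a sigmoid gate decides, per element,
    how much of each modality to keep. With gate -> 1 the UAV detail
    dominates; with gate -> 0 the satellite semantics dominate."""
    gate = sigmoid(w * (feat_uav - feat_sat) + b)
    return gate * feat_uav + (1.0 - gate) * feat_sat
```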
24 pages, 3376 KB  
Article
EMDiC: Physics-Informed Conditional Diffusion Denoising for Frequency-Domain Electromagnetic Signals
by Zhenlin Du, Miaomiao Gao, Zhijie Qu and Xiaojuan Zhang
Appl. Sci. 2026, 16(7), 3249; https://doi.org/10.3390/app16073249 - 27 Mar 2026
Abstract
Frequency-domain electromagnetic (FDEM) measurements for shallow subsurface exploration are frequently corrupted by noise, which masks weak secondary-field responses and degrades interpretation. We propose an electromagnetic diffusion CNN (EMDiC) for 1D multi-frequency FDEM denoising, where denoising is formulated as conditional diffusion-based generation. EMDiC combines an analytic frequency–spatial encoder, a Feature-wise Linear Modulation (FiLM)-conditioned convolutional hourglass backbone, and a physics-informed composite loss built on velocity loss to improve waveform reconstruction under severe noise. A reproducible synthetic dataset is constructed through layered-earth forward modeling with concentric Transmitter–Receiver (TX–RX) geometry, multiple target categories, and mixed noise waveforms. On synthetic benchmarks covering multiple noise levels and material types, EMDiC achieves the best overall performance in Root Mean Square Error (RMSE), Signal-to-Noise Ratio (SNR), and Normalized cross-correlation (NCC) among 1D U-Net, diffusion-based variants, and representative neural baselines, with the clearest gains under medium-to-strong noise and for targets with pronounced induction responses. Ablation experiments verify the complementary contributions of electromagnetic positional encoding (EMPE), FiLM conditioning, and the composite loss. Field data validation with a self-developed GEM-3 system further shows that EMDiC improves cross-frequency coherence and suppresses oscillations while preserving the main response characteristics. Full article
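FiLM conditioning, which EMDiC uses in its backbone, is a per-channel affine transform (Perez et al.); here `gamma` and `beta` are supplied directly, whereas in EMDiC they would be produced by a conditioning network:

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift feature
    channels with condition-derived parameters."""
    return gamma * features + beta
```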
30 pages, 3658 KB  
Article
TB-DLossNet: Fine-Grained Segmentation of Tea Leaf Diseases Based on Semantic-Visual Fusion
by Shuqi Zheng, Hao Zhou, Ziyang Shi, Fulin Su, Wei Shi, Ruifeng Liu, Lin Li and Fangying Wan
Plants 2026, 15(7), 1035; https://doi.org/10.3390/plants15071035 - 27 Mar 2026
Abstract
Camellia oleifera is an economically vital woody oil crop. Its productivity and oil quality are severely compromised by various diseases. Implementing pixel-level lesion segmentation within complex field environments is crucial for advancing precision plant protection. Despite recent progress, existing segmentation methods struggle with three primary challenges: semantic ambiguity arising from evolving pathological stages, blurred boundaries due to overlapping lesions, and the high omission rate of micro-lesions. To address these issues, this paper presents TB-DLossNet (Text-Conditioned Boundary-Aware Network with Dynamic Loss Reweighting), a novel segmentation framework based on semantic-visual multi-modal fusion. Leveraging VMamba as the visual backbone, the proposed model innovatively integrates BERT-encoded structured text as an auxiliary modality to resolve visual ambiguities through cross-modal semantic guidance. Furthermore, a boundary enhancement branch is incorporated alongside a multi-scale deep supervision strategy to mitigate boundary displacement and ensure the topological continuity of lesion structures. To tackle the detection of small-scale targets, we designed a dynamic weight loss function conditioned on lesion area, significantly bolstering the model’s sensitivity to minute pathological features. Additionally, to alleviate the scarcity of high-quality data, we curated a comprehensive multi-modal dataset encompassing seven typical diseases of Camellia oleifera. Experimental results demonstrate that TB-DLossNet achieves a Mean Intersection over Union (mIoU) of 87.02%, outperforming the state-of-the-art unimodal VMamba and multimodal Lvit by 4.9% and 2.59%, respectively. Qualitative evaluations confirm that our model exhibits lower false-negative rates and superior boundary-fitting precision in heterogeneous field scenarios. 
Finally, generalization tests on an apple disease dataset further validate the robustness and transferability of the proposed framework. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research—2nd Edition)
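A loss weight conditioned on lesion area, as described above, can be sketched with an inverse-power form so that micro-lesions receive larger weights; the functional form and exponent are illustrative assumptions, not the paper's definition:

```python
def area_weight(lesion_area, alpha=0.5, eps=1e-6):
    """Sketch of an area-conditioned loss weight: smaller lesions get
    larger weights so they are not drowned out by large ones.
    The inverse-power form and alpha are assumptions."""
    return 1.0 / (lesion_area + eps) ** alpha
```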
15 pages, 2219 KB  
Article
One Patch Is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues
by Sindhuja Penchala, Gavin Money, Gabriel Marques, Samuel Wood, Jessica Kirschman, Travis Atkison, Shahram Rahimi and Noorbakhsh Amiri Golilarz
Sensors 2026, 26(7), 2083; https://doi.org/10.3390/s26072083 - 27 Mar 2026
Abstract
Understanding material surfaces from sparse visual cues is critical for applications in robotics, simulation and material perception. However, most existing methods rely on dense or full scene observations, limiting their effectiveness in constrained or partial view environments. This gap highlights the need for models capable of inferring surfaces’ properties from extremely limited visual information. To address this challenge, we introduce SMARC, a unified model for Surface MAterial Reconstruction and Classification from minimal visual input. By giving only a single 10% contiguous patch of the image, SMARC recognizes and reconstructs the full RGB surface while simultaneously classifying the material category. Our architecture combines a Partial Convolutional U-Net with a classification head, enabling both spatial inpainting and semantic understanding under extreme observation sparsity. We compared SMARC against five models including convolutional autoencoders, Vision Transformer (ViT), Masked Autoencoder (MAE), Swin Transformer and DETR using the Touch and Go dataset of real-world surface textures. SMARC achieves the highest performance among the evaluated methods with a PSNR of 17.55 dB and a surface classification accuracy of 85.10%. These results validate the effectiveness of SMARC in relation to surface material understanding and highlight its potential for deployment in robotic perception tasks where visual access is inherently limited. Full article
(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)
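The Partial Convolutional U-Net in SMARC builds on partial convolutions (Liu et al.), which convolve only observed pixels and renormalise by the fraction of the window that is valid; a single-window sketch:

```python
import numpy as np

def partial_conv_window(x, mask, kernel):
    """One partial-convolution window: x and mask are same-shaped
    patches (mask = 1 where pixels are observed). The response is
    renormalised by window validity; the mask update marks the
    output position valid iff any input pixel was observed."""
    valid = mask.sum()
    if valid == 0:
        return 0.0, 0  # no information in this window
    out = (x * mask * kernel).sum() * (kernel.size / valid)
    return out, 1
```

A useful sanity check: on a constant image, the output equals the constant regardless of the hole pattern, because the renormalisation exactly compensates for the missing pixels.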
28 pages, 8120 KB  
Article
Genetic Programming Algorithm Evolving Robust Unary Costs for Efficient Graph Cut Segmentation
by Reem M. Mostafa, Emad Mabrouk, Ahmed Ayman, Hamdy Z. Zidan and Abdelmonem M. Ibrahim
Algorithms 2026, 19(4), 256; https://doi.org/10.3390/a19040256 - 27 Mar 2026
Abstract
Accurate cell and nuclei segmentation remains challenging due to the sensitivity of classical graph-cut methods to parameter tuning. While deep learning models like U-Net offer strong performance, they require large annotated datasets and substantial GPU resources. This work presents a cost-effective alternative: a genetic programming (GP) framework that jointly optimizes unary cost functions and regularization parameters for graph-cut segmentation, coupled with automatic seed selection. Evaluation is conducted under two distinct protocols: (1) oracle-guided per-image optimization, establishing upper-bound performance (mean Dice 0.822, IoU 0.733), and (2) true generalization via train/test split, where expressions learned on 50 images are applied to 50 unseen images (mean Dice 0.695, IoU 0.588). The fixed-model generalization still significantly outperforms the baseline graph cut (+0.158 Dice, p<0.001). Cross-dataset validation on MoNuSeg (H&E histopathology) achieves a Dice score of 0.823 with the fixed GP model, significantly outperforming the baseline (+0.272). This result uses a single fixed model—the best-performing expression from BBBC038 training—applied in a zero-shot manner to MoNuSeg without any retraining or domain adaptation. All 100 images showed non-negative improvement under oracle optimization in the experiments. The method requires no GPU training, runs in 550 s per image for oracle search, and offers interpretable symbolic cost functions. Code and annotations are provided to ensure reproducibility. This approach offers a practical, interpretable alternative in resource-constrained biomedical imaging settings. Full article
(This article belongs to the Special Issue Bio-Inspired Algorithms: 2nd Edition)
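The graph-cut objective being tuned above is a standard energy: per-pixel unary costs plus a smoothness penalty on neighbouring label disagreements. A minimal sketch for a binary labelling on a 4-connected grid (the evolved GP expressions would supply the unary cost maps):

```python
import numpy as np

def graphcut_energy(labels, unary0, unary1, lam):
    """Energy of a binary labelling: data term (per-pixel cost of the
    chosen label) + lam * Potts term (count of 4-neighbour pairs
    with different labels)."""
    data = np.where(labels == 1, unary1, unary0).sum()
    disagree = ((labels[1:, :] != labels[:-1, :]).sum()
                + (labels[:, 1:] != labels[:, :-1]).sum())
    return data + lam * disagree
```

Graph-cut inference finds the labelling minimising this energy; the GP framework instead searches for unary costs and `lam` that make the minimiser match the ground truth.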
32 pages, 43453 KB  
Article
ABHNet: An Attention-Based Deep Learning Framework for Building Height Estimation Fusing Multimodal Data
by Zhanwu Zhuang, Ning Li, Weiye Xiao, Jiawei Wu and Lei Zhou
ISPRS Int. J. Geo-Inf. 2026, 15(4), 146; https://doi.org/10.3390/ijgi15040146 - 26 Mar 2026
Abstract
Building height is a key indicator of vertical urbanization and urban morphological complexity, yet accurately mapping building height at fine spatial resolution and large spatial scales remains challenging. This study proposes an attention-based deep learning framework (ABHNet) for building height estimation at a 10 m spatial resolution by integrating multi-source remote sensing data and socioeconomic information. The model jointly exploits Sentinel-1 synthetic aperture radar data, Sentinel-2 multispectral imagery, and point of interest (POI) data. The proposed framework is evaluated in Shanghai, a megacity with dense and vertically complex urban structures, using Baidu Maps-derived building height data as reference information. The results demonstrate that the proposed method achieves accurate building height estimation, with a root mean squared error (RMSE) of 3.81 m and a mean absolute error (MAE) of 0.96 m for 2023, and an RMSE of 3.30 m and an MAE of 0.78 m for 2019, indicating robust performance across different time periods. Also, this model is applied in two other cities (Changzhou and Guiyang) and the results indicate good performance. In addition, the expandability of the framework is examined by incorporating higher-resolution ZY-3 imagery, for which the spatial resolution was increased to 2.5 m, highlighting the potential extension of the model to heterogeneous data sources. Overall, this study demonstrates the effectiveness of attention-based deep learning and multimodal data fusion for large-scale and fine-resolution building height estimation using open-source data. Full article
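The two height-estimation metrics reported above can be computed directly:

```python
import numpy as np

def rmse_mae(pred, target):
    """Root-mean-squared error and mean absolute error (here in
    metres, matching the building-height figures above)."""
    err = pred - target
    return np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err))
```

RMSE penalises large errors more heavily than MAE, which is why the reported RMSE (3.81 m) sits well above the MAE (0.96 m): a few tall buildings dominate the squared term.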
24 pages, 1740 KB  
Article
A Skip-Free Collaborative Residual U-Net for Secure Multi-Center Liver and Tumor Segmentation
by Omar Ibrahim Alirr
Eng 2026, 7(4), 151; https://doi.org/10.3390/eng7040151 - 26 Mar 2026
Abstract
Accurate liver and tumor segmentation from abdominal computed tomography (CT) scans is essential for diagnosis and treatment planning; however, centralized deep learning approaches are often constrained by privacy regulations and inter-institution data-sharing limitations. To address these challenges, we propose a skip-free feature-forward collaborative segmentation framework called Feature-Forward Residual U-Net (FF-ResUNet), in which each institution executes the encoder locally and transmits only compact bottleneck representations to a central server. High-resolution encoder features and skip connections remain strictly within institutional boundaries, reducing privacy exposure and communication overhead. The server reconstructs segmentation masks using a multi-scale dilated residual decoder with progressive upsampling and returns lightweight updates for encoder refinement. FF-ResUNet is evaluated on the Liver Tumor Segmentation (LiTS) Challenge dataset, with cross-domain testing on 3D-IRCADb and AMOS-CT to assess robustness under distribution shifts and simulated multi-institution collaboration. On LiTS, the proposed framework achieves a liver Dice score of 0.952 ± 0.015 and a tumor Dice score of 0.737 ± 0.060, with a tumor HD95 of 10.9 ± 4.1 mm. Cross-domain experiments demonstrate stable generalization to unseen datasets, while multi-client simulations show improved performance as the number of participating institutions increases before saturation. Compared with skip-based collaborative U-Net architectures, FF-ResUNet reduces communication payload by 92–98% per training iteration while maintaining competitive segmentation accuracy. 
These results indicate that FF-ResUNet provides an effective balance between segmentation performance, communication efficiency, and privacy preservation evaluated under simulated multi-institution collaborative settings, supporting practical multi-center clinical deployment in bandwidth- and policy-constrained healthcare environments. Full article
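The 92–98% payload reduction follows from the size asymmetry between skip-connection features and the bottleneck. A back-of-the-envelope sketch with assumed U-Net tensor shapes for a 256×256 input (illustrative, not the paper's actual shapes):

```python
def payload_reduction(skip_elems, bottleneck_elems):
    """Fraction of per-iteration communication saved by transmitting
    only the bottleneck instead of bottleneck plus skip features."""
    full = skip_elems + bottleneck_elems
    return 1.0 - bottleneck_elems / full

# Assumed encoder shapes: skips at 256^2*64, 128^2*128, 64^2*256,
# 32^2*512 elements; bottleneck at 16^2*1024 elements.
skips = 256**2 * 64 + 128**2 * 128 + 64**2 * 256 + 32**2 * 512
bottleneck = 16**2 * 1024
```

Under these assumed shapes the reduction is about 97%, within the 92–98% range reported above.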
34 pages, 6554 KB  
Article
Syncretic Grad-CAM Integrated ViT-CNN Hybrids with Inherent Explainability for Early Thyroid Cancer Diagnosis from Ultrasound
by Ahmed Y. Alhafdhi, Gibrael Abosamra and Abdulrhman M. Alshareef
Diagnostics 2026, 16(7), 999; https://doi.org/10.3390/diagnostics16070999 - 26 Mar 2026
Abstract
Background/Objectives: Accurate detection of thyroid cancer using ultrasound remains a challenge, as malignant nodules can be microscopic and heterogeneous, easily confused with point clusters and borderline-featured tissues. Current studies in deep learning demonstrate good performance with convolutional neural networks (CNNs) and clustering; however, many approaches focus on local tissue and provide limited, non-quantitative interpretation, reducing clinical confidence. This study proposes an integrated framework combining enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E) to integrate local feature and global relational context during learning, rather than delayed integration. Methods: The proposed framework integrates enhanced convolutional feature encoders (DenseNet169 and VGG19) with an enhanced vision transformer (ViT-E), enabling simultaneous learning of local feature representations and global relational context. This design allows feature fusion during the learning stage instead of delayed integration, aiming to improve diagnostic performance and interpretability in thyroid ultrasound image analysis. Results: The best-performing model, ViT-E–DenseNet169, achieved 98.5% accuracy, 98.9% sensitivity, 99.15% specificity, and 97.35% AUC, surpassing the robust basic hybrid model (CNN–XGBoost/ANN) and existing systems. A second contribution is improved interpretability, moving from mere illustration to validation. Gradient-weighted class activation mapping (Grad-CAM) maps demonstrated distinct and clinically understandable concentration patterns across various thyroid cancers: precise intralesional concentration for high-confidence malignancies (PTC = 0.968), edge/interface concentration for capsule risk patterns (PTC = 0.957), and broader-field activation consistent with infiltration concerns (PTC = 0.984), while benign scans showed low and diffuse activation (PTC = 0.002). 
Spatial audits reinforced this behavior (IoU/PAP: 0.72/91%, 0.65/78%, 0.58/62%). Conclusions: The integrated ViT-E–DenseNet169 framework provides highly accurate thyroid cancer detection while offering clinically meaningful interpretability through Grad-CAM-based spatial validation, supporting improved confidence in AI-assisted ultrasound diagnosis. Full article
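The spatial audit described above compares thresholded Grad-CAM heatmaps against expert lesion masks. A minimal sketch of two such checks, not taken from the paper (the function names, the 0.5 threshold, and the peak-pointing criterion are illustrative assumptions):

```python
import numpy as np

def heatmap_iou(heatmap, mask, thresh=0.5):
    """IoU between a thresholded Grad-CAM heatmap and a binary lesion mask."""
    hm = heatmap >= thresh          # binarize the activation map
    gt = mask.astype(bool)
    inter = np.logical_and(hm, gt).sum()
    union = np.logical_or(hm, gt).sum()
    return inter / union if union else 0.0

def peak_inside_mask(heatmap, mask):
    """Pointing-game check: does the heatmap's peak fall inside the lesion?"""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return bool(mask[y, x])
```

Averaging `heatmap_iou` over a test set yields an IoU score comparable in spirit to the 0.72/0.65/0.58 figures reported, while the pointing check approximates a peak-activation-precision-style audit.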
(This article belongs to the Special Issue Deep Learning Techniques for Medical Image Analysis)
33 pages, 783 KB  
Systematic Review
A Systematic Review of Deep Learning Approaches for Hepatopancreatic Tumor Segmentation
by Razeen Hussain, Muhammad Mohsin, Dadan Khan and Mohammad Zohaib
J. Imaging 2026, 12(4), 147; https://doi.org/10.3390/jimaging12040147 - 26 Mar 2026
Abstract
Deep learning has advanced rapidly in medical image segmentation, yet hepatopancreatic tumor delineation remains challenging due to low contrast, small lesion size, organ variability, and limited high-quality annotations. Existing reviews are outdated or overly broad, leaving recent architectural developments, training strategies, and dataset limitations insufficiently synthesized. To address this gap, we conducted a PRISMA 2020 systematic literature review of studies published between 2021 and 2026 on deep learning-based liver and pancreatic tumor segmentation. From 2307 records, 84 studies met inclusion criteria. U-Net variants continue to dominate, achieving strong liver segmentation but inconsistent tumor accuracy, while transformer-based and hybrid models improve global context modeling at higher computational cost. Attention mechanisms, boundary-refinement modules, and semi-supervised learning offer incremental gains, yet pancreatic tumor segmentation remains notably difficult. Persistent issues, including domain shift, class imbalance, and limited generalization across datasets, underscore the need for more robust architectures, standardized benchmarks, and clinically oriented evaluation. This review consolidates recent progress and highlights key challenges that must be addressed to advance reliable hepatopancreatic tumor segmentation. Full article
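Segmentation accuracy in the studies surveyed above is most commonly reported as the Dice similarity coefficient between predicted and reference tumor masks. A minimal sketch of that metric (the function name and the epsilon smoothing term are illustrative assumptions, not from any reviewed paper):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), smoothed to avoid division by zero."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Because Dice weights the intersection against the sizes of both masks, it penalizes missed small lesions far more than voxel-wise accuracy does, which is one reason pancreatic tumor scores lag behind whole-liver scores.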
(This article belongs to the Section Medical Imaging)
36 pages, 1944 KB  
Article
EMAF-Net: A Lightweight Single-Stage Detector for 13-Class Object Detection in Agricultural Rural Road Scenes
by Zhixin Yao, Chunjiang Zhao, Yunjie Zhao, Xiaoyi Liu, Tuo Sun and Taihong Zhang
Sensors 2026, 26(7), 2055; https://doi.org/10.3390/s26072055 - 25 Mar 2026
Abstract
Rural road perception for agricultural machinery automation faces challenges including complex backgrounds, drastic lighting and weather variations, frequent occlusions, and high densities of small objects with significant scale variations. These factors make conventional detectors prone to missed detections and misclassifications. To address these issues, a 4K rural road dataset with 4771 images is constructed. The dataset covers 13 object categories and includes diverse day/night conditions and multiple weather scenarios on both structured and unstructured roads. EMAF-Net, a lightweight single-stage detector based on YOLOv4-P6, is proposed. The backbone integrates an EMHA module combining EfficientNet-B1 with multi-head self-attention (MHSA) for enhanced global context modeling while preserving efficient local feature extraction. The neck adopts an Improved ASPP and a bidirectional FPN to achieve robust multi-scale feature fusion and expanded receptive fields. Meanwhile, CIoU loss is used to optimize bounding box regression accuracy. The experimental results demonstrate that EMAF-Net achieves an mAP@0.5 of 64.05% and an mAP@0.5:0.95 of 48.95% on a rural road dataset. At the same time, it maintains a lightweight design with 18.3 M parameters and a computational complexity of 38.5 GFLOPs. Ablation studies confirm the EMHA module contributes a 6.22% mAP@0.5 improvement, validating EMAF-Net’s effectiveness for real-time rural road perception in autonomous agricultural systems. Full article
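The CIoU loss mentioned above augments plain IoU with two penalties: normalized center-point distance and aspect-ratio inconsistency. A minimal sketch of the standard CIoU formulation (the function name and corner-format box convention are assumptions; the paper's implementation details are not given):

```python
import math

def ciou(box1, box2, eps=1e-7):
    """Complete-IoU between two boxes given as (x1, y1, x2, y2)."""
    # intersection and plain IoU
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (a1 + a2 - inter + eps)
    # squared distance between box centers
    cx1, cy1 = (box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2
    cx2, cy2 = (box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    # squared diagonal of the smallest enclosing box
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    # aspect-ratio consistency term v and its trade-off weight alpha
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v
```

Training minimizes `1 - ciou(pred, target)`, so non-overlapping boxes still receive a gradient from the center-distance term, which helps regression converge on the small, densely packed objects the dataset contains.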
(This article belongs to the Section Smart Agriculture)