Search Results (3,645)

Search Parameters:
Keywords = UNet

15 pages, 2097 KB  
Article
A Comparative Study on Ocean Front Detection in the Northwestern Pacific Using U-Net and Mask R-CNN
by Caixia Shao, Dianjun Zhang and Xuefeng Zhang
Oceans 2026, 7(2), 29; https://doi.org/10.3390/oceans7020029 (registering DOI) - 31 Mar 2026
Abstract
Ocean fronts play a vital role in modulating climate variability, driving material transport, and maintaining the stability of marine ecosystems. Therefore, accurate identification of ocean fronts is of great significance for marine environmental monitoring and resource management. This study focuses on the Northwestern Pacific region and conducts a systematic comparison between two representative deep learning models—U-Net and Mask R-CNN—for automated ocean front detection. The objective is to evaluate the adaptability and strengths of different network architectures in handling multi-scale features, complex background conditions, and boundary delineation, thereby providing a theoretical basis for model selection and application-specific deployment. Experimental results show that U-Net achieves superior spatial consistency in large-scale frontal segmentation, with an IoU of 0.81 and a Dice coefficient of 0.76, while maintaining relatively high computational efficiency. In contrast, Mask R-CNN demonstrates stronger boundary modeling capabilities in detecting small-scale fronts and handling heterogeneous backgrounds, achieving an IoU of 0.78 and a Dice score of 0.73, though at the cost of increased computational demand. Overall, U-Net is more suitable for broad-scale automatic detection of ocean fronts, whereas Mask R-CNN exhibits greater potential in complex scene recognition. Integrating the structural advantages of both models holds promise for further enhancing the stability and accuracy of frontal detection, thereby offering robust technical support for ocean remote sensing analysis and environmental forecasting. Full article
(This article belongs to the Special Issue Recent Progress in Ocean Fronts)
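The IoU and Dice scores quoted in this abstract can be illustrated with a minimal sketch (not the authors' code; the 4x4 masks below are hypothetical examples):

```python
import numpy as np

def iou_and_dice(pred, target):
    """IoU and Dice coefficient for binary segmentation masks (illustrative sketch)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)

# Hypothetical masks: prediction covers columns 0-2, ground truth columns 1-3
pred = np.zeros((4, 4), dtype=int); pred[:, :3] = 1
target = np.zeros((4, 4), dtype=int); target[:, 1:] = 1
iou, dice = iou_and_dice(pred, target)  # overlap 8 px, union 16 px
```

Both metrics reward overlap between predicted and true frontal pixels; Dice weights the intersection more heavily, so it is never smaller than IoU for the same pair of masks.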

32 pages, 2453 KB  
Article
An Improved MSEM-Deeplabv3+ Method for Intelligent Detection of Rock Mass Fractures
by Chi Zhang, Shu Gan, Xiping Yuan, Weidong Luo, Chong Ma and Yi Li
Remote Sens. 2026, 18(7), 1041; https://doi.org/10.3390/rs18071041 - 30 Mar 2026
Abstract
Fractures, as critical discontinuous structural planes in rock masses, directly govern their stability and serve as the core controlling factor in rock mechanics engineering. Existing deep learning models for fracture extraction face persistent challenges, including imbalanced integration of deep and shallow features, limited suppression of background noise, inadequate multi-scale feature representation, and large parameter sizes—making it difficult to strike a balance between detection accuracy and deployment efficiency. Focusing on the Wanshanshan quarry in Yunnan, this study first constructs a high-precision digital model using close-range photogrammetry and 3D real-scene reconstruction. A lightweight yet high-accuracy intelligent detection method, termed MSEM-Deeplabv3+, is then proposed for rock mass fracture extraction. The model adopts lightweight MobileNetV2 as the backbone network, incorporating inverted residual modules and depthwise separable convolutions, resulting in a parameter size of only 6.02 MB and FLOPs of 30.170 G—substantially reducing computational overhead. Furthermore, the proposed MAGF (Multi-Scale Attention Gated Fusion) and SCSA (Spatial-Channel Synergistic Attention) modules are integrated to enhance the representation of fracture details and semantic consistency while effectively suppressing multi-source and multi-scale background interference. Experimental results demonstrate that the proposed model achieves an mPA of 89.69%, mIoU of 83.71%, F1-Score of 90.41%, and Kappa coefficient of 80.81%, outperforming the classic Deeplabv3+ model by 5.81%, 6.18%, 4.53%, and 9.2%, respectively. It also significantly surpasses benchmark models such as U-Net and HRNet. The method accurately captures fine and continuous fracture details, preserves the spatial distribution of long-range continuous fractures, and maintains robust performance on the CFD cross-scene dataset, showcasing strong adaptability and generalization capability.
This approach effectively mitigates the risks associated with manual high-altitude inspections and provides a lightweight, high-precision, non-contact intelligent solution for fracture detection in high-steep rock slopes. Full article
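The parameter savings attributed to depthwise separable convolutions above can be sketched with a quick count (bias terms omitted; the 128-channel, 3x3 configuration is a hypothetical example, not a layer from MSEM-Deeplabv3+):

```python
def standard_conv_params(c_in, c_out, k):
    # a standard k x k convolution mixes all input channels for every output channel
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one depthwise k x k filter per input channel, then a 1 x 1 pointwise projection
    return c_in * k * k + c_in * c_out

std = standard_conv_params(128, 128, 3)        # 147,456 weights
sep = depthwise_separable_params(128, 128, 3)  # 17,536 weights
ratio = std / sep                              # roughly 8x fewer parameters
```

This roughly k*k-fold reduction is the mechanism behind MobileNetV2-style backbones keeping the parameter size small.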
17 pages, 3863 KB  
Article
SemiWaferNet: Efficient Semi-Supervised Hybrid CNN–Transformer Models for Wafer Defect Classification and Segmentation
by Ruiwen Shi, Ruihan Liu, Zhiguo Zhou and Xuehua Zhou
Electronics 2026, 15(7), 1437; https://doi.org/10.3390/electronics15071437 (registering DOI) - 30 Mar 2026
Abstract
Wafer defect analysis is important for semiconductor manufacturing, but labeled data are limited, and class distributions are highly imbalanced. We present a semi-supervised framework with two lightweight hybrid CNN–Transformer models for wafer defect classification and segmentation. For classification, HybridCNN-ViT combines CNN-based local feature extraction with Transformer-based global context modeling, and adopts a three-stage progressive pseudo-labeling strategy to leverage unlabeled samples. The pseudo-label selection mechanism is systematically calibrated to improve pseudo-label reliability under limited labeled data. For segmentation, ConvoFormer-UNet integrates convolution-enhanced embeddings with Transformer blocks to balance boundary detail and global context. On the public WM-811K dataset, HybridCNN-ViT achieves 98.72% accuracy and 0.9985 macro-AUC under the semi-supervised setting for classification, while ConvoFormer-UNet reaches 99.19% IoU for segmentation with fewer parameters than several baselines. We also report efficiency on a single GPU to illustrate practical inference speed. Full article
(This article belongs to the Section Artificial Intelligence)
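Confidence-thresholded pseudo-label selection of the kind this abstract describes can be sketched as follows (the 0.95 threshold and the toy probabilities are assumptions for illustration; the paper calibrates its own selection mechanism):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Keep unlabeled samples whose top class probability clears the threshold."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = confidence >= threshold
    return labels[keep], np.flatnonzero(keep)

probs = np.array([[0.97, 0.03],   # confident -> pseudo-labeled as class 0
                  [0.60, 0.40],   # uncertain -> discarded
                  [0.02, 0.98]])  # confident -> pseudo-labeled as class 1
labels, kept = select_pseudo_labels(probs)
```

A progressive strategy then repeats this selection over several stages, typically relaxing the threshold as the model improves.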

22 pages, 7692 KB  
Article
SSF-TransUnet: Fine-Grained Crop Classification via Cross-Source Spatial Spectral Fusion
by Jian Yan, Xueke Chen, Rongrong Ren, Xiaofei Mi, Zhanliang Yuan, Jian Yang, Xianhong Meng, Zhenzhao Jiang, Hongbo Zhu and Yong Liu
Remote Sens. 2026, 18(7), 1034; https://doi.org/10.3390/rs18071034 - 30 Mar 2026
Abstract
Accurate exploitation of spatial structures and spectral characteristics is essential for fine-grained crop classification using remote sensing imagery. Although multi-source remote sensing data provide complementary information, most existing methods implicitly assume homogeneous data sources with consistent spatial resolution. In practice, high spatial resolution and rich spectral information are usually provided by different sensors, making cross-source spatial–spectral fusion a non-trivial challenge. To address this issue, we propose SSF-TransUnet, a dual-branch spatial–spectral joint modeling framework for fine crop classification. The proposed network explicitly decouples spatial structure extraction and spectral discriminability learning by jointly utilizing high spatial resolution imagery and multi-spectral observations acquired from different satellite sensors within a unified architecture. To support model training and evaluation, we construct SSCR-Agri, a spatial–spectral complementary resolution agricultural dataset integrating meter-level GF-2 imagery and multi-spectral Sentinel-2 data from five representative agricultural regions in northern China, covering five crop categories including corn, rice, wheat, potato, and others. Extensive experiments demonstrate that SSF-TransUnet consistently outperforms representative CNN-based and hybrid CNN–Transformer models. The proposed method achieves an overall accuracy (OA) of 81.84% and a mean Intersection over Union (mIoU) of 0.6954 in fine-grained crop classification, effectively distinguishing crops. These results highlight the effectiveness of spatial–spectral joint modeling for high-resolution crop mapping, demonstrate its potential for precision agriculture and large-scale agricultural monitoring applications, and show promise when combined with multi-temporal observations. Full article

26 pages, 4917 KB  
Article
A Comprehensive Clinical Decision Support System for the Early Diagnosis of Axial Spondyloarthritis: Multi-Sequence MRI, Clinical Risk Integration, and Explainable Segmentation
by Fatih Tarakci, Ilker Ali Ozkan, Musa Dogan, Halil Ozer, Dilek Tezcan and Sema Yilmaz
Diagnostics 2026, 16(7), 1037; https://doi.org/10.3390/diagnostics16071037 - 30 Mar 2026
Abstract
Background/Objectives: This study aims to develop a comprehensive Clinical Decision Support System (CDSS) that integrates multi-sequence sacroiliac joint (SIJ) MRIs with rheumatological, clinical, and laboratory findings into the decision-making process for the early diagnosis of axial spondyloarthritis (axSpA), incorporating segmentation-supported explainability. Methods: Multi-sequence SIJ MRI data (T1-WI, T2-WI, STIR, and PD-WI) were analysed from 367 participants (n = 193 axSpA; n = 174 non-axSpA controls). Sequence-based classification was performed using VGG16, ResNet50, DenseNet121, and InceptionV3 models; additionally, a lightweight and parameter-efficient SacroNet architecture was developed. Slice-level probability scores were converted to patient-level scores using the Dynamic Top-K Averaging method. Image-based scores were combined with a logistic regression-based clinical risk score using weighted linear integration (0.60 image/0.40 clinical) and a conservative threshold (τ = 0.70). Grad-CAM was applied for visual interpretability. Furthermore, to support the diagnostic outcomes with precise spatial data, active inflammation in STIR and T2-WI sequences was segmented. For this purpose, the MDC-UNet model was employed and compared with baseline U-Net derivatives. Results: Sequence-specific analysis showed VGG16 performing best on T1-WI (AUC = 0.920; Accuracy = 0.878) and DenseNet121 on STIR (AUC = 0.793; Accuracy = 0.771). The SacroNet architecture provided competitive classification performance at the patient level despite its low number of parameters (~110 K). Furthermore, MDC-UNet successfully segmented active inflammation, yielding Dice scores of 0.752 (HD95: 19.25) for STIR and 0.682 (HD95: 26.21) for T2-WI. 
Conclusions: The findings demonstrate that patient-level decision integration based on multi-sequence MRI, when used in conjunction with clinical risk scoring and segmentation-assisted interpretability, can provide a feasible and interpretable DSS framework for the early diagnosis of axSpA. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
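The slice-to-patient aggregation and the weighted image/clinical integration described above can be sketched as follows. This is a simplification: the paper's Dynamic Top-K Averaging chooses K dynamically, whereas this toy version fixes K, and the slice scores are hypothetical; the 0.60/0.40 weights and τ = 0.70 threshold are taken from the abstract.

```python
import numpy as np

def topk_average(slice_scores, k):
    """Average the k highest slice-level probabilities into a patient-level score."""
    return float(np.sort(np.asarray(slice_scores))[-k:].mean())

def fuse(image_score, clinical_score, w_img=0.60, w_clin=0.40, tau=0.70):
    """Weighted linear integration of image and clinical scores with a conservative threshold."""
    combined = w_img * image_score + w_clin * clinical_score
    return combined, combined >= tau

patient = topk_average([0.2, 0.9, 0.8, 0.95, 0.1], k=3)  # mean of 0.95, 0.9, 0.8
score, positive = fuse(patient, clinical_score=0.5)
```

Averaging only the top slices lets a few strongly positive MRI slices dominate the patient-level decision while ignoring uninformative ones.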

29 pages, 6898 KB  
Article
MDE-UNet: A Physically Guided Asymmetric Fusion Network for Multi-Source Meteorological Data Lightning Identification
by Yihua Chen, Yuanpeng Han, Yujian Zhang, Yi Liu, Lin Song, Jialei Wang, Xinjue Wang and Qilin Zhang
Remote Sens. 2026, 18(7), 1027; https://doi.org/10.3390/rs18071027 - 29 Mar 2026
Abstract
Utilizing multi-source meteorological data for lightning identification is crucial for monitoring severe convective weather. However, several key challenges persist in this field: dimensional imbalance and modal competition among multi-source heterogeneous data, model training bias caused by the extreme sparsity of lightning samples, and an imbalance between false alarms and missed detections resulting from complex background noise. To address these challenges, this paper proposes a lightning identification network guided by physical priors and constrained by supervision. First, to tackle the issue of modal competition in fusing satellite (high-dimensional) and radar (low-dimensional) data, a physical prior-guided asymmetric radar information enhancement mechanism is introduced. This mechanism uses radar physical features as contextual guidance to selectively enhance the latent weak radar signatures. Second, at the architectural level, a multi-source multi-scale feature fusion module and a weighted sliding window–multilayer perceptron (MLP) enhanced decoding unit are constructed. The former achieves the coupling of multi-scale physical features at a 2 km grid scale through cross-level semantic alignment, building a highly consistent feature field that effectively improves the model’s ability to detect lightning signals. The latter leverages adaptive receptive fields and the nonlinear modeling capability of MLPs to effectively smooth spatially discrete noise, ensuring spatial continuity in the reconstructed results. Finally, to address the model bias caused by severe class imbalance between positive and negative samples—resulting from the extreme sparsity of lightning events—an asymmetrically weighted BCE-DICE loss function is designed. Its “asymmetric” characteristic is implemented by assigning different penalty weights to false-positive and false-negative predictions. 
This loss function balances pixel-level accuracy and inter-class equilibrium while imposing high-weight penalties on false-positive predictions, achieving synergistic optimization of feature enhancement and directional suppression. Experimental results show that the proposed method effectively increases the hit rate while substantially reducing the false alarm rate, enabling efficient utilization of multi-source data and high-precision identification of lightning strike areas. Full article
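An asymmetrically weighted BCE-Dice loss of the kind described can be sketched as below (the weights `w_fp`, `w_fn` and the mixing factor `alpha` are illustrative assumptions, not the paper's values):

```python
import numpy as np

def asymmetric_bce_dice(pred, target, w_fp=2.0, w_fn=1.0, alpha=0.5, eps=1e-7):
    """BCE with separate false-positive / false-negative weights plus a soft Dice term."""
    pred = np.clip(pred, eps, 1 - eps)
    # w_fn scales the miss term (target = 1), w_fp scales the false-alarm term (target = 0)
    bce = -(w_fn * target * np.log(pred) + w_fp * (1 - target) * np.log(1 - pred))
    intersection = (pred * target).sum()
    dice = 1 - (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return alpha * bce.mean() + (1 - alpha) * dice
```

With `w_fp > w_fn`, a confident false alarm costs more than an equally confident miss, which is how the asymmetry steers the detector toward a lower false-alarm rate on sparse lightning pixels.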
15 pages, 1771 KB  
Article
Deep Learning-Based Generation of Retinal Nerve Fibre Layer Thickness Maps from Fundus Photographs: A Comparative Analysis of U-Net Architectures for Accessible Glaucoma Assessment
by Kyoung Ohn, Harin Jun, Yong-Sik Kim and Woong-Joo Whang
Life 2026, 16(4), 559; https://doi.org/10.3390/life16040559 (registering DOI) - 29 Mar 2026
Abstract
Introduction: Optical coherence tomography (OCT) is the gold standard for retinal nerve fibre layer (RNFL) assessment, but its high cost and limited accessibility hinder widespread use. This study aims to develop deep learning models that generate RNFL thickness maps from fundus images, providing a cost-effective alternative to OCT. Methods: A dataset of 5000 fundus-OCT image pairs from 5000 unique glaucoma patients was used to train and compare the following four U-Net-based deep learning models: ResU-Net, R2U-Net, Nested U-Net, and Dense U-Net. All models were trained for up to 1000 epochs with early stopping (patience = 50 epochs). Performance was evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Fréchet Inception Distance (FID). Results: ResU-Net demonstrated the best performance, achieving MSE = 0.00061, MAE = 0.01877, SSIM = 0.9163, PSNR = 32.19 dB, and FID = 30.08. These results represent a 108% improvement in SSIM and a 67% improvement in PSNR compared to the previously published benchmark for this task. Conclusions: This study demonstrates that deep learning models, particularly ResU-Net, can generate high-fidelity RNFL thickness maps from fundus photographs, substantially outperforming prior published benchmarks. This approach represents a potential contribution toward accessible glaucoma assessment, contingent upon prospective clinical validation and regulatory evaluation. Full article
(This article belongs to the Special Issue Vision Science and Optometry: 2nd Edition)
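As a consistency check, the reported PSNR can be recovered from the reported MSE via the standard relation for images scaled to [0, 1] (the formula is textbook, not specific to this paper; small differences arise from per-image averaging):

```python
import math

def psnr_db(mse, max_val=1.0):
    """Peak signal-to-noise ratio in decibels for a given mean squared error."""
    return 10 * math.log10(max_val ** 2 / mse)

db = psnr_db(0.00061)  # close to the reported 32.19 dB
```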

16 pages, 13705 KB  
Article
PRefiner: Enhancing Overlapped Cervical Cell Segmentation Through Progressive Refinement
by Linlin Zhu, Jiaxun Li and Jiaxi Liu
Electronics 2026, 15(7), 1418; https://doi.org/10.3390/electronics15071418 - 28 Mar 2026
Abstract
Cervical cancer is one of the most prevalent cancers among women, significantly impacting their daily lives. Computer vision-based cervical cell morphology diagnosis technology can offer robust support for cervical cell analysis at a lower cost. However, the presence of a substantial number of overlapping cells in cervical images renders existing cell segmentation methods less accurate, thereby complicating the guidance of medical diagnosis. In this paper, we introduce a three-stage Progressive Refinement method (PRefiner) for overlapping cell segmentation that decouples the traditional end-to-end pipeline, with the final stage specifically correcting anomalous results to enhance precision. We achieve separable overlapping cervical cell segmentation results through a cell nucleus locator, a single-cell segmenter, and a Segmentation Result Mask Refiner. Specifically, we employ a hybrid U-Net as the primary network for the cell nucleus locator and single-cell segmenter, which determines the position of the cell nucleus and produces the initial coarse segmentation result. In the mask refiner, we incorporate a conditional generation framework to address the perception decision problem and design a local–global dual-scale discriminator to ensure that the segmentation result aligns with the prior of a single-cell mask. Experimental results on CCEDD and ISBI2015 demonstrate that PRefiner achieves optimal performance by effectively resolving abnormal segmentations. Notably, our method improves the Dice coefficient of abnormal results from five different models by an average of 2.62% (ranging from 1.0% to 5.1%). Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)

50 pages, 10525 KB  
Article
Passable Area Evaluation of Tractor Road Based on Improved YOLOv5s and Multi-Factor Fusion
by Qian Zhang, Wenjie Xu, Wenfei Wu, Lizhang Xu, Zhenghui Zhao and Shaowei Liang
Agriculture 2026, 16(7), 752; https://doi.org/10.3390/agriculture16070752 - 28 Mar 2026
Abstract
The tractor road, as the core scene for autonomous driving of grain transport vehicles, is unstructured, complex, and obstacle-rich, leading to poor real-time performance and accuracy of joint road and obstacle detection with existing YOLOv5s. Furthermore, the reliability of passable area evaluation is low solely based on environmental factors. Therefore, YOLOv5s-C2S is proposed, fusing multi-scale features, attention mechanism, and dynamic features for joint detection. Firstly, YOLOv5s-CC is proposed for road detection by fusing context and spatial details and introducing Criss-Cross attention. Secondly, YOLOv5s-SGA is proposed for obstacle detection by grouped and spatial convolution, parameter-free attention, and adaptive feature fusion. By reusing YOLOv5s-CC weights, YOLOv5s-C2S shares low-level features and decouples high-level specificity. Based on the tractor road and obstacle information, combined with vehicle factors, a weighted scoring–based comprehensive method for passable area evaluation is proposed. Finally, the method was verified through experiments with an intelligent tracked grain transport vehicle using self-constructed datasets, including VOC_Road (11,927 images) and VOC_Obstacle (21,779 images). Compared with existing YOLOv5s, Deeplabv3+, FCN, Unet and SegNet, the mAP50 of road detection by YOLOv5s-CC increased by over 1.2%. Compared with existing YOLOv5s, R-CNN, YOLOv7, SSD and YOLOv8n, the mAP50 of obstacle detection by YOLOv5s-SGA increased by over 2%. Compared with YOLOv5s-SD, the mAP50 of joint detection by YOLOv5s-C2S increased by 9.3%, and the frame rate increased by 7.0 FPS. The proposed passable area evaluation method exhibits strong robustness and reliability in complex environments, meeting the accuracy and real-time requirements in autonomous driving of grain transport vehicles. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
24 pages, 3376 KB  
Article
EMDiC: Physics-Informed Conditional Diffusion Denoising for Frequency-Domain Electromagnetic Signals
by Zhenlin Du, Miaomiao Gao, Zhijie Qu and Xiaojuan Zhang
Appl. Sci. 2026, 16(7), 3249; https://doi.org/10.3390/app16073249 - 27 Mar 2026
Abstract
Frequency-domain electromagnetic (FDEM) measurements for shallow subsurface exploration are frequently corrupted by noise, which masks weak secondary-field responses and degrades interpretation. We propose an electromagnetic diffusion CNN (EMDiC) for 1D multi-frequency FDEM denoising, where denoising is formulated as conditional diffusion-based generation. EMDiC combines an analytic frequency–spatial encoder, a Feature-wise Linear Modulation (FiLM)-conditioned convolutional hourglass backbone, and a physics-informed composite loss built on velocity loss to improve waveform reconstruction under severe noise. A reproducible synthetic dataset is constructed through layered-earth forward modeling with concentric Transmitter–Receiver (TX–RX) geometry, multiple target categories, and mixed noise waveforms. On synthetic benchmarks covering multiple noise levels and material types, EMDiC achieves the best overall performance in Root Mean Square Error (RMSE), Signal-to-Noise Ratio (SNR), and Normalized cross-correlation (NCC) among 1D U-Net, diffusion-based variants, and representative neural baselines, with the clearest gains under medium-to-strong noise and for targets with pronounced induction responses. Ablation experiments verify the complementary contributions of electromagnetic positional encoding (EMPE), FiLM conditioning, and the composite loss. Field data validation with a self-developed GEM-3 system further shows that EMDiC improves cross-frequency coherence and suppresses oscillations while preserving the main response characteristics. Full article
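FiLM conditioning, mentioned in the EMDiC backbone above, reduces to a per-channel scale and shift predicted from the conditioning input. A minimal sketch (shapes and values are hypothetical; in the paper the gamma/beta parameters come from a learned network):

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: per-channel affine transform of a feature map."""
    # features: (channels, length); gamma, beta: (channels,), derived from the condition
    return gamma[:, None] * features + beta[:, None]

x = np.ones((2, 4))  # two channels of a 1D feature map
out = film(x, gamma=np.array([2.0, 0.5]), beta=np.array([1.0, 0.0]))
```

Because the modulation is per-channel, the conditioning signal can amplify or suppress entire feature channels without disturbing spatial structure.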

15 pages, 2219 KB  
Article
One Patch Is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues
by Sindhuja Penchala, Gavin Money, Gabriel Marques, Samuel Wood, Jessica Kirschman, Travis Atkison, Shahram Rahimi and Noorbakhsh Amiri Golilarz
Sensors 2026, 26(7), 2083; https://doi.org/10.3390/s26072083 - 27 Mar 2026
Abstract
Understanding material surfaces from sparse visual cues is critical for applications in robotics, simulation and material perception. However, most existing methods rely on dense or full scene observations, limiting their effectiveness in constrained or partial view environments. This gap highlights the need for models capable of inferring surface properties from extremely limited visual information. To address this challenge, we introduce SMARC, a unified model for Surface MAterial Reconstruction and Classification from minimal visual input. Given only a single contiguous patch covering 10% of the image, SMARC reconstructs the full RGB surface while simultaneously classifying the material category. Our architecture combines a Partial Convolutional U-Net with a classification head, enabling both spatial inpainting and semantic understanding under extreme observation sparsity. We compared SMARC against five models including convolutional autoencoders, Vision Transformer (ViT), Masked Autoencoder (MAE), Swin Transformer and DETR using the Touch and Go dataset of real-world surface textures. SMARC achieves the highest performance among the evaluated methods with a PSNR of 17.55 dB and a surface classification accuracy of 85.10%. These results validate the effectiveness of SMARC in relation to surface material understanding and highlight its potential for deployment in robotic perception tasks where visual access is inherently limited. Full article
(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)

28 pages, 8120 KB  
Article
Genetic Programming Algorithm Evolving Robust Unary Costs for Efficient Graph Cut Segmentation
by Reem M. Mostafa, Emad Mabrouk, Ahmed Ayman, Hamdy Z. Zidan and Abdelmonem M. Ibrahim
Algorithms 2026, 19(4), 256; https://doi.org/10.3390/a19040256 - 27 Mar 2026
Abstract
Accurate cell and nuclei segmentation remains challenging due to the sensitivity of classical graph-cut methods to parameter tuning. While deep learning models like U-Net offer strong performance, they require large annotated datasets and substantial GPU resources. This work presents a cost-effective alternative: a genetic programming (GP) framework that jointly optimizes unary cost functions and regularization parameters for graph-cut segmentation, coupled with automatic seed selection. Evaluation is conducted under two distinct protocols: (1) oracle-guided per-image optimization, establishing upper-bound performance (mean Dice 0.822, IoU 0.733), and (2) true generalization via train/test split, where expressions learned on 50 images are applied to 50 unseen images (mean Dice 0.695, IoU 0.588). The fixed-model generalization still significantly outperforms the baseline graph cut (+0.158 Dice, p<0.001). Cross-dataset validation on MoNuSeg (H&E histopathology) achieves a Dice score of 0.823 with the fixed GP model, significantly outperforming the baseline (+0.272). This result uses a single fixed model—the best-performing expression from BBBC038 training—applied in a zero-shot manner to MoNuSeg without any retraining or domain adaptation. All 100 images showed non-negative improvement under oracle optimization in the experiments. The method requires no GPU training, runs in 550 s per image for oracle search, and offers interpretable symbolic cost functions. Code and annotations are provided to ensure reproducibility. This approach offers a practical, interpretable alternative in resource-constrained biomedical imaging settings. Full article
(This article belongs to the Special Issue Bio-Inspired Algorithms: 2nd Edition)

32 pages, 43453 KB  
Article
ABHNet: An Attention-Based Deep Learning Framework for Building Height Estimation Fusing Multimodal Data
by Zhanwu Zhuang, Ning Li, Weiye Xiao, Jiawei Wu and Lei Zhou
ISPRS Int. J. Geo-Inf. 2026, 15(4), 146; https://doi.org/10.3390/ijgi15040146 - 26 Mar 2026
Abstract
Building height is a key indicator of vertical urbanization and urban morphological complexity, yet accurately mapping building height at fine spatial resolution and large spatial scales remains challenging. This study proposes an attention-based deep learning framework (ABHNet) for building height estimation at a 10 m spatial resolution by integrating multi-source remote sensing data and socioeconomic information. The model jointly exploits Sentinel-1 synthetic aperture radar data, Sentinel-2 multispectral imagery, and point of interest (POI) data. The proposed framework is evaluated in Shanghai, a megacity with dense and vertically complex urban structures, using Baidu Maps-derived building height data as reference information. The results demonstrate that the proposed method achieves accurate building height estimation, with a root mean squared error (RMSE) of 3.81 m and a mean absolute error (MAE) of 0.96 m for 2023, and an RMSE of 3.30 m and an MAE of 0.78 m for 2019, indicating robust performance across different time periods. Also, this model is applied in two other cities (Changzhou and Guiyang) and the results indicate good performance. In addition, the expandability of the framework is examined by incorporating higher-resolution ZY-3 imagery, for which the spatial resolution was increased to 2.5 m, highlighting the potential extension of the model to heterogeneous data sources. Overall, this study demonstrates the effectiveness of attention-based deep learning and multimodal data fusion for large-scale and fine-resolution building height estimation using open-source data. Full article
24 pages, 1740 KB  
Article
A Skip-Free Collaborative Residual U-Net for Secure Multi-Center Liver and Tumor Segmentation
by Omar Ibrahim Alirr
Eng 2026, 7(4), 151; https://doi.org/10.3390/eng7040151 - 26 Mar 2026
Abstract
Accurate liver and tumor segmentation from abdominal computed tomography (CT) scans is essential for diagnosis and treatment planning; however, centralized deep learning approaches are often constrained by privacy regulations and inter-institution data-sharing limitations. To address these challenges, we propose a skip-free feature-forward collaborative segmentation framework called Feature-Forward Residual U-Net (FF-ResUNet), in which each institution executes the encoder locally and transmits only compact bottleneck representations to a central server. High-resolution encoder features and skip connections remain strictly within institutional boundaries, reducing privacy exposure and communication overhead. The server reconstructs segmentation masks using a multi-scale dilated residual decoder with progressive upsampling and returns lightweight updates for encoder refinement. FF-ResUNet is evaluated on the Liver Tumor Segmentation (LiTS) Challenge dataset, with cross-domain testing on 3D-IRCADb and AMOS-CT to assess robustness under distribution shifts and simulated multi-institution collaboration. On LiTS, the proposed framework achieves a liver Dice score of 0.952 ± 0.015 and a tumor Dice score of 0.737 ± 0.060, with a tumor HD95 of 10.9 ± 4.1 mm. Cross-domain experiments demonstrate stable generalization to unseen datasets, while multi-client simulations show improved performance as the number of participating institutions increases before saturation. Compared with skip-based collaborative U-Net architectures, FF-ResUNet reduces communication payload by 92–98% per training iteration while maintaining competitive segmentation accuracy. These results indicate that FF-ResUNet provides an effective balance between segmentation performance, communication efficiency, and privacy preservation, as evaluated under simulated multi-institution collaborative settings, supporting practical multi-center clinical deployment in bandwidth- and policy-constrained healthcare environments. Full article
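The split-computation pattern the abstract describes (encoder runs on the client, only a compact bottleneck crosses the network, the decoder runs on the server) can be illustrated with a deliberately tiny sketch. The pooling/upsampling "layers" and all names below are hypothetical stand-ins for the actual residual U-Net blocks, chosen only to show why the transmitted payload shrinks:

```python
# Toy sketch of a skip-free "feature-forward" split: the client ships only a
# compact bottleneck, never its high-resolution features or skip connections.

def client_encode(signal, factor=4):
    """Client-side encoder: average-pool to a bottleneck 'factor' x smaller."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]

def server_decode(bottleneck, factor=4):
    """Server-side decoder: nearest-neighbour upsampling back to full size."""
    return [v for v in bottleneck for _ in range(factor)]

scan = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]  # toy 1-D "CT scanline"
bottleneck = client_encode(scan)                  # only this leaves the client
mask = server_decode(bottleneck)                  # reconstructed at the server

# The payload is 'factor' x smaller than the raw input, the same mechanism
# behind the 92-98% communication reduction reported for FF-ResUNet.
print(len(bottleneck), len(mask))  # 2 8
```

In the real framework the compression comes from the encoder's strided convolutions rather than plain pooling, but the privacy and bandwidth argument is the same: the server only ever sees the low-dimensional representation.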
33 pages, 783 KB  
Systematic Review
A Systematic Review of Deep Learning Approaches for Hepatopancreatic Tumor Segmentation
by Razeen Hussain, Muhammad Mohsin, Dadan Khan and Mohammad Zohaib
J. Imaging 2026, 12(4), 147; https://doi.org/10.3390/jimaging12040147 - 26 Mar 2026
Abstract
Deep learning has advanced rapidly in medical image segmentation, yet hepatopancreatic tumor delineation remains challenging due to low contrast, small lesion size, organ variability, and limited high-quality annotations. Existing reviews are outdated or overly broad, leaving recent architectural developments, training strategies, and dataset limitations insufficiently synthesized. To address this gap, we conducted a PRISMA 2020 systematic literature review of studies published between 2021 and 2026 on deep learning-based liver and pancreatic tumor segmentation. From 2307 records, 84 studies met inclusion criteria. U-Net variants continue to dominate, achieving strong liver segmentation but inconsistent tumor accuracy, while transformer-based and hybrid models improve global context modeling at higher computational cost. Attention mechanisms, boundary-refinement modules, and semi-supervised learning offer incremental gains, yet pancreatic tumor segmentation remains notably difficult. Persistent issues, including domain shift, class imbalance, and limited generalization across datasets, underscore the need for more robust architectures, standardized benchmarks, and clinically oriented evaluation. This review consolidates recent progress and highlights key challenges that must be addressed to advance reliable hepatopancreatic tumor segmentation. Full article
(This article belongs to the Section Medical Imaging)
