Abstract
Background/Objectives: Accurate nuclear segmentation is a fundamental step in computational pathology, enabling reliable estimation of cellularity and nuclear morphology. However, segmentation models are typically evaluated under ideal imaging conditions, while real-world microscopy data are affected by staining variability, noise, and image degradation. This study aims to comparatively evaluate three representative convolutional architectures for nuclei segmentation, with emphasis on robustness and clinical relevance under perturbed imaging conditions.
Methods: U-Net, Attention U-Net, and U-Net++ were trained and evaluated on the BBBC038 nuclei microscopy dataset using fixed train–validation–test splits. Robustness was assessed under three types of synthetic perturbations: Gaussian blur, additive noise, and color jitter. Segmentation performance was quantified using the Dice coefficient and Intersection-over-Union (IoU). Architectures were compared statistically using paired Wilcoxon signed-rank tests with Holm correction, with Cliff’s delta as the effect size. In addition, clinically relevant nuclear descriptors (nuclear count, median nuclear area, area interquartile range (IQR), and nuclear density) were extracted from predicted masks, and descriptor stability was analyzed as relative deviation from clean conditions.
Results: Under clean imaging conditions, Attention U-Net achieved the highest mean Dice score, while paired statistical analysis indicated that U-Net++ exhibited the most consistent performance across test samples. Under image perturbations, Attention U-Net demonstrated greater robustness to blur and noise, whereas U-Net++ showed superior stability under color variations. Descriptor-based analysis further indicated that U-Net++ preserved nuclear count and density most reliably under chromatic perturbations, while U-Net exhibited larger instability in nuclear count and density, particularly under noise.
Conclusions: Architectural design choices strongly influence not only pixel-level segmentation accuracy but also the stability of clinically relevant nuclear morphology descriptors. Robustness evaluation under multiple perturbation types reveals important trade-offs between architectures that are not captured by clean-image benchmarks alone. These findings highlight the necessity of multi-level evaluation strategies combining overlap metrics, statistical testing, robustness analysis, and descriptor stability assessment for future benchmarking and clinically reliable deployment of nuclei segmentation systems.
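To make the evaluation levels named above concrete, the following is a minimal Python sketch of the overlap metrics and of descriptor stability. It is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, nuclei are taken to be 4-connected components of a binary mask, nuclear density is taken as nuclei per pixel, and relative deviation is assumed to be |d_perturbed − d_clean| / |d_clean|. The abstract does not specify these details, and the area IQR is omitted for brevity.

```python
from collections import deque
from statistics import median


def dice_iou(pred, truth):
    """Return (Dice, IoU) for two same-sized binary masks (nested lists of 0/1)."""
    inter = sum(1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p and t)
    n_pred = sum(1 for row in pred for p in row if p)
    n_truth = sum(1 for row in truth for t in row if t)
    if n_pred + n_truth == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    union = n_pred + n_truth - inter
    return 2 * inter / (n_pred + n_truth), inter / union


def nuclear_areas(mask):
    """Pixel areas of 4-connected foreground components (assumed nucleus definition)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one component.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def descriptors(mask):
    """Nuclear count, median area (pixels), and density (nuclei per pixel, assumed)."""
    areas = nuclear_areas(mask)
    n_pixels = len(mask) * len(mask[0])
    return {
        "count": len(areas),
        "median_area": float(median(areas)) if areas else 0.0,
        "density": len(areas) / n_pixels,
    }


def relative_deviation(perturbed, clean):
    """Assumed stability measure: |perturbed - clean| / |clean|."""
    if clean == 0:
        return 0.0 if perturbed == 0 else float("inf")
    return abs(perturbed - clean) / abs(clean)


# Toy example: a prediction on a clean image versus one on a perturbed image.
clean_pred = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
perturbed_pred = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
d_clean, d_pert = descriptors(clean_pred), descriptors(perturbed_pred)
stability = {k: relative_deviation(d_pert[k], d_clean[k]) for k in d_clean}
```

In this toy case the nuclear count is unchanged while the median area shifts by one third, which illustrates how a descriptor can drift even when the number of detected nuclei is preserved.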