Article

Multiple Differential Convolution and Local-Variation Attention UNet: Nucleus Semantic Segmentation Based on Multiple Differential Convolution and Local-Variation Attention

1 Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
2 CREATIS, UMR 5220, U1294, Inserm, CNRS, University Claude Bernard Lyon 1, INSA Lyon, 69100 Lyon, France
3 Tech X Academy, ShenZhen PolyTechnic University, Shenzhen 518055, China
4 Xishi (Xiamen) Technology Co., Ltd., Xiamen 361000, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(6), 1058; https://doi.org/10.3390/electronics14061058
Submission received: 20 January 2025 / Revised: 4 March 2025 / Accepted: 4 March 2025 / Published: 7 March 2025
(This article belongs to the Special Issue Feature Papers in "Computer Science & Engineering", 2nd Edition)

Abstract

Accurate nucleus segmentation is a crucial task in biomedical image analysis. While convolutional neural networks (CNNs) have achieved notable progress in this field, challenges remain due to the complexity and heterogeneity of cell images, especially in overlapping regions of nuclei. To address the limitations of current methods, we propose a mechanism of multiple differential convolution and local-variation attention in CNNs, leading to the so-called multiple differential convolution and local-variation attention U-Net (MDLA-UNet). The multiple differential convolution employs multiple differential operators to capture gradient and direction information, improving the network’s capability to detect edges. The local-variation attention uses a level-1 Haar discrete wavelet transform to obtain approximation features, from which high-frequency features are derived to enhance the global context and local detail variation of the feature maps. Results on the MoNuSeg, TNBC, and CryoNuSeg datasets demonstrate that the proposed method outperforms existing methods on cells with complex boundaries and fine details. The proposed MDLA-UNet captures fine edges and details in feature maps, thereby improving the segmentation of nuclei with blurred boundaries and overlapping regions.

1. Introduction

Deep learning has significantly boosted advancements in image processing, driven in part by the increasing capabilities of computer hardware, which has also accelerated its application in the biomedical field [1]. Deep learning has become a vital tool for recognizing and quantifying patterns in medical images [2,3,4,5]. Microscopic medical images are crucial for analyzing cellular features such as the number, size, morphology, and position of cells or nuclei, which provide key insights into cellular structure and function and offer a foundation for medical diagnosis and treatment [6,7].
The segmentation of nuclei in cell images, known as nuclear segmentation, faces a persistent challenge: accurately segmenting and recognizing fuzzy boundaries [8]. In contrast to traditional methods, deep learning excels at handling complex nuclear morphologies and texture details, improving both segmentation accuracy and robustness and thus providing an essential precondition for reliable nuclear segmentation. Through an end-to-end approach, deep learning demonstrates immense potential for accurately analyzing nuclear information, and its rapid development has had a profound impact on the fine-grained segmentation of medical images acquired with optical microscopes [9,10].
Advances in nuclei image semantic segmentation were initiated with the introduction of fully convolutional networks (FCNs) [11]. By substituting fully connected layers with convolutional layers, FCNs allow pixel-level predictions for images of any size. However, FCNs struggle with capturing global contextual information, which hinders segmentation accuracy. Although some improvements have been made to enhance FCNs’ multi-level contextual understanding, inherent limitations remain.
In comparison, the U-Net architecture proposed by Ronneberger and colleagues [12] significantly extends the FCN model. U-Net employs a U-shaped architecture in which the decoder retains more detailed information during upsampling by incorporating high-resolution feature maps from the encoder. This approach not only optimizes the balance between localization and context but also substantially enhances segmentation accuracy. U-Net is particularly adept at handling complex tasks such as the segmentation of cell images, which often lack clear boundaries and consistent morphology. Its superior segmentation performance is widely recognized, and tools such as the ImageJ plugin (Unet_Segmentation.jar-20181112152803) developed by Falk et al. [13] have made U-Net a powerful tool for non-experts in machine learning to perform cellular analysis. Building on the U-Net architecture, researchers have made structural improvements to boost its performance or adapt it to specific tasks [14,15,16,17,18,19].
Additionally, several researchers have combined convolutional neural networks with traditional algorithms for more accurate segmentation. Chen et al. [20] proposed a method that extracts features to predict masks, then refines the boundaries and separates the nuclei. Kowal et al. [21] integrated convolutional neural networks with watershed transformation to segment nuclei in breast cancer cytology images. Their method preprocesses the images using color deconvolution to enhance the contrast of hematoxylin-stained nuclei, applies a convolutional neural network to identify nuclear, cytoplasmic, edge, and background regions, and finally uses seed-based watershed segmentation to separate overlapping nuclear clusters. Although these methods can improve segmentation accuracy to some extent, they still rely on specific image processing techniques.
Attention mechanisms have become widely used in computer vision tasks to effectively guide the network’s focus toward key regions, and their integration into U-Net has significantly enhanced segmentation performance [22]. Zeng et al. [23] proposed RIC-Unet, which combines residual blocks, multi-scale processing, and channel attention mechanisms to enhance nucleus segmentation accuracy. Dogar et al. [24] combined spatial and channel attention mechanisms to enhance the model’s learning ability and segmented the nuclei via watershed transformation. Ali et al. [25] proposed MSAL-Net, with dense dilated convolutional blocks and a decoder integrating channel attention and boundary optimization, to better learn spatial details and to accurately predict and further refine nuclear boundaries. Wang et al. [27] proposed UDTransNet, which combines the advantages of a Dual Attention Transformer module and a Decoder-guided Recalibration Attention module, bridging the semantic gap between features at different levels and improving the skip connections of current U-shaped segmentation models. Ghosh et al. [26] proposed Morph-UNet, which integrates three multi-scale morphological modules into the U-Net architecture to handle the irregular shapes and varying sizes of regions of interest in medical images. Tan et al. [28] proposed FSCA-Net, which uses a Parallel Attention Transformer to enhance feature extraction in skip connections, a Cross-Attention Bridge Layer to compensate for down-sampling loss, and a Dual-Path Channel Attention module to guide feature filtering, addressing inefficiencies in capturing spatial and channel information.
In this paper, we propose a multiple differential convolution and local-variation attention U-Net (MDLA-UNet) method. By embedding the difference operator and wavelet transform from traditional image processing into the U-Net backbone, our approach improves the model’s capability to capture directional and local-variation information, significantly improving performance on nuclear segmentation tasks and achieving higher accuracy than the comparison models. The primary contribution of our work is a highly robust image segmentation framework that advances the capture of fine-grained structural details in complex images.
This paper is structured as follows: Section 2 outlines the proposed MDLA-UNet framework and its technical components. The experiments are presented in Section 3, followed by a detailed analysis of the results in Section 4. Finally, Section 5 wraps up the paper and explores future directions for improving the proposed method.

2. Materials and Methods

2.1. Network Structure

The proposed MDLA-UNet model is built upon the U-Net architecture, into which we designed and incorporated a multiple differential convolution (MDC) block and a local-variation attention (LVA) block within the convolutional blocks of both the encoder and decoder. The MDC block leverages differential convolutions in multiple directions to extract fine-grained directional feature information, enabling the model to more precisely capture and restore local details and edge structures. Meanwhile, the LVA block focuses on extracting and enhancing features from various frequency components, making the model more effective at capturing crucial local-variation information during segmentation. Figure 1 depicts the network architecture of the MDLA-UNet model.

2.2. Multiple Differential Convolution Block (MDC)

The MDC block calculates image gradients using differential convolutions to capture edge information in different directions, thereby improving the accuracy of nuclear segmentation. The difference operator emphasizes changes and edges in the image, while the convolution provides contextual information; together they better capture local variations and edge information, improving the overall segmentation performance of the model. The block increases the model’s sensitivity to local variations, enabling the extraction of more discriminative features, and its focus on high-frequency information allows sharper nuclear contours to be preserved.
The MDC block consists of six parallel branches (Figure 2): (a) four differential convolution layers, namely a central differential convolution (CDC) layer, a horizontal differential convolution (HDC) layer, a vertical differential convolution (VDC) layer, and a diagonal differential convolution (DDC) layer; (b) one vanilla convolution (VC) layer; and (c) one max-pooling layer. This design allows the network to effectively capture local differences in the image, enhancing feature extraction. The four differential convolution kernels extract gradient and edge information along different directions and work synergistically to improve the model’s feature representation by learning multi-directional gradient information. The vanilla convolution layer extracts intensity information from the image, complementing the differential convolution kernels. Max pooling retains the most significant information from local feature maps, assisting the extraction of global features. After the convolution operations, batch normalization and the ReLU activation function are applied: batch normalization reduces internal covariate shift, stabilizing training and accelerating convergence, whereas ReLU introduces nonlinearity, improving the network’s capacity to capture complex features. Finally, the outputs of all convolutional and pooling layers are concatenated, a residual connection is introduced [29], and the number of channels is restored.
Specifically, CDC uses a central differential operator (Laplace operator) to emphasize differences between the central pixel and surrounding pixels, whereas HDC, VDC, and DDC use Sobel operators to capture horizontal, vertical, and diagonal edge information, respectively. These differential operators capture multi-dimensional edge and gradient information, refining the object’s boundary features. The configuration of the four operators is as follows:
$$
CD = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \quad
HD = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad
VD = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad
DD = \begin{bmatrix} -1 & -1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
$$
In the implementation of the MDC block, the differential operators are explicitly integrated into the convolutional layers: the learnable convolution kernel is element-wise multiplied by the differential operator, and the result is applied to the input features. Taking the horizontal differential convolution as an example (Figure 3), this operation preserves the structure of a traditional convolution kernel while capturing directional features, and the learned weights further optimize the feature extraction.
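To make this concrete, the following is a minimal PyTorch sketch of the idea, not the authors’ released code: a learnable 3 × 3 kernel is element-wise multiplied by a fixed differential operator before the convolution is applied, so the learned weights are imprinted with the operator’s directional structure. Class and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffConv2d(nn.Module):
    """Differential convolution sketch: a learnable 3x3 kernel is
    element-wise multiplied by a fixed differential operator before
    being applied to the input, as described above."""

    def __init__(self, in_ch: int, out_ch: int, operator: torch.Tensor):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        # Fixed operator, broadcast over the (out_ch, in_ch) dimensions.
        self.register_buffer("op", operator.view(1, 1, 3, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        kernel = self.weight * self.op  # imprint directional structure
        return F.conv2d(x, kernel, padding=1)

# Horizontal differential (Sobel) operator HD from the equation above.
HD = torch.tensor([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

layer = DiffConv2d(in_ch=3, out_ch=16, operator=HD)
y = layer(torch.randn(1, 3, 256, 256))  # -> shape (1, 16, 256, 256)
```

The same layer, with CD, VD, or DD substituted for the operator, would give the other three differential branches of the MDC block.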

2.3. Local-Variation Attention Block (LVA)

The Haar Discrete Wavelet Transform (Haar DWT) is a multiresolution, orthogonal technique commonly used in image decomposition [30,31]. Its main advantage in image processing is efficient, non-redundant decomposition. As illustrated in Figure 4, Haar wavelets extract both low- and high-frequency features from images. We chose the Haar DWT to construct the attention mechanism mainly for its efficiency in frequency feature extraction: it yields both global (approximation) features and local-variation (detail) features, further enhancing the model’s focus on important features.
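For reference, a level-1 Haar DWT can be written directly with tensor slicing. The sketch below is an illustration using one common sign convention, not the paper’s implementation; it shows how an input splits into the approximation (LL) and three detail sub-bands.

```python
import torch

def haar_dwt_level1(x: torch.Tensor):
    """Level-1 Haar DWT of a (B, C, H, W) tensor with even H and W.
    Returns the approximation (LL) and the three detail sub-bands."""
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a - b + c - d) / 2  # horizontal detail
    hl = (a + b - c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh
```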
The feature extraction and fusion block LVA is based on wavelet transforms and convolutional neural networks, designed to enhance image representation by leveraging local-variation features. It incorporates an attention mechanism in the local variation, with a focus on extracting and amplifying features from different frequency components. By leveraging Haar DWT, the block can extract frequency features and use them to adjust the weights of the input features, allowing the model to focus on both global context and local detail variation. This local-variation attention mechanism aims at capturing and enhancing crucial local-variation information during the nuclear segmentation process, thereby assisting the MDC block in boosting both accuracy and robustness of the segmentation. As illustrated in Figure 5, the block combines wavelet transforms, feature fusion, attention mechanisms, and the Convolutional Block Attention Module (CBAM) [32] to enhance features across various frequency components and apply adaptive feature weighting.
In the present study, we retained only the low-frequency approximation; after restoring the image size using transposed convolution, we subtracted the low-frequency features from the input feature map to obtain the high-frequency features. This reduces computational complexity while preserving critical features. Feature weights are then adjusted through two scale coefficients obtained by passing the low- and high-frequency features through a Sigmoid activation function; these coefficients assign importance to the global features and the local detail features. The weighted feature maps are added and processed by a convolutional layer for feature fusion, ensuring that local-variation information is fully exploited in both the spatial and channel dimensions. We also incorporated a convolutional attention module that generates attention maps to adaptively adjust the importance of the fused features across spatial locations, allowing key regions of the input feature map to be emphasized and improving feature discrimination. Building on this, an integrated CBAM module applies attention weighting across both the spatial and channel dimensions to strengthen the focus on critical information. To prevent information loss during feature extraction, a residual connection adds the original feature map directly to the fused feature map, preserving the original information while strengthening feature representation and helping mitigate vanishing gradients.
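A minimal sketch of this low-/high-frequency weighting scheme follows. It illustrates the mechanism described above rather than the authors’ implementation: CBAM and the final fusion stage are simplified to single convolutions, and 2 × 2 average pooling stands in for the Haar level-1 approximation (to which it is proportional).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LVABlockSketch(nn.Module):
    """Local-variation attention, simplified: keep the low-frequency
    approximation, upsample it back, subtract to get high-frequency
    detail, Sigmoid-weight both parts, fuse, and add a residual."""

    def __init__(self, ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(ch, ch, kernel_size=2, stride=2)
        self.gate_low = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.gate_high = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = self.up(F.avg_pool2d(x, 2))   # approximate + restore size
        high = x - low                      # high-frequency (local variation)
        out = self.gate_low(low) * low + self.gate_high(high) * high
        return x + self.fuse(out)           # residual connection
```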

3. Experiment

3.1. Datasets and Preprocessing

This study utilized three publicly available pathology datasets, MoNuSeg [33,34], TNBC [35], and CryoNuSeg [36]. The MoNuSeg dataset includes H&E-stained tissue images from tumor patients across various organs and hospitals, with finely annotated nuclei. Due to the diversity of its images and differences in staining protocols, this dataset significantly contributes to the development of robust and widely applicable nuclear segmentation techniques. The TNBC dataset focuses on breast cancer tissues and includes annotations for the nuclei of various cell types, such as epithelial breast cells and invasive cancer cells, which have been marked and reviewed by multiple experts. CryoNuSeg is a dataset of cryosectioned and H&E-stained nuclei. It includes images from 10 human organs that have not been used in other publicly available datasets.
The MoNuSeg dataset comprises a training set of 37 images of size 1000 × 1000 and a test set of 14 images at the same resolution. Directly feeding images at this resolution into the model would be computationally demanding, so we cropped them into smaller 256 × 256 patches. The training set was further split into training and validation subsets in a 4:1 ratio for performance evaluation. The TNBC dataset contains 50 images of 512 × 512 pixels, which were similarly cropped into 256 × 256 patches. Since this dataset has no official test set, we split the data into training, validation, and test sets in a 6:2:2 ratio to ensure scientific rigor and reliable model evaluation. The CryoNuSeg dataset contains 30 images of 512 × 512 pixels and was treated and partitioned in the same way as TNBC. During preprocessing, we applied standardization and normalization to the datasets to reduce scale differences between features, speeding up model convergence and enhancing prediction accuracy. To ensure that compared experiments differ only in the model, we did not apply random data augmentation, excluding it as a confounding factor.
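As an illustration of the cropping step, the sketch below tiles an image into non-overlapping 256 × 256 patches. Since 1000 is not a multiple of 256, the paper’s exact border or overlap handling is not specified; dropping the remainder here is an assumption.

```python
import numpy as np

def crop_patches(img: np.ndarray, size: int = 256) -> list:
    """Tile an (H, W, C) image into non-overlapping size x size patches.
    Border pixels that do not fill a whole patch are dropped here; the
    paper's exact border/overlap strategy is not specified."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

patches = crop_patches(np.zeros((1000, 1000, 3)))  # -> 9 patches of 256x256
```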

3.2. Evaluation Metrics

To assess the segmentation efficacy of the proposed framework and validate its adaptability, we adopted four widely used evaluation metrics: Accuracy (ACC), Sensitivity (SE), Jaccard Similarity (JS), and Dice Coefficient (DC). ACC measures the proportion of pixels for which the prediction matches the true value, indicating the overall correctness of the segmentation. SE measures the model’s capacity to correctly identify and segment positive areas, i.e., the proportion of pixels correctly segmented as part of the object among all pixels in the true object area. JS assesses the similarity between the predicted results and the ground truth as the ratio of their intersection to their union, reflecting the extent of overlap between the two. DC reflects the degree of congruence between the predicted result and the truth and is a frequently used metric for assessing image segmentation results. They are computed as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{SE} = \frac{TP}{TP + FN}$$
$$\mathrm{JS} = \frac{TP}{TP + FP + FN}$$
$$\mathrm{DC} = \frac{2 \times TP}{2 \times TP + FP + FN}$$
where $TP$ denotes the true positives, $TN$ the true negatives, $FP$ the false positives, and $FN$ the false negatives.
All metrics were computed based on a fixed threshold of 0.5. Specifically, in the probability map generated by the model, any value greater than 0.5 is classified as positive, whereas values below 0.5 are classified as negative. This binarization simplifies the comparison between the model’s predicted outcomes and the ground truth, allowing for the straightforward calculation of evaluation metrics.
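The following sketch shows how these four metrics can be computed from a probability map under the fixed 0.5 threshold; it is an illustrative helper, not the evaluation code used in the paper.

```python
import numpy as np

def segmentation_metrics(prob: np.ndarray, gt: np.ndarray, thr: float = 0.5):
    """Compute ACC, SE, JS, and DC for a probability map `prob` against a
    binary ground-truth mask `gt`, binarizing at `thr`."""
    pred = prob > thr
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    acc = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)
    js = tp / (tp + fp + fn)
    dc = 2 * tp / (2 * tp + fp + fn)
    return acc, se, js, dc
```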

3.3. Loss Function

This study used the Binary Cross-Entropy (BCE) loss function to measure the error between the model’s predictions and the ground truth. This loss function is commonly used in binary classification tasks and guides the model to produce outputs that align more closely with the truth. The loss is defined by
$$L_{BCE}(S_P, G_T) = -\left( G_T \log(S_P) + (1 - G_T)\log(1 - S_P) \right)$$
where $S_P$ denotes the predicted probability after applying the Sigmoid function, and $G_T$ denotes the ground truth. Minimizing the binary cross-entropy loss enables the model to make more precise segmentation predictions.
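In PyTorch (the framework used here), this loss is typically computed with `BCEWithLogitsLoss`, which fuses the Sigmoid into the loss for numerical stability. A minimal usage sketch, with illustrative shapes:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()                  # Sigmoid + BCE in one op
logits = torch.randn(4, 1, 256, 256)                # raw model outputs
gt = torch.randint(0, 2, (4, 1, 256, 256)).float()  # binary ground truth
loss = criterion(logits, gt)                        # scalar loss to minimize
```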

3.4. Training Strategy and Parameter Settings

The experiments were carried out on an Ubuntu 18.04 system using the PyTorch 2.0.0+cu118 deep learning framework. The code was implemented in Python 3.8.10 and developed in the Visual Studio Code IDE. The system used 90 GB of RAM, a 12-core Intel® Xeon® Platinum 8352V processor at 2.10 GHz, and an NVIDIA RTX 4090 GPU featuring 24 GB of VRAM.
For training, we used the Adam optimizer with a momentum term of 0.9. Adam adapts the learning rate for each parameter, speeding up model convergence and improving training results. To address instability in the early stages of training, we adopted a cosine annealing learning-rate schedule, dynamically decreasing the learning rate from a maximum of 0.0001 to a minimum of 0.000001. The number of epochs and the batch size were set to 100 and 4, respectively.
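A sketch of this optimization setup in PyTorch (Adam with β1 = 0.9, cosine annealing from 1e-4 to 1e-6 over 100 epochs, batch size 4); `model`, `train_loader`, and `criterion` are assumed to be defined elsewhere.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-6)

for epoch in range(100):
    for images, masks in train_loader:   # DataLoader with batch_size=4
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
    scheduler.step()                     # anneal the learning rate per epoch
```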

3.5. Detection Performance Comparative Experiment

To assess the performance of MDLA-UNet in microscopy image segmentation, we compared it with U-Net, R2U-Net, Attention U-Net, UNet++, Swin-Unet, Morph-UNet-EfficientNetB4, UDTransNet, and FSCA-Net on the MoNuSeg, TNBC, and CryoNuSeg datasets. The comparative results are displayed in Figure 6, where we selected two representative images from each dataset, covering challenging scenarios such as blurred boundaries, overlapping nuclei, and low image contrast. The areas highlighted in red illustrate MDLA-UNet’s superior performance in nucleus segmentation.
Table 1 provides quantitative results, showing a notable overall performance improvement with the proposed MDLA-UNet method, which outperformed all other models and achieved the highest DC. R2U-Net showed low performance, whereas enhanced models such as Attention U-Net and UNet++ yielded some improvement. Newer models such as Swin-Unet, UDTransNet, Morph-UNet-EfficientNetB4, and FSCA-Net present large improvements on the MoNuSeg dataset but do not perform well on the TNBC and CryoNuSeg datasets. In addition, Swin-Unet, UDTransNet, and Morph-UNet-EfficientNetB4 were each trained both with and without pre-trained weights; the results show that loading pre-trained weights does not help across all model architectures or datasets. In contrast, MDLA-UNet performed best on all three datasets. Specifically, on the MoNuSeg dataset, ACC reached 92.38%, SE 85.17%, JS 68.72%, and DC 81.36%; on the TNBC dataset, ACC reached 95.92%, SE 84.43%, JS 71.73%, and DC 83.46%; on the CryoNuSeg dataset, ACC reached 90.75%, SE 84.52%, JS 68.00%, and DC 80.73%. Compared to the other models, MDLA-UNet stands out on both multi-organ and single-organ nuclei, showing better generalization ability and higher segmentation accuracy.
Table 2 presents the total parameter counts and test speeds of the compared models. The MDLA-UNet model has a larger parameter count, and its test time is slightly longer. Compared horizontally with the existing improved models, however, its parameter scale and running speed remain within a reasonable range: it is neither the largest nor the slowest, yet it achieves the best results.
The ablation results are displayed in Figure 7, which shows the impact of removing individual modules from the MDLA-UNet model. The figure illustrates that the MDC module alone, in the absence of the LVA module, tends to over-segment; including the LVA module effectively mitigates and corrects this issue.
Table 3 provides more quantitative results; ticks in the table signify the presence of the corresponding components. The complete MDLA-UNet demonstrates the highest performance, validating the effective combination of the MDC and LVA blocks. On the MoNuSeg, TNBC, and CryoNuSeg datasets, removing the MDC block dropped the DC by 1.27, 1.61, and 1.52, respectively: although the LVA block continues to extract features related to local variations, overall performance drops on all three datasets, underscoring the crucial role of the MDC block in capturing gradient and edge information. Similarly, removing the LVA block dropped the DC by 1.69, 0.26, and 0.07, respectively: the model’s capacity to handle local-variation features diminishes, fewer edge features are extracted in overlapping regions, and segmentation accuracy drops slightly. This demonstrates the unique contribution of the LVA block in enhancing local-variation processing. When both blocks were removed, the model reverted to the standard U-Net, with DC drops of 5.26, 2.35, and 1.62, respectively. The standard U-Net struggles to accurately segment regions with complex nuclear boundaries and overlap, which MDLA-UNet handles correctly. The ablation experiments demonstrate the effectiveness of integrating the MDC and LVA blocks, confirming their significant contribution to segmentation performance and model robustness.

4. Discussion

MDLA-UNet demonstrates exceptional capability in regions with blurred nuclear boundaries and severe overlap, significantly reducing both the misidentification of non-nuclear regions as nuclei and boundary segmentation errors. This performance stems from the MDC block and the local-variation feature fusion (LVA) block, which allow the model to capture subtle details and edge information, enhancing segmentation accuracy and robustness. The MDC block integrates multiple traditional differential operators into the convolutional blocks to optimize the U-Net architecture, improving the model’s capability to extract features along different directions and thus the accuracy of detecting complex boundaries and fine details. The LVA block uses the Haar DWT to decompose features into global and local-variation components, fuses them, and feeds them into the CBAM block; the model then adaptively adjusts the importance of high- and low-frequency, global and local features, improving its robustness and reliability in overlapping nuclear regions. MDLA-UNet can help pathologists automatically segment nuclear regions in microscope images and quickly obtain critical information. By defining and optimizing appropriate evaluation metrics and incorporating clinician feedback to further refine the model, its usability and interpretability in real-world settings can be improved, ultimately raising the quality of clinical decision-making [37]. Combining this automated output with the pathologist’s judgment improves diagnostic efficiency and assists the early diagnosis of diseases such as cancer.
This study tackled the problem of imprecise segmentation in existing microscopy image segmentation models. Compared to other models, MDLA-UNet shows outstanding performance in nuclear image segmentation, highlighting its promise for biomedical image analysis and precision medicine. Experimental findings show that the DC values of MDLA-UNet on the MoNuSeg, TNBC, and CryoNuSeg datasets are 81.36, 83.46, and 80.73, which are 1.86, 1.13, and 0.32 higher than the second-best models, respectively. Although the method performs well on complex nuclear regions, its parameter count is relatively large because the MDC block uses multiple convolution kernels; this will be improved in future work.

5. Conclusions and Future Work

The proposed MDLA-UNet method significantly enhances segmentation performance, particularly for complex nuclear boundaries and overlapping regions. By introducing the MDC and LVA blocks, MDLA-UNet captures details and edge information more precisely. The findings indicate that MDLA-UNet surpasses existing SOTA models on the MoNuSeg, TNBC, and CryoNuSeg datasets, demonstrating excellent performance in nuclear segmentation and suggesting broad potential for practical applications.
Future work will concentrate on refining training strategies by integrating prior knowledge and advanced techniques to enhance robustness. We intend to reduce model parameters or design more efficient architectures for better computational efficiency, making the model suitable for real-time applications and deployment in resource-constrained environments. Additionally, morphological post-processing will be applied to refine predicted masks and achieve more accurate segmentation.

Author Contributions

Conceptualization, X.S.; Data curation, S.L.; Formal analysis, S.L.; Funding acquisition, B.S.; Investigation, H.G.; Methodology, S.L. and Y.C.; Project administration, X.S.; Resources, X.S.; Software, S.L. and H.Z.; Supervision, X.S.; Validation, Y.C., J.C. and H.G.; Visualization, J.C.; Writing—original draft, S.L.; Writing—review and editing, X.S., K.S., Y.Z. and B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Scientific Research Startup Fund for Shenzhen High-Caliber Personnel of SZPT (No. 6023330002K), the General Higher Education Project of the Guangdong Provincial Education Department (No. 2023KCXTD077), the Guangdong Provincial General University Innovation Team Project (No. 2020KCXTD047), and the College Start-up Fund of ShenZhen PolyTechnic University (No. 6022312031K).

Data Availability Statement

The original data presented in this study are openly available in the MoNuSeg dataset at [https://monuseg.grand-challenge.org/Data/, accessed on 1 July 2024] and the TNBC dataset at [https://zenodo.org/record/1175282#.YMisCTZKgow, accessed on 1 July 2024].

Conflicts of Interest

Author Hu Zhang was employed by the company Xishi (Xiamen) Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Gheisari, M.; Ebrahimzadeh, F.; Rahimi, M.; Moazzamigodarzi, M.; Liu, Y.; Pramanik, P.K.D.; Heravi, M.A.; Mehbodniya, A.; Ghaderzadeh, M.; Feylizadeh, M.R.; et al. Deep Learning: Applications, Architectures, Models, Tools, and Frameworks: A Comprehensive Survey. CAAI Trans. Intell. Technol. 2023, 8, 581–606. [Google Scholar] [CrossRef]
  2. Najjar, R. Redefining Radiology: A Review of Artificial Intelligence Integration in Medical Imaging. Diagnostics 2023, 13, 2760. [Google Scholar] [CrossRef] [PubMed]
  3. Koetzier, L.R.; Mastrodicasa, D.; Szczykutowicz, T.P.; van der Werf, N.R.; Wang, A.S.; Sandfort, V.; van der Molen, A.J.; Fleischmann, D.; Willemink, M.J. Deep Learning Image Reconstruction for CT: Technical Principles and Clinical Prospects. Radiology 2023, 306, e221257. [Google Scholar] [CrossRef]
  4. Chakrabarty, N.; Mahajan, A. Imaging Analytics Using Artificial Intelligence in Oncology: A Comprehensive Review. Clin. Oncol. 2024, 36, 498–513. [Google Scholar] [CrossRef]
  5. Gadermayr, M.; Tschuchnig, M. Multiple Instance Learning for Digital Pathology: A Review of the State-of-the-Art, Limitations & Future Potential. Comput. Med. Imaging Graph. 2024, 112, 102337. [Google Scholar] [CrossRef]
  6. Wang, N.; Zhang, C.; Wei, X.; Yan, T.; Zhou, W.; Zhang, J.; Kang, H.; Yuan, Z.; Chen, X. Harnessing the Power of Optical Microscopy for Visualization and Analysis of Histopathological Images. Biomed. Opt. Express 2023, 14, 5451–5465. [Google Scholar] [CrossRef]
  7. Xing, F.; Xie, Y.; Su, H.; Liu, F.; Yang, L. Deep Learning in Microscopy Image Analysis: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 4550–4568. [Google Scholar] [CrossRef]
  8. Basu, A.; Senapati, P.; Deb, M.; Rai, R.; Dhal, K.G. A Survey on Recent Trends in Deep Learning for Nucleus Segmentation from Histopathology Images. Evol. Syst. 2024, 15, 203–248. [Google Scholar] [CrossRef] [PubMed]
  9. Zinchuk, V.; Grossenbacher-Zinchuk, O. Machine Learning for Analysis of Microscopy Images: A Practical Guide and Latest Trends. Curr. Protoc. 2023, 3, e819. [Google Scholar] [CrossRef]
  10. Melanthota, S.K.; Gopal, D.; Chakrabarti, S.; Kashyap, A.A.; Radhakrishnan, R.; Mazumder, N. Deep Learning-Based Image Processing in Optical Microscopy. Biophys. Rev. 2022, 14, 463–481. [Google Scholar] [CrossRef]
  11. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef] [PubMed]
  12. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  13. Falk, T.; Mai, D.; Bensch, R.; Cicek, O.; Abdulkadir, A.; Marrakchi, Y.; Boehm, A.; Deubner, J.; Jaeckel, Z.; Seiwald, K.; et al. U-Net: Deep Learning for Cell Counting, Detection, and Morphometry. Nat. Methods 2019, 16, 67–70. [Google Scholar] [CrossRef] [PubMed]
  14. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P., Belagiannis, V., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar]
  15. Long, F. Microscopy Cell Nuclei Segmentation with Enhanced U-Net. BMC Bioinf. 2020, 21, 8. [Google Scholar] [CrossRef] [PubMed]
  16. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059. [Google Scholar] [CrossRef]
  17. Alom, M.Z.; Yakopcic, C.; Hasan, M.; Taha, T.M.; Asari, V.K. Recurrent Residual U-Net for Medical Image Segmentation. J. Med. Imaging 2019, 6, 014006. [Google Scholar] [CrossRef]
  18. Jafari, M.; Auer, D.; Francis, S.; Garibaldi, J.; Chen, X. DRU-Net: An Efficient Deep Convolutional Neural Network for Medical Image Segmentation. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 1144–1148. [Google Scholar] [CrossRef]
  19. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the Computer Vision—ECCV 2022 Workshops; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 205–218. [Google Scholar] [CrossRef]
  20. Chen, K.; Zhang, N.; Powers, L.; Roveda, J. Cell Nuclei Detection and Segmentation for Computational Pathology Using Deep Learning. In Proceedings of the 2019 Spring Simulation Conference (SpringSim), Tucson, AZ, USA, 29 April–2 May 2019; pp. 1–6. [Google Scholar] [CrossRef]
  21. Kowal, M.; Żejmo, M.; Skobel, M.; Korbicz, J.; Monczak, R. Cell Nuclei Segmentation in Cytological Images Using Convolutional Neural Network and Seeded Watershed Algorithm. J. Digit. Imaging 2020, 33, 231–242. [Google Scholar] [CrossRef]
  22. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.C.H.; Heinrich, M.P.; Misawa, K.; Mori, K.; McDonagh, S.G.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar] [CrossRef]
  23. Zeng, Z.; Xie, W.; Zhang, Y.; Lu, Y. RIC-Unet: An Improved Neural Network Based on Unet for Nuclei Segmentation in Histology Images. IEEE Access 2019, 7, 21420–21428. [Google Scholar] [CrossRef]
  24. Dogar, G.M.; Fraz, M.M.; Javed, S. Feature Attention Network for Simultaneous Nuclei Instance Segmentation and Classification in Histology Images. In Proceedings of the 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), Islamabad, Pakistan, 20–21 May 2021; pp. 1–6. [Google Scholar] [CrossRef]
  25. Ali, H.; ul Haq, I.; Cui, L.; Feng, J. MSAL-Net: Improve Accurate Segmentation of Nuclei in Histopathology Images by Multiscale Attention Learning Network. BMC Med. Inf. Decis. Making 2022, 22, 90. [Google Scholar] [CrossRef]
  26. Ghosh, S.; Das, S. Multi-Scale Morphology-Aided Deep Medical Image Segmentation. Eng. Appl. Artif. Intell. 2024, 137, 109047. [Google Scholar] [CrossRef]
  27. Wang, H.; Cao, P.; Yang, J.; Zaiane, O. Narrowing the Semantic Gaps in U-Net with Learnable Skip Connections: The Case of Medical Image Segmentation. Neural Netw. 2024, 178, 106546. [Google Scholar] [CrossRef]
  28. Tan, D.; Hao, R.; Zhou, X.; Xia, J.; Su, Y.; Zheng, C. A Novel Skip-Connection Strategy by Fusing Spatial and Channel Wise Features for Multi-Region Medical Image Segmentation. IEEE J. Biomed. Health Inf. 2024, 28, 5396–5409. [Google Scholar] [CrossRef] [PubMed]
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  30. Zhang, D. Wavelet Transform. In Fundamentals of Image Data Mining. TCS; Springer: Cham, Switzerland, 2019; pp. 35–44. [Google Scholar] [CrossRef]
  31. Xu, G.; Liao, W.; Zhang, X.; Li, C.; He, X.; Wu, X. Haar Wavelet Downsampling: A Simple but Effective Downsampling Module for Semantic Segmentation. Pattern Recognit. 2023, 143, 109819. [Google Scholar] [CrossRef]
  32. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–19. [Google Scholar] [CrossRef]
  33. Kumar, N.; Verma, R.; Anand, D.; Zhou, Y.; Onder, O.F.; Tsougenis, E.; Chen, H.; Heng, P.-A.; Li, J.; Hu, Z.; et al. A Multi-Organ Nucleus Segmentation Challenge. IEEE Trans. Med. Imaging 2020, 39, 1380–1391. [Google Scholar] [CrossRef] [PubMed]
  34. Kumar, N.; Verma, R.; Sharma, S.; Bhargava, S.; Vahadane, A.; Sethi, A. A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology. IEEE Trans. Med. Imaging 2017, 36, 1550–1560. [Google Scholar] [CrossRef]
  35. Naylor, P.; Lae, M.; Reyal, F.; Walter, T. Segmentation of Nuclei in Histopathology Images by Deep Regression of the Distance Map. IEEE Trans. Med. Imaging 2019, 38, 448–459. [Google Scholar] [CrossRef]
  36. Mahbod, A.; Schaefer, G.; Bancher, B.; Löw, C.; Dorffner, G.; Ecker, R.; Ellinger, I. CryoNuSeg: A Dataset for Nuclei Instance Segmentation of Cryosectioned H&E-Stained Histological Images. Comput. Biol. Med. 2021, 132, 104349. [Google Scholar] [CrossRef]
  37. Lin, T.-L.; Lu, C.-T.; Karmakar, R.; Nampalley, K.; Mukundan, A.; Hsiao, Y.-P.; Hsieh, S.-C.; Wang, H.-C. Assessing the Efficacy of the Spectrum-Aided Vision Enhancer (SAVE) to Detect Acral Lentiginous Melanoma, Melanoma In Situ, Nodular Melanoma, and Superficial Spreading Melanoma. Diagnostics 2024, 14, 1672. [Google Scholar] [CrossRef]
Figure 1. The network architecture of the MDLA-UNet.
Figure 2. The multiple differential convolution block.
Figure 3. HD operators for horizontal differential convolution blocks.
Figure 4. Illustration of Haar DWT.
Figure 5. The local-variation attention block.
Figure 6. The result of the comparison experiments with state-of-the-art models conducted on the MoNuSeg, TNBC, and CryoNuSeg datasets.
Figure 7. The findings from the ablation experiments conducted on the MoNuSeg, TNBC, and CryoNuSeg datasets.
Table 1. Analysis of the findings from the comparison experiments with SOTA models conducted on the MoNuSeg, TNBC, and CryoNuSeg datasets.

| Dataset | Network | Pretrain | ACC (%) | SE (%) | JS (%) | DC (%) |
|---------|---------|----------|---------|--------|--------|--------|
| MoNuSeg | U-Net [12] | – | 89.67 | 86.15 | 62.29 | 76.10 |
| MoNuSeg | R2U-Net [17] | – | 89.04 | 84.02 | 60.57 | 75.27 |
| MoNuSeg | Attention U-Net [22] | – | 91.40 | 79.59 | 64.02 | 77.72 |
| MoNuSeg | UNet++ [14] | – | 90.91 | 84.99 | 64.62 | 77.98 |
| MoNuSeg | Swin-Unet [19] | w/o | 91.05 | 80.23 | 63.66 | 77.57 |
| MoNuSeg | Swin-Unet [19] | w/ | 90.36 | 82.68 | 62.46 | 76.44 |
| MoNuSeg | Morph-UNet-EfficientNetB4 [26] | w/o | 91.29 | 79.35 | 64.00 | 77.92 |
| MoNuSeg | Morph-UNet-EfficientNetB4 [26] | w/ | 89.67 | 75.33 | 58.85 | 73.57 |
| MoNuSeg | UDTransNet [27] | w/o | 91.28 | 80.52 | 64.34 | 78.19 |
| MoNuSeg | UDTransNet [27] | w/ | 91.86 | 81.04 | 66.01 | 79.41 |
| MoNuSeg | FSCA-Net [28] | – | 91.84 | 83.41 | 66.34 | 79.50 |
| MoNuSeg | MDLA-UNet (Ours) | – | 92.38 | 85.17 | 68.72 | 81.36 |
| TNBC | U-Net | – | 95.41 | 81.50 | 68.34 | 81.11 |
| TNBC | R2U-Net | – | 92.16 | 55.82 | 45.89 | 62.37 |
| TNBC | Attention U-Net | – | 95.29 | 83.66 | 67.93 | 80.84 |
| TNBC | UNet++ | – | 95.58 | 85.39 | 70.07 | 82.33 |
| TNBC | Swin-Unet | w/o | 94.07 | 79.68 | 61.97 | 76.39 |
| TNBC | Swin-Unet | w/ | 94.78 | 81.34 | 65.19 | 78.83 |
| TNBC | Morph-UNet-EfficientNetB4 | w/o | 94.50 | 77.78 | 58.83 | 74.02 |
| TNBC | Morph-UNet-EfficientNetB4 | w/ | 91.50 | 72.35 | 51.77 | 67.89 |
| TNBC | UDTransNet | w/o | 93.43 | 78.47 | 62.88 | 77.14 |
| TNBC | UDTransNet | w/ | 95.47 | 81.60 | 68.21 | 81.04 |
| TNBC | FSCA-Net | – | 95.55 | 81.24 | 68.81 | 81.40 |
| TNBC | MDLA-UNet (Ours) | – | 95.92 | 84.43 | 71.73 | 83.46 |
| CryoNuSeg | U-Net | – | 90.15 | 81.13 | 65.71 | 79.11 |
| CryoNuSeg | R2U-Net | – | 87.49 | 75.32 | 58.62 | 73.61 |
| CryoNuSeg | Attention U-Net | – | 89.83 | 83.93 | 65.82 | 79.15 |
| CryoNuSeg | UNet++ | – | 90.42 | 84.36 | 67.18 | 80.17 |
| CryoNuSeg | Swin-Unet | w/o | 89.54 | 83.01 | 64.83 | 78.42 |
| CryoNuSeg | Swin-Unet | w/ | 90.63 | 83.00 | 67.43 | 80.41 |
| CryoNuSeg | Morph-UNet-EfficientNetB4 | w/o | 87.82 | 78.38 | 60.05 | 74.86 |
| CryoNuSeg | Morph-UNet-EfficientNetB4 | w/ | 85.14 | 75.95 | 54.60 | 70.49 |
| CryoNuSeg | UDTransNet | w/o | 88.35 | 76.49 | 60.61 | 75.32 |
| CryoNuSeg | UDTransNet | w/ | 89.71 | 78.47 | 64.39 | 78.25 |
| CryoNuSeg | FSCA-Net | – | 90.42 | 82.50 | 66.88 | 79.94 |
| CryoNuSeg | MDLA-UNet (Ours) | – | 90.75 | 84.52 | 68.00 | 80.73 |
Table 2. Comparison of model complexity and test speed.

| Network | Params (M) | Test Speed (img/s) |
|---------|-----------|--------------------|
| U-Net | 8.64 | 1115 |
| R2U-Net | 9.78 | 454 |
| Attention U-Net | 8.73 | 846 |
| UNet++ | 10.20 | 715 |
| Swin-Unet | 41.34 | 193 |
| Morph-UNet-EfficientNetB4 | 0.42 | 237 |
| UDTransNet | 33.80 | 200 |
| FSCA-Net | 43.36 | 178 |
| MDLA-UNet (Ours) | 39.08 | 181 |
Table 3. Analysis result of ablation experiments conducted on the MoNuSeg, TNBC, and CryoNuSeg datasets.

| Dataset | U-Net | MDC | LVA | ACC (%) | SE (%) | JS (%) | DC (%) |
|---------|-------|-----|-----|---------|--------|--------|--------|
| MoNuSeg | ✓ | | | 89.67 | 86.15 | 62.29 | 76.10 |
| MoNuSeg | ✓ | ✓ | | 91.71 | 84.96 | 66.42 | 79.67 |
| MoNuSeg | ✓ | | ✓ | 91.80 | 84.73 | 66.99 | 80.09 |
| MoNuSeg | ✓ | ✓ | ✓ | 92.38 | 85.17 | 68.72 | 81.36 |
| TNBC | ✓ | | | 95.41 | 81.50 | 68.34 | 81.11 |
| TNBC | ✓ | ✓ | | 95.94 | 83.50 | 71.36 | 83.20 |
| TNBC | ✓ | | ✓ | 95.55 | 83.06 | 69.36 | 81.85 |
| TNBC | ✓ | ✓ | ✓ | 95.92 | 84.43 | 71.73 | 83.46 |
| CryoNuSeg | ✓ | | | 90.15 | 81.13 | 65.71 | 79.11 |
| CryoNuSeg | ✓ | ✓ | | 90.74 | 83.61 | 67.87 | 80.66 |
| CryoNuSeg | ✓ | | ✓ | 89.77 | 84.32 | 65.85 | 79.21 |
| CryoNuSeg | ✓ | ✓ | ✓ | 90.75 | 84.52 | 68.00 | 80.73 |
