1. Introduction
Cervical cancer, primarily caused by persistent human papillomavirus (HPV) infection, represents a major health concern for women worldwide [
1,
2]. It ranks as the fourth most common cancer among women, with HPV [
3] playing a significant role in its development. Additional contributing factors include early pregnancy, oral contraceptive use, poor menstrual hygiene, and prolonged cervical bleeding [
4]. Unfortunately, cervical cancer remains a highly lethal malignancy, claiming the lives of many women each year [
5]. However, it is a preventable disease through effective screening tests and HPV vaccination. Among the screening options available, the Pap smear test has been widely adopted for the detection of precancerous and cancerous abnormalities in the cervical region [
6,
7,
8]. During a Pap smear examination, cervical cell samples are collected and stained on a glass slide, allowing pathologists to visually examine the cells under a microscope for any signs of precancerous changes. Precancerous lesions are classified into different categories, such as normal, CIN1, CIN2, and CIN3, based on the degree of dysplasia exhibited by the squamous epithelium [
6]. In 2018 alone, there were 570,000 reported cases of cervical cancer, resulting in 311,000 deaths [
9,
10].
The significance of the Pap smear test lies in its ability to detect abnormal cellular changes in the cervix, allowing for timely medical intervention and improved patient outcomes. By identifying precancerous lesions, clinicians can initiate appropriate treatments, such as excisional procedures or close monitoring, preventing the progression of the disease to invasive cervical cancer [
11]. Furthermore, the test has proven to be a cost-effective screening tool, particularly in resource-constrained settings where more advanced diagnostic methods may not be readily available. However, despite its efficacy, the Pap smear test is not without challenges, particularly in the presence of noisy Pap smear images. Noisy images, arising from various sources such as equipment limitations, digitization artifacts, or image acquisition conditions, pose significant obstacles to accurate analysis and interpretation. Notably, one prevalent type of noise that affects Pap smear images is Poisson noise, which stems from the discrete, random arrival of photons at the sensor during image capture.
With the advent of digital medical imaging equipment, there has been a growing interest in utilizing image processing techniques for medical image analysis. Biomedical image analysis has been an active area of research, aiming to develop automated diagnostic systems. These systems typically employ three main approaches: segmentation, feature extraction, and classification. Precise segmentation is crucial in biomedical imaging as it highlights regions of interest for further investigation. In the case of Pap smear analysis, segmentation is performed to accurately analyze the nucleus and cytoplasm of cervical cells. Image segmentation, a subfield of computer vision and image processing, aims to group similar areas or regions within an image [
12]. It can be categorized into different types, including semantic segmentation, instance segmentation, and panoptic segmentation. Manual pathological observations often require examining numerous cell nuclei within a single slide image, whereas deep-learning-based pixel-level segmentation techniques enable the simultaneous identification of various cell structures, facilitating the localization of areas of interest.
In Pap smear tests, the cytoplasm and nucleus are frequently seen together in a single image because both belong to the same cell. The main objective of the proposed study is to accurately segment both the cytoplasm and nucleus, considering their boundaries and characteristics. Segmenting these structures as distinct entities facilitates quantitative analysis and feature extraction, and it allows for a more comprehensive understanding of cellular morphology. By precisely segmenting these regions, features specific to each component can be extracted, aiding in the identification and classification of abnormal cells or early signs of malignancy.
Traditional segmentation methods, such as Gaussian mixture models, the gray-level co-occurrence matrix (GLCM), the wavelet transform, watershed, Otsu, and Voronoi, have been employed for cervical segmentation [
13]. However, deep learning methods, including U-Net, Mask R-CNN, and convolutional neural networks (CNNs), have shown remarkable success in biomedical image segmentation [
14]. Previous approaches relying on conventional techniques and mathematical algorithms have limitations when dealing with complex cell structures due to noise contamination, overstaining, and poor resolution. In recent years, machine learning techniques, particularly deep learning, have emerged as promising approaches for medical image segmentation, offering improved accuracy [
15,
16]. This study investigates the effectiveness of the U-Net architecture, a widely recognized deep learning model for biomedical image segmentation, for Pap smear image segmentation.
Figure 1 highlights the common types of cervical cancer screening tests.
One of the key advantages of U-Net is its ability to achieve precise segmentation while preserving critical edges and capturing information specific to different cell patterns. Additionally, U-Net requires a relatively small number of training samples, making it highly applicable in practical scenarios. Several challenges complicate the segmentation task. Noise and artifacts resulting from sample preparation and image acquisition can adversely affect segmentation accuracy. The heterogeneity in cell morphology, including variations in size, shape, and staining intensity, makes it difficult to develop a generalized segmentation method. Moreover, the densely packed and overlapping cells in Pap smear images add further complexity. This paper proposes a robust deep learning approach based on U-Net that addresses these challenges and accurately segments both the cytoplasm and nucleus in noisy Pap smear images, contributing to improved cervical cancer screening and early diagnosis of cell abnormalities. The different types of deep learning segmentation approaches are presented in
Figure 2.
1.1. Noise in Pap Smear Images
Noise in images can arise from various sources, resulting in random distortions such as color imbalances or brightness issues. In the context of Pap smear images, the presence of noise can significantly impact the accuracy and reliability of differentiating between nuclear and non-nuclear features, leading to potential misclassifications and decreased performance in cytoplasm and nucleus segmentation. Examples of different types of noise are shown in
Figure 3 [
17,
18,
19,
20]. Biomedical images commonly exhibit different types of noise, including Poisson noise, speckle noise, Gaussian noise, and impulse noise.
Poisson Noise: Poisson noise, also known as shot noise, arises from the discrete and random arrival of energy-carrying particles, such as electrons or photons, at an electronic or photosensitive device. It becomes pronounced when the number of detected particles is insufficient. Factors contributing to Poisson noise can include low levels of radiation, inefficient detectors, or high background radiation.
Speckle Noise: Speckle noise is a granular and multiplicative noise that degrades image quality by reducing resolution, contrast, and pixel information. It is caused by the constructive and destructive interference of coherent waves interacting with a rough or textured surface or interface. In medical imaging, such as ultrasound, speckle noise is a result of the interaction of ultrasound waves with small structures or tissue interfaces that have different acoustic impedance values.
Gaussian Noise: Gaussian noise, also known as random or white noise, is additive in nature and follows a normal distribution. It arises from random variations in pixel intensities due to factors such as electronic noise in the imaging system, photon noise during acquisition, or natural tissue variation.
Impulse Noise: Impulse noise, also known as salt and pepper noise, manifests as isolated bright or dark pixels or bursts of random noise in medical images. It can obscure important details or create false features, making image interpretation challenging. Impulse noise arises from random spikes or drops in pixel intensity values, often due to malfunctioning sensors.
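The four noise models described above can be illustrated with a minimal NumPy sketch. It is not the procedure used to generate the noisy images in this study. The function names and the parameters `peak`, `sigma`, and `amount` are hypothetical and chosen only for illustration, and images are assumed to be floating-point arrays scaled to [0, 1].

```python
import numpy as np

def add_poisson_noise(img, peak=30.0, rng=None):
    """Signal-dependent shot noise; `peak` is an assumed maximum photon count."""
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(img * peak)  # each pixel ~ Poisson(expected photon count)
    return np.clip(counts / peak, 0.0, 1.0)

def add_gaussian_noise(img, sigma=0.05, rng=None):
    """Additive, zero-mean, normally distributed noise."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def add_speckle_noise(img, sigma=0.1, rng=None):
    """Multiplicative noise: each pixel is scaled by (1 + n), n ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(img * (1.0 + rng.normal(0.0, sigma, img.shape)), 0.0, 1.0)

def add_impulse_noise(img, amount=0.02, rng=None):
    """Salt-and-pepper noise: a fraction `amount` of pixels is forced to 0 or 1."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    u = rng.random(img.shape)
    out[u < amount / 2] = 0.0        # pepper (dark pixels)
    out[u > 1.0 - amount / 2] = 1.0  # salt (bright pixels)
    return out
```

In the Poisson case, a smaller `peak` corresponds to fewer detected photons and therefore to a noisier image, which matches the low-count behavior described above.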
1.2. Pathologist’s Perspective on Image Noise in Pap Smear Analysis
Understanding image noise in Pap smear analysis from the pathologist’s perspective is vital for improving the accuracy and reliability of cervical cancer screening. Pathologists adopt a systematic approach to assess and quantify image noise by evaluating various visual characteristics that impact the quality of Pap smear images. This evaluation aids in identifying noise-related issues and ensuring the integrity of diagnostic interpretations. Factors considered by pathologists include focus quality, background artifacts, staining quality, cellularity, non-cellular artifacts, uniformity, and clarity of cellular features.
Focus Quality: Pathologists examine the sharpness and clarity of Pap smear images to determine if they are properly focused. Blurry or out-of-focus images may indicate the presence of noise or other technical issues that can affect the accuracy of interpretation.
Background Artifacts: An evaluation of the image background is performed to identify irregularities, distractions, or artifacts that may contribute to image noise. These artifacts can include uneven staining, debris, air bubbles, or unrelated stains that impact the overall image quality.
Staining Quality: The quality of staining on the slide is assessed as it directly affects the clear visualization of cellular details. Adequate staining is crucial for minimizing noise and facilitating accurate interpretation of the image.
Cellularity: Pathologists consider the density of cells in the image. Insufficient cellularity or excessive debris can hinder proper examination and identification of abnormalities, leading to noisy images.
Non-Cellular Artifacts: The identification and assessment of non-cellular artifacts introduced during sample preparation or imaging are essential. These artifacts, such as folds, scratches, or stains unrelated to cellular components, can contribute to increased noise levels and obscure relevant information.
Uniformity: The uniformity of cell distribution across the image is evaluated to identify any irregular or patchy distribution. Such irregularities may indicate sampling or preparation issues that contribute to noise and impact the overall image quality.
Clarity of Cellular Features: Pathologists analyze the clarity and distinctness of cellular features, including nuclear morphology and cytoplasmic characteristics. The presence of noise can be inferred if these features are poorly defined or difficult to discern, highlighting the importance of noise reduction for accurate interpretation.
Considering these factors and the pathologist’s perspective on image noise in Pap smear analysis is crucial for developing strategies and techniques to enhance the quality of images, reduce noise, and ultimately improve the effectiveness of cervical cancer screening.
Figure 4 illustrates examples of both noisy and noise-free images. The noise-free images are represented by (a)–(c), whereas (d)–(f) display the corresponding noisy images which are corrupted by Poisson noise.
The sections of this paper are organized as follows:
Section 2 presents motivation and contribution.
Section 3 provides an overview of previous research on Pap smear segmentation, encompassing conventional image processing techniques and deep learning approaches, with a specific focus on the U-Net architecture.
Section 4 presents the methodology, including the dataset used, pre-processing techniques, and the proposed architecture. The experimental results are outlined in
Section 5.
Section 6 outlines the conclusion of the research.
2. Motivation and Contributions
The primary goal of this work is to apply deep learning algorithms to ensure efficient segmentation of the cytoplasm and nucleus from noisy Pap smear images of cervical cells, which could further be used to develop intelligent diagnostic tools for the timely and accurate diagnosis of cervical cancer. While several studies have been conducted on the segmentation of cell organelles in normal images using conventional image-processing-based algorithms, none have focused on the use of deep-learning-based algorithms for the segmentation of cell organelles in noisy images. The hypothesis of this work is that a robust deep learning approach can accurately distinguish between nuclear and non-nuclear regions in noisy Pap smear images. By utilizing a state-of-the-art deep learning architecture, such as the U-Net model, and training it on sufficient annotated Pap smear data, highly accurate segmentation of the cytoplasm and nucleus regions can be achieved, even in the presence of the noise and variations commonly encountered in Pap smear images. This approach aims to improve the accuracy and efficiency of cytoplasm and nucleus segmentation, which is crucial for various medical applications and disease diagnosis.
The contributions of this paper are as follows:
1. Implementation of the U-Net Architecture on Various Pap Smear Cell Images: In this study, the U-Net architecture has been successfully implemented on a diverse set of Pap smear images. This implementation allows the comparison of the performance of the U-Net model across different classes of cervical cells obtained from Pap smear images. Utilizing this state-of-the-art deep learning model enhances both the accuracy and efficiency of cell segmentation and classification, which are pivotal for ensuring precise diagnostic analysis.
2. Addressing the Impact of Poisson Noise on Cytoplasmic and Nuclear Membranes: A key aspect that has been inadequately explored in previous studies is the significant impact of Poisson noise on the distortion of cytoplasmic and nuclear membranes in Pap smear cell images. In this research, a comprehensive investigation of this phenomenon has been undertaken, offering insights into the challenges posed by Poisson noise and its implications for accurate cell membrane segmentation and cellular structure analysis.
3. Performance Evaluation and Model Comparison Metrics: In this research, the performance of the U-Net model has been quantitatively evaluated and compared with other segmentation techniques. The evaluation includes multiple metrics, namely intersection over union (IoU), specificity, sensitivity, and the Dice coefficient. These metrics provide a comprehensive and rigorous assessment of the model’s segmentation capabilities, enabling a fair comparison with existing methods.
By making these contributions, this work aims to advance the field of cell image analysis and pave the way for improved diagnostic accuracy in Pap smear screenings. The insights gained from this study have the potential to aid research in the domain of medical image analysis in detecting abnormalities more efficiently, thus enhancing the effectiveness of cervical cancer screening and early intervention.
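For reference, the four evaluation metrics named in the third contribution can be computed from a predicted binary mask and its ground truth as sketched below. This is a minimal illustration using the standard confusion-matrix definitions, not the evaluation code used in this study. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Compute IoU, Dice, sensitivity, and specificity for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # foreground predicted as foreground
    tn = int(np.sum(~pred & ~truth))  # background predicted as background
    fp = int(np.sum(pred & ~truth))   # background predicted as foreground
    fn = int(np.sum(~pred & truth))   # foreground predicted as background

    def ratio(num, den):
        # Return NaN when a metric is undefined (empty denominator).
        return num / den if den else float("nan")

    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

IoU and the Dice coefficient both measure overlap and are monotonically related (Dice = 2·IoU / (1 + IoU)), whereas sensitivity and specificity separate errors on the structure of interest from errors on the background.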
3. Related Work
The existing literature on nuclei and cytoplasm segmentation can be classified into three main categories: (1) image processing strategies, (2) machine learning methodologies, and (3) deep learning approaches. The research conducted on Pap smear analysis in the literature primarily investigates feature extraction, classification, and segmentation of cervical cells.
Mayala et al. (2022) investigated an Otsu method for cytoplasm and nucleus segmentation from white blood cells [
21]. This approach utilizes the local minima of estimated function values, automatically calculating the threshold. Various quantitative metrics, such as the Jaccard index and Dice similarity coefficient, were employed for technique evaluation, which experimentally demonstrated good segmentation results compared to other state-of-the-art techniques. Balaji et al. (2021) introduced Boykov’s graph cuts method to segment the cytoplasm for cervical cancer diagnosis. The introduced approach calculated the image information through the formulation of objective functions using a synergy cloud model. The experimental results revealed that the classification accuracy increased by 14% compared to state-of-the-art approaches [
22]. Zhao et al. (2021) investigated selective edge enhancement to segment the nucleus in Pap smear images for cervical cancer diagnosis. The presented approach combined mathematical operators with selective search to automatically avoid repeated segmentation and remove non-nuclei regions while segmenting entire slide cervical images into small regions of interest (ROIs) [
23]. Additionally, a method for selectively enhancing the nucleus edge based on the Canny operator and mathematical morphology was proposed to extract edge information as a weight. The major advantage of this approach lies in segmenting the images properly, even in low-contrast conditions with unclear edges [
24]. Riana et al. (2018) combined the k-means and Otsu method to perform segmentation in overlapping cells of Pap smear images. The major advantage of the presented approach is that it considers the color features that previous researchers have not used [
25]. Patil et al. (2020) proposed a method for structurally analyzing the glandular region, locating and segmenting nuclei from uterine biopsy images. Certain morphological operations and a deconvolution algorithm followed by a threshold are used to achieve the results. In this study, thresholding is applied to convert the nuclei into binary images. The properties of nuclei, including their size, shape, and roundness, can be effectively analyzed to detect malignant cells once successfully separated from each other [
26].
Arya et al. (2020) investigated three distinct techniques: automated seed-growing region, modified moving k-means algorithm, and extended edge-based detection for segmenting nuclei in Pap smear images in the presence of debris. The features extracted from Pap smear images using these techniques included standard deviation and the number of objects for both normal and abnormal cells. The experimental analysis indicated that the modified moving k-means algorithm performed well compared to other approaches [
27]. Zhu et al. (2021) introduced an edge-tracking algorithm to perform the instance segmentation of overlapped cervical cells. The U-Net architecture performs semantic segmentation and divides the Pap smear image into four distinct segments: cytoplasm, nucleus, overlapped area, and background. Semantic segmentation achieved 92.8% accuracy, whereas instance segmentation achieved 95.04% accuracy for cervical cancer diagnosis [
28]. Mulmule et al. (2022) addressed the challenges of manual inspection in Papanicolaou test screening and presented an AI-based tool for detecting cervical cancer dysplasia. The authors explored both the segmentation and classification of Pap smear images, generating feature vectors from filtered images, which included 163 features related to edges, noise, membrane detection, color, and more. Various classifiers, such as artificial neural networks, support vector machine, and random forest, were considered for analysis [
29]. Song et al. (2019) proposed a robust method for segmenting overlapped cytoplasmic cervical cells in Pap smear tests. The authors used contour information, intensity data, curvature details, and shape priors to accurately delineate obscured borders in challenging segments. The method was trained and validated on the ISBI 2015 dataset and a primary dataset from Shenzhen Sixth People’s Hospital in China; the proposed approach outperformed other techniques [
30].
Hussain et al. (2020) designed a fully convolutional neural network for instance segmentation and cervical nuclei classification, surpassing the performance of U-Net and Mask R-CNN. The model achieved a Zijdenbos similarity index of 97% and a classification accuracy of 98.8%. The training datasets included liquid-based cytology and the Herlev dataset [
31]. Desiani et al. (2021) introduced the concept of simultaneous segmentation and classification of Pap smear images. Segmentation was achieved using a convolutional neural network, while classification was performed using artificial neural networks and k-nearest neighbors. The segmentation accuracy reached 77%, followed by a 90% accuracy in classification [
32]. Zhang et al. (2020) presented a robust method for extracting the nucleus and cytoplasm from overlapping cervical cells for cervical cancer diagnosis. Huang et al. (2019) presented a U-Net-based segmentation model for overlapping cervical cancer cells in Pap smear images. The dataset for model training was taken from ISBI 2014 [
33]. Shi et al. (2021) developed an automatic deep learning segmentation method, a variant of the 3D U-Net, for the radiation treatment of cancer-affected cells. This process improved the delineation of target radiation boundaries for proper diagnosis of a disease [
14]. The study was conducted on 462 cancer patients from 2017 to 2019. This approach is faster than the traditional approach, taking only 2 min instead of 30 min, thus reducing the time by 28 min per patient for radiation therapy of cervical cancer. Song et al. (2015) segmented cytoplasm and nucleus from cervical cells for cervical cancer diagnosis. The authors utilized multiscale convolutional neural networks and graph partitioning architectures and trained the model using a dataset from the Sixth People’s Hospital of Shenzhen. The experimental results were evaluated using the Dice similarity coefficient, demonstrating superior performance compared to existing approaches [
15]. Siddique et al. (2021) presented a review of U-Net and its variants (3D U-Net, attention U-Net, and ensemble U-Net) for medical image segmentation. The authors explored U-Net on different medical imaging modalities and observed that U-Net and its variants precisely segment the region of interest for proper identification of a disease [
34]. Arum et al. (2021) developed unique U-Net variants for squamous columnar junction segmentation in cervical cancer diagnosis. Using the visual inspection with acetic acid (VIA) method, a common cervical precancer screening tool, the authors achieved excellent performance in metrics like accuracy, mean IoU, mean accuracy, Dice coefficient, precision, and sensitivity. This was the first U-Net-based segmentation approach for squamous columnar junction [
35].
In 2021, Park et al. compared machine learning and deep learning for cervical cancer screening. It was observed that SVM and ResNet-50 outperformed other machine learning models [
36]. Mousser et al. (2019) investigated four CNN models to extract deep features from Pap smear images for cervical cancer analysis. Cervical cancer is one of the most challenging public health issues today and, although it is among the most preventable cancers, early detection with a Pap smear test remains a top research priority. To identify whether the structures are healthy or diseased, cytopathologists rely on hand-crafted features, which is time consuming. The authors therefore used deep learning approaches to speed up feature extraction from Pap smear images. ResNet-50 performed better than VGG and InceptionV3 in experiments on the DTU/HERLEV database, obtaining 89% accuracy [
37]. Machine learning and deep learning techniques were utilized by Da et al. (2021) to diagnose cervical cancer. RetinaNet (ResNet-50) identified abnormal areas and nuclei, while SVM performed the final categorization [
38]. Nahida et al. (2022) presented an extensive review of deep learning approaches in cervical cancer diagnosis. The techniques used were CNN, VGGNet, ResNet, GoogleNet, AlexNet, GAN, and other deep learning techniques. It was observed that deep learning approaches have outperformed other AI techniques [
39].
4. Materials and Methods
4.1. General Workflow of the Proposed Methodology
The proposed methodology aims to achieve highly accurate segmentation of the cytoplasm and nucleus through a comprehensive approach consisting of three distinct phases.
The first phase involved meticulous sample collection from a hospital setting. The objective was to create a representative dataset that accurately reflected the diversity and characteristics of the target population. Careful consideration was given to sample selection to ensure an adequate representation of various cytoplasm and nucleus variations, thus enhancing the effectiveness of the research. In the second phase, the U-Net model, renowned for its exceptional performance in image segmentation tasks, was selected as the underlying architecture for the segmentation process. The U-Net model was specifically chosen for its ability to effectively segment cytoplasm. It was tailored to address the segmentation requirements of six distinct types of cytoplasm, each corresponding to specific cell types of interest. The focus of the third phase shifted towards nucleus segmentation and understanding the impact of noise on the segmentation process. The workflow for achieving cytoplasm and nucleus segmentation is visually illustrated in
Figure 5, providing a clear representation of the methodology.
The segmentation process commenced by inputting the images into the architecture twelve times, along with their corresponding ground truth images. The architecture was implemented individually for each of the six types of cytoplasm, including superficial squamous cells, intermediate squamous cells, parabasal cells, basal squamous cells, LSIL, and HSIL. Subsequently, the architecture underwent six cycles of training and testing for the segmentation of the six types of nuclei, each associated with its respective cytoplasm. During the testing phase, ground truth images were not required as they had already served as an integral part of the architecture’s training process. The accurate mapping of ground truth images with the input images was crucial to ensure precise segmentation. Any inaccuracies in the mapping process could result in segmentation occurring in the wrong areas of the images. To address this, the Label Studio platform was employed for the preparation of accurate ground truth images.
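The twelve training runs described above (six cell types for cytoplasm, followed by the same six for nuclei) can be enumerated as in the following sketch. It only illustrates how the runs are organized. The names `CELL_TYPES`, `STRUCTURES`, and `training_runs` are hypothetical, and no model training is performed.

```python
from itertools import product

# The six cell classes and two target structures named in the text.
CELL_TYPES = [
    "superficial squamous",
    "intermediate squamous",
    "parabasal",
    "basal squamous",
    "LSIL",
    "HSIL",
]
STRUCTURES = ["cytoplasm", "nucleus"]

def training_runs():
    """Return one (structure, cell_type) pair per independent train/test cycle."""
    # product() iterates all cytoplasm runs first, then all nucleus runs,
    # matching the order described in the workflow.
    return list(product(STRUCTURES, CELL_TYPES))

runs = training_runs()
for index, (structure, cell_type) in enumerate(runs, start=1):
    print(f"run {index:2d}: segment {structure} of {cell_type} cells")
```

Each pair would be trained on the input images of that cell type together with the ground truth masks for the given structure.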
4.2. Dataset Description
The dataset utilized in this research comprises Pap smear slides obtained from the Department of Pathology at the “Sher-i-Kashmir Institute of Medical Sciences Soura” in Kashmir. A total of 110 slides were collected, out of which 30 slides were deemed unsuitable due to issues such as overstaining or the absence of appropriate cells for automatic diagnosis of cervical cancer. The remaining 70 slides underwent meticulous analysis by expert pathologists following the Bethesda system of classification to ensure accurate cell categorization. The examination of the Pap smear slides was carried out using an LX 300 microscope (240 V) and the MICAPS software. The dataset consisted of 2100 cervical cell images from Pap smear tests, captured using a MICAPS camera at a magnification of 100×. All images were consistently maintained at a resolution of 2540 × 2540 pixels, which facilitated easy cropping of individual cells during the model training process. To create a comprehensive database, a collaborative effort was undertaken with pathologists. This involved the meticulous categorization of the cervical cells based on specific characteristics. Subsequently, a well-curated database was established, comprising a total of 2100 cells representing six distinct types of cervical cells. Care was taken to maintain a balanced distribution of 350 cells per class to assess the impact of noise on the segmentation process.
Table 1 provides a detailed description of the various cell types present in the Pap smear test. The prepared dataset consists of 700 abnormal images and 1400 normal images, ensuring a balanced representation of different cell types for further analysis and model development. The creation of the primary dataset involved several steps, as depicted in
Figure 6.
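As a quick consistency check on the dataset totals, the sketch below assumes that the 2100 cells are divided equally among the six classes (an assumption; Table 1 holds the authoritative counts) and recovers the stated totals of 1400 normal and 700 abnormal images. The names `CLASS_GROUPS` and `group_totals` are hypothetical.

```python
# Map each of the six cell types to its diagnostic group.
CLASS_GROUPS = {
    "superficial squamous": "normal",
    "intermediate squamous": "normal",
    "parabasal squamous": "normal",
    "basal squamous": "normal",
    "LSIL": "abnormal",
    "HSIL": "abnormal",
}
TOTAL_CELLS = 2100

def group_totals(total=TOTAL_CELLS, groups=CLASS_GROUPS):
    """Split `total` equally across classes and sum the counts per group."""
    per_class, remainder = divmod(total, len(groups))
    if remainder:
        raise ValueError("total is not evenly divisible across the classes")
    totals = {}
    for group in groups.values():
        totals[group] = totals.get(group, 0) + per_class
    return per_class, totals

per_class, totals = group_totals()
print(per_class, totals)  # 350 {'normal': 1400, 'abnormal': 700}
```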
The dataset used in this research serves as a critical foundation for training and evaluating the segmentation model. Its careful curation, balanced representation of cell types, and accurate categorization contribute to the reliability and robustness of the subsequent analyses and model development. The following section will present the implementation of the proposed methodology for precise segmentation of the cytoplasm and nucleus in Pap smear images, followed by the results and analysis obtained from this process.
The database includes images of both normal and abnormal cells, contributing valuable insights into the cytological characteristics of cervical tissue samples. The examination of these cell types is crucial for cervical cancer screening, as it allows for the identification of potential precancerous or cancerous lesions.
Figure 7 shows some of the samples from the database.
Normal Cells:
- (a)
Superficial Squamous Cells: The topmost layer of non-keratinizing epithelium is represented by superficial squamous cells. They have a polygonal shape, and due to nuclear degeneration, nuclear details are not visible. During the late proliferative and ovulatory phases of the menstrual cycle, an abundance of superficial squamous cells can be observed.
- (b)
Intermediate Squamous Cells: These cells are polygonal in shape, with an area ranging from 1256 to 1618 µm² [
40]. Their nuclei are vesicular and noticeably larger. In an oestrogenized cervix, both superficial and intermediate layer cells have increased cytoplasm as they age and contain high levels of glycogen.
- (c)
Parabasal Squamous Cells: The basal layer of the squamous epithelium contains parabasal squamous cells. These cells are frequently observed in patients lacking estrogen, such as those who are premenstrual or postpartum, those using estrogen-restricting drugs, or post-menopausal individuals. Parabasal squamous cells are characterized by a nucleus with an area of about 50 µm² enclosed in a dense, uniform basophilic cytoplasm [
41].
- (d)
Basal Squamous Cells: These cells are present in the lower layer of the epidermis, which is the outermost layer of the skin. If these cells become cancerous, they can lead to a type of skin cancer known as basal cell carcinoma. This is the most common form of skin cancer, typically appearing as a small, shiny bump on the skin. While basal cell carcinoma is usually slow-growing and rarely spreads to other parts of the body, it can cause significant damage to surrounding tissue if left untreated [
42].
Abnormal Cells:
- (e)
Low-Grade Squamous Intraepithelial Lesions/Mild Dysplasia (LSILs): An LSIL represents a region of abnormal cells with abundant cytoplasm. The nucleus in an LSIL is three times larger than that of a normal intermediate cell, and the nuclear membrane appears highly irregular.
- (f)
High-Grade Squamous Intraepithelial Lesions/Cervical Intraepithelial Neoplasia (HSILs/CIN): An HSIL is a squamous cell abnormality associated with the human papillomavirus (HPV). Although not all HSIL cases develop into cancer, an HSIL is considered a precancerous lesion and is usually treated effectively. CIN is categorized into three grades: CIN 1, CIN 2, and CIN 3. CIN 1 is a low-grade lesion with a relatively low risk of developing into cancer, affecting approximately the lower one-third of the epithelium. CIN 2 is characterized by moderate dysplasia (high-grade lesion), affecting up to two-thirds of the epithelium. CIN 3 is severe dysplasia, affecting more than two-thirds of the epithelium and up to its full thickness.
Table 1 lists the total number of cervical cells used for training and testing purposes.
4.3. Pre-Processing
Data pre-processing plays a crucial role in deep learning models, as it involves cleaning and transforming raw data into a suitable format before model building. It typically encompasses tasks such as data integration, data transformation, data reduction, and data cleaning. One notable advantage of our approach is that it requires minimal pre-processing. The only pre-processing step performed in this study was resizing: all 2100 images were resized to a uniform 128 × 128 pixels. A uniform size gives the input data a standardized format and eliminates variations in image dimensions that could adversely affect the model’s performance. The limited pre-processing also saves computational resources and streamlines the workflow, allowing the focus to remain on the core aspects of segmentation and analysis.
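The resizing step can be illustrated with a minimal sketch. The interpolation method is not stated in this study, so nearest-neighbour sampling is assumed here purely for illustration; the function name `resize_nearest` and the synthetic input image are hypothetical.

```python
def resize_nearest(image, out_h=128, out_w=128):
    """Resize a 2D image (list of rows) using nearest-neighbour sampling.

    Assumption: the interpolation used in the study is not specified;
    nearest-neighbour is chosen only because it is the simplest to show.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to its source pixel by integer scaling.
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical 256 x 192 grayscale image filled with a simple gradient.
sample = [[(r + c) % 256 for c in range(192)] for r in range(256)]
resized = resize_nearest(sample)
print(len(resized), len(resized[0]))
```

For a color image, the same mapping would be applied to each RGB channel, giving the 128 × 128 × 3 input described for the network.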
4.4. U-Net Architecture for Segmentation of Cytoplasm and Nucleus
The U-Net architecture, introduced by Ronneberger et al. in 2015 [
41], is a convolutional neural network (CNN) that consists of two interconnected paths: the encoding path and the decoding path. The encoding path, situated on the left, comprises a series of encoders responsible for contracting the input data. On the right, the decoding path consists of decoders that expand the feature maps and recover the spatial information. Pixel-level classification is performed so that the output has the same spatial size as the input, and bottleneck layers link the two paths. Key components of the U-Net architecture include max-pooling layers, which reduce the dimensions by selecting the maximum value within a specific region; ReLU (rectified linear unit) as the nonlinear activation function; and skip connections, which connect certain encoder layers directly to decoder layers and provide inputs for subsequent levels. In general, pooling layers can employ either average pooling or max pooling. The ReLU function returns 0 for negative inputs and the input value for positive inputs.
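The two elementary operations named above, ReLU and 2 × 2 max pooling, can be sketched on a small feature map. This is an illustrative sketch in plain Python, not the implementation used in the study; the helper names and the 4 × 4 example values are hypothetical.

```python
def relu(x):
    """Return 0 for negative (or zero) inputs and the input itself otherwise."""
    return x if x > 0 else 0.0


def max_pool_2x2(fmap):
    """Apply 2 x 2 max pooling with stride 2 to a 2D list of even size."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


# Hypothetical 4 x 4 feature map with mixed signs.
fmap = [
    [1, -2, 3, 0],
    [4, 5, -6, 7],
    [-1, -1, 2, 2],
    [0, -3, 1, 9],
]
activated = [[relu(v) for v in row] for row in fmap]
pooled = max_pool_2x2(activated)  # height and width are halved: 4 x 4 -> 2 x 2
print(pooled)
```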
The primary objective of the U-Net architecture is to achieve precise semantic image segmentation, particularly for biomedical images. Originally proposed with 572 × 572 input images and 64 feature channels for biomedical image segmentation [
41], the network was adapted here to 128 × 128 inputs obtained through pre-processing (resizing). Additionally, the feature space was reduced to 16 channels instead of the default 64. The top-left layer of the network serves as the input layer, accepting color 2D images of size 128 × 128 × 3 (height, width, and RGB channels). The first block produces a feature map of size 128 × 128 × 16. All convolutions use 3 × 3 kernels. To keep the output the same size as the input, one pixel of padding was added along each edge of the image. A 2 × 2 max-pooling layer then selects the maximum value within each 2 × 2 region, halving the spatial dimensions to 64 × 64, and the second block generates 32 feature channels. This process continues with the 3rd and 4th contracting blocks, resulting in configurations of 32 × 32 × 64 and 16 × 16 × 128, respectively. After the final pooling step, the configuration becomes 8 × 8 × 128, and the bottleneck convolutions expand it to 8 × 8 × 256. Next, an up-sampling operation takes the 8 × 8 × 256 configuration to 16 × 16 × 128, followed by the 4th block of the decoder, which achieves a 32 × 32 × 64 configuration. The 3rd block of the decoder produces a configuration of 64 × 64 × 32, and finally, the output layer yields a segmentation map of size 128 × 128 × 1, separating the cytoplasm of Pap smear cells from their respective nuclei. Each contracting block consists of two convolutional layers using 3 × 3 kernels, followed by a 2 × 2 max-pooling layer and a dropout layer (the dropout rate is given in Section 4.5). Each expanding block includes a transposed convolutional layer using 3 × 3 kernels, a concatenation layer that incorporates spatial information from the corresponding contracting block, a dropout layer, and two convolutional layers using 3 × 3 kernels.
Figure 8 and
Figure 9 provide diagrammatic representations of how the U-Net architecture is implemented for the segmentation of cytoplasm and nucleus from Pap smear images.
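The feature-map sizes quoted in the text can be traced with a short sketch. It assumes "same" padding (so convolutions keep height and width), channel doubling at each contraction, and a final decoder stage at 128 × 128 × 16 before the output layer; that last stage is not stated explicitly in the text and is an assumption. The function name `unet_shapes` and the stage labels are hypothetical and do not follow the block numbering used in the figures.

```python
def unet_shapes(size=128, base_channels=16, depth=4):
    """Return (stage, height, width, channels) for each stage of a U-Net.

    Assumes 'same'-padded 3 x 3 convolutions, 2 x 2 max pooling, and
    channel counts that double on the way down and halve on the way up.
    """
    shapes = []
    h = w = size
    ch = base_channels

    # Contracting path: convolutions keep H x W; pooling halves it.
    for step in range(1, depth + 1):
        shapes.append((f"down_{step}", h, w, ch))
        h, w = h // 2, w // 2
        ch *= 2

    # Bottleneck at the lowest resolution.
    shapes.append(("bottleneck", h, w, ch))

    # Expanding path: transposed convolution doubles H x W and halves channels.
    for step in range(1, depth + 1):
        h, w = h * 2, w * 2
        ch //= 2
        shapes.append((f"up_{step}", h, w, ch))

    # Output layer: one channel holding the per-pixel segmentation score.
    shapes.append(("output", h, w, 1))
    return shapes


for stage in unet_shapes():
    print(stage)
```

Under these assumptions, the trace reproduces the configurations given in the text: 128 × 128 × 16, 64 × 64 × 32, 32 × 32 × 64, and 16 × 16 × 128 on the way down, 8 × 8 × 256 at the bottleneck, and a 128 × 128 × 1 output.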
4.5. Training and Validation
The training and validation process of the U-Net model involves the careful tuning of various hyper-parameters to optimize its learning capabilities and enhance performance. These settings include the activation function, kernel initializer, data normalization, padding, dropout rate, loss function, and number of epochs. The model employs the ReLU (rectified linear unit) activation function, which introduces nonlinearity by mapping negative values to zero and preserving positive values. This choice of activation function simplifies and improves the training process. For initializing the model’s weights, the “he normal” initializer is used, generating values from a truncated normal distribution centered around zero. This initializer aids in achieving better convergence and improved learning throughout the training process. To balance the sizes of the input and output images, “same” padding is used, ensuring that the output image maintains the same spatial dimensions as the input image. Data normalization is applied to ensure consistent scaling across the dataset. A lambda function converts the integer input values to floating-point values by dividing the input by 255. This normalization brings the pixel values into the range 0 to 1, facilitating effective model training.
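Three of these settings reduce to simple arithmetic, sketched below. The sketch assumes 8-bit pixel values, a stride of 1, and the usual definition of He normal initialization, in which the standard deviation is sqrt(2 / fan_in); the helper names are hypothetical and the code is not taken from the study.

```python
import math


def normalize_pixels(row):
    """Scale 8-bit integer pixel values to floating-point values in [0, 1]."""
    return [v / 255.0 for v in row]


def he_normal_stddev(kernel_h, kernel_w, in_channels):
    """Standard deviation used by He normal initialization: sqrt(2 / fan_in)."""
    fan_in = kernel_h * kernel_w * in_channels
    return math.sqrt(2.0 / fan_in)


def conv_output_size(in_size, kernel=3, pad=1, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


print(normalize_pixels([0, 128, 255]))
print(he_normal_stddev(3, 3, 3))   # first 3 x 3 convolution on an RGB input
print(conv_output_size(128))       # one pixel of padding keeps 128 -> 128
```

With a 3 × 3 kernel, one pixel of padding on each side is exactly what "same" padding requires for the output to keep the input size.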
To prevent overfitting, a dropout rate of 10% is implemented. Dropout randomly sets a certain percentage of input units to zero during each training step, reducing the model’s reliance on specific features and enhancing its generalization ability. For the final layer of the model, the binary cross-entropy loss function is adopted, coupled with a sigmoid activation function. This combination is well suited for binary classification tasks, effectively predicting the presence or absence of certain features in the images. During the training phase, the model undergoes 100 epochs, representing the number of complete passes through the entire dataset. This iterative process enables the model to gradually learn and improve its performance over multiple iterations. Optimizing the model’s parameters during training is achieved using the Adam optimizer. This adaptive learning rate optimization algorithm combines the strengths of both the AdaGrad and RMSProp algorithms, resulting in improved convergence and faster training.
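The sigmoid output, the binary cross-entropy loss, and dropout can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the training code of the study: the loss is averaged over pixels, dropout uses the common "inverted" form that rescales kept units by 1 / (1 − rate), and the example logits and labels are made up.

```python
import math
import random


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a flat list of pixels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def dropout(values, rate=0.1, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


# Hypothetical logits and ground-truth labels for four pixels.
logits = [2.0, -1.0, 0.0, 3.0]
labels = [1, 0, 1, 1]
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
print(round(loss, 4))
```

Dropout is applied only during training; at inference time all units are kept, which is why the kept units are rescaled during training in this form.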
The experiments in this research were conducted using the Spyder IDE. The U-Net model was trained, tested, and validated on an NVIDIA GeForce MX450 GPU. To support GPU acceleration, cuDNN version 8.1 and CUDA version 11.2 were utilized. cuDNN offers optimized implementations of activation layers, convolution operations, and normalization techniques, enhancing the efficiency of deep learning computations. CUDA, as a parallel computing platform and API, enables parallel execution on the GPU. For capturing the images, an LX 300 microscope was employed together with a MICAPS SONY CMOS camera mounted on the microscope. These devices allowed for the acquisition of the high-quality images required for the segmentation task. The MICAPS software was used in conjunction with the microscope and camera to capture and process the images. To generate the ground truth images, the Label Studio platform was adopted. Label Studio provides a user-friendly interface and tools for annotating and labeling images, and the resulting ground truth images are crucial for training and evaluating the U-Net model.
5. Experimental Results
In this section, the experimental results of the proposed robust deep learning approach for accurate segmentation of the cytoplasm and nucleus in noisy Pap smear images are presented. The method was thoroughly evaluated on a diverse dataset, and the segmentation performance was assessed using several key metrics, including IoU, sensitivity, specificity, and Dice coefficient (Dice Coeff). IoU stands for intersection over union, and it is a common evaluation metric used in image segmentation tasks. It measures the overlap between the predicted segmentation mask and the ground truth mask. Sensitivity measures the ability to correctly identify cytoplasm and nucleus regions, while specificity gauges the model’s proficiency in recognizing the background pixels accurately. The Dice coefficient provides a combined measure of precision and recall, reflecting the overall segmentation accuracy. Through extensive experimentation, the proposed approach demonstrated remarkable robustness and efficacy in accurately segmenting cellular structures in noisy cytological images, showcasing its potential to enhance the precision and reliability of cervical cancer screenings.
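The four metrics can be computed from the pixel-wise counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below uses the standard definitions for binary masks; the function names and the 4 × 4 example masks are hypothetical and are not data from this study.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN between two binary masks given as 2D lists."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (empty masks)."""
    return num / den if den else 0.0


def segmentation_metrics(pred, truth):
    """IoU, Dice coefficient, sensitivity, and specificity for binary masks."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "iou": _safe_div(tp, tp + fp + fn),
        "dice": _safe_div(2 * tp, 2 * tp + fp + fn),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
    }


# Hypothetical ground-truth and predicted masks (1 = cell region, 0 = background).
truth = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
pred = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

The Dice coefficient and IoU are computed from the same three counts and are related by Dice = 2·IoU / (1 + IoU), so they always rank two segmentations in the same order.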
In this study, a dataset of 2100 images of cervical cells obtained from 110 Pap smear slides was used. The images in the dataset had non-uniform sizes, so they were resized to a standardized dimension of 128 × 128 pixels. This pre-processing step ensured consistency in the input size for the segmentation process. The experimental results demonstrated the effectiveness of the U-Net-based segmentation approach.
Table 2 provides an overview of the segmentation performance across several metrics. Among the different cell types, intermediate squamous cells exhibited the highest accuracy in segmenting both the cytoplasm and nucleus, followed by other types of cervical cells. This difference can be attributed to the presence of varying degrees of noise in the other cell types, which affects the segmentation results. The experimental results highlight the high accuracy and reliability of the presented approach.
Table 2 shows that segmentation is more accurate for intermediate squamous cells than for the other five cell types. The Otsu method is an image thresholding technique that finds the optimal intensity value to separate foreground and background regions in a grayscale image. It selects the threshold that maximizes the between-class variance, allowing for automatic image segmentation. The conventional Otsu method struggles to accurately segment the cytoplasm and nucleus, as depicted in
Figure 10. In contrast, the U-Net approach yields improved results, with better preservation of both cytoplasm and nucleus boundaries.
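For reference, the Otsu threshold can be sketched as a search over all gray levels for the value that maximizes the between-class variance. This is a plain-Python illustration of the standard method, assuming 8-bit grayscale values supplied as a flat list; it is not the implementation used for the comparison in this study, and the example pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))

    best_t, best_var = 0, -1.0
    w_low = 0      # number of pixels with value <= t
    sum_low = 0    # sum of their gray levels
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue           # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break              # all pixels are in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance (up to a constant factor of 1 / total**2).
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical bimodal image: 60 dark pixels and 40 bright pixels.
pixels = [50] * 60 + [200] * 40
t = otsu_threshold(pixels)
mask = [v > t for v in pixels]
print(t, sum(mask))
```

Because a single global threshold is applied to every pixel, regions with overlapping intensity ranges, such as noisy cytoplasm and background, cannot be separated by this procedure alone.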
The qualitative analysis conducted in this study revealed that noise exerted a significant impact on the segmentation of the cytoplasm and nucleus in Pap smear images. Higher levels of noise posed challenges in accurately delineating the boundaries, resulting in deviations from the ground truth annotations. However, the U-Net algorithm showcased the potential for mitigating the effects of noise with advanced processing techniques. This suggests opportunities for further refinement and improvement in handling noisy images.
The U-Net model preserved cell boundaries better than the Otsu method for cytoplasm and nucleus segmentation. The U-Net architecture incorporates skip connections between corresponding encoder and decoder layers, enabling the model to retain and propagate fine-grained spatial information throughout the segmentation process. This allows the model to capture both local and global context, learn complex cell boundary patterns, and follow nuanced variations in shape and texture. In contrast, the Otsu method relies on global thresholding, treating each pixel independently without considering contextual information. It may therefore struggle to differentiate between the cytoplasm and nucleus regions, particularly when they exhibit similar intensity distributions. The findings suggest that the U-Net model holds promise for improving the segmentation of the cytoplasm and nucleus in Pap smear images, particularly in the presence of noise. They also highlight the potential of deep learning approaches for enhancing the accuracy and reliability of cytoplasm and nucleus segmentation in medical imaging applications.