Cervical Cancer Diagnosis Based on Multi-Domain Features Using Deep Learning Enhanced by Handcrafted Descriptors

Abstract: Cervical cancer, among the most common cancers in women, can largely be avoided through routine screening. The Pap smear test is a widespread screening methodology for the timely identification of cervical cancer, but it is susceptible to human error. Artificial-intelligence-based computer-aided diagnostic (CAD) methods have been extensively explored for identifying cervical cancer in order to enhance the conventional testing procedure. To attain remarkable classification results, most current CAD systems require pre-segmentation steps for the extraction of cervical cells from a Pap smear slide, which is a complicated task. Furthermore, some CAD models use only handcrafted feature extraction methods, which cannot guarantee the sufficiency of the classification phase. In addition, when there are few data samples, as in cervical cell datasets, using deep learning (DL) alone is not the perfect choice. Moreover, most existing CAD systems obtain attributes from one domain, whereas the integration of features from multiple domains usually increases performance. Hence, this article presents a CAD model based on extracting features from multiple domains rather than only one. It does not require a pre-segmentation process and is thus less complex than existing methods. It employs three compact DL models to obtain high-level spatial deep features rather than utilizing an individual DL model with a large number of parameters and layers, as used in current CADs. Moreover, it retrieves several statistical and textural descriptors from multiple domains, including the spatial and time-frequency domains, instead of employing features from a single domain, to provide a clearer representation of cervical cancer features, which is not the case in most existing CADs. It examines the influence of each set of handcrafted attributes on diagnostic accuracy both independently and in combination.
It then examines the consequences of combining each DL feature set obtained from each CNN with the combined handcrafted features. Finally, it uses principal component analysis (PCA) to merge the entire set of DL features with the combined handcrafted features in order to investigate the effect of merging numerous DL features with various handcrafted descriptors on classification results. With only 35 principal components, the accuracy achieved by the quadratic SVM of the proposed CAD reached 100%. The performance of the described CAD proves that combining several DL features with numerous handcrafted descriptors from multiple domains can boost diagnostic accuracy. Additionally, a comparative performance analysis against other recent studies shows the competitive capacity of the proposed CAD.


Introduction
Cervical cancer continues to be the fourth most common malignant tumor in women worldwide, accounting for 6.6% of all female cancers diagnosed in 2018 [1]. Every year, more than 500 thousand women are diagnosed with cervical cancer, with less than 300 thousand dying from the disease worldwide [2]. In 2020, around 604,000 women were diagnosed with cervical cancer. Nearly 90% of the almost 342,000 cervical cancer deaths in 2020 occurred in low- and middle-income countries [3]. More than 85% of cervical cancer patients live in developing countries, with Africa having the most occurrences. Cancer has a greater chance of being fatal in these countries [4]. This is due to a lack of knowledge about the disease and restricted healthcare availability. Developed countries, on the other hand, have strategies in place that permit accurate and efficient screening, allowing precancerous lesions to be discovered and treated at an initial phase [5]. It is widely accepted that early identification and treatment of premalignant lesions can inhibit cancer progression in nearly 90% of cervical cancer patients. As a result, detecting cervical cancer in its early stages is critical.
Based on detailed microscopic examination, the Pap smear test is regarded as a well-known screening tool for the detection of cervical pre-cancerous lesions or premalignant cells. Cervical cancer is diagnosed using either the standard approach or liquid-based cytology (LBC), which is strongly suggested by a clinician based on subjective clinical examination. A thorough Pap smear test report determines the level of the tumor, if any, and subsequently confirms the cervical cancer type based on The Bethesda System (TBS) [6]. Because LBC can produce a relatively clean and more homogeneous slide for microstructural analysis than the traditional techniques, it has been confirmed to be a more effective and convenient method than the standard approach [7,8]. Pap smear cells may be characterized into various subgroups. The most difficult aspect of identifying these cells is that several of the cell groupings appear identical in terms of cell and nucleus size and appearance. A thorough examination of such cells for tumor identification depends on specialists' expertise and cancer aetiology, leading to misdiagnosis in certain contexts and postponed treatment [9]. Consequently, professionals find routine examination exhausting and prone to human error. To automate the classification process, a superior resolution to this issue is required.
Recent developments in computer technology have made it possible for pathologists and doctors to detect and diagnose several medical tumors and diseases using computer-aided diagnostic (CAD) systems. These automated systems fall into two families. The first is the traditional CAD models based on standard machine learning (ML) approaches. Feature extraction is a critical step in any traditional CAD model for diagnosing and classifying abnormalities in medical images. Various groups of features, such as texture features, statistical features, and shape descriptors, must be extracted for better classification of normal and infected Pap smear images. Texture features of an image give statistics about the spatial arrangement of intensities. Texture descriptors, such as the grey-level co-occurrence matrix (GLCM), Gabor wavelets (GW), the discrete wavelet transform (DWT), and others, can be used to extract texture features [10]. Since these techniques each focus on a single aspect of the image, they can fail with various kinds of images. A method that retrieves shape-based attributes, for instance, might not be capable of extracting other descriptors of the image, such as texture information.
The more recent group of CAD systems is based on modern deep learning (DL) techniques, which are capable of automatically extracting deep features from medical images. DL techniques can be employed for various sorts of image attributes and characteristics. DL-based methods, in addition to traditional feature extraction techniques, can be used to obtain discriminatory features from raw image data. When it comes to classification problems, DL-based algorithms perform much better than conventional approaches [11], and some studies have shown that combining DL features with traditional handcrafted features can further improve diagnostic performance. Among DL architectures, convolutional neural networks (CNNs) have attained significant results in a wide range of health [12,13] and medical imaging applications in recent years [14,15], particularly in mammograms [16], facial images [17], histopathology [18][19][20][21][22][23], magnetic resonance imaging [24], fundus imaging [25], and computed tomography scanning [26][27][28]. Motivated by the CNNs' great success in several medical and health domains, they have been adopted in several CAD models for cervical cancer diagnosis. Large amounts of data are required by CNN models in order to avoid over-fitting and poor generalization. Because labeling cervical cell images is challenging, transfer learning (TL) is intended to share knowledge from a source domain to a target domain in order to prevent over-fitting. A CNN previously trained on ImageNet can be applied to cell images with TL [29][30][31].
This study proposes a CAD system based on multiple DL features and traditional handcrafted features. It employs three compact CNNs to obtain DL features from models of distinct structures. The introduced CAD also extracts statistical features, as well as textural descriptors including GLCM, DWT, and Gabor. It examines the impact of each group of features independently on diagnostic accuracy. Then it investigates the influence of fusing each DL feature set attained from every CNN with the combined handcrafted features. Finally, it merges the whole set of DL features with the combined handcrafted features, using principal component analysis to explore the effect of merging multiple DL features with various handcrafted features on classification accuracy.
The main contributions and novelty of the proposed CAD are as follows:
• Developing an effective CAD based on multiple compact CNNs with fewer deep layers and parameters, together with several handcrafted feature extraction approaches, instead of a single methodology as in existing CADs, which employ either a single CNN with a huge number of deep layers and parameters or one handcrafted approach.
• The proposed CAD does not need any pre-segmentation or enhancement steps, which are required by several existing CADs.
• Merging features from multiple domains, including spatial DL features and texture features from the time-frequency domain such as DWT and Gabor wavelets (GW), rather than utilizing one type of feature extracted from a single domain, thus improving classification accuracy.
• The proposed CAD also obtains GLCM texture features as well as statistical features from the time/spatial domain.
• Examining the influence of blending multiple handcrafted features with each DL feature set retrieved from every single CNN independently, which is not common in existing CADs.
• Aggregating the multiple DL feature sets with numerous handcrafted features via a feature reduction technique (PCA) to diminish the size of the features and lower the training duration.

Literature Review
An overview of relevant CAD systems utilized to analyze Pap smear images for diagnosing cervical cancer is given in this section. It will first illustrate conventional CADs for cervical cancer diagnosis. Then it will discuss DL-based CADs for cervical cancer diagnosis. Finally, it will demonstrate hybrid-based CADs.

Conventional CADs for Cervical Cancer Diagnosis
As mentioned before, conventional CAD models have utilized classical ML approaches that depend on extracting handcrafted descriptors from Pap smear slides. Among them, the study [32] employed the discrete cosine transform (DCT) and discrete wavelet transform (DWT) to retrieve features. Then, the fractional coefficient approach was used to reduce the dimension of these merged features. Finally, these reduced features were fed to seven ML classifiers to differentiate between different subgroups of cervical cancer, leading to an accuracy of 81.11%. In another study [33], the authors used C-means clustering to segment cervical cells and then extracted texture features, including GLCM and geometrical descriptors, from these cells. Subsequently, the authors used principal component analysis (PCA) to decrease the size of the features. Later, KNN was employed to classify cervical cells, reaching an accuracy of 94.86%. Similarly, the research article [34] used C-means clustering to segment cervical cells and then attained shape and textural features using the binary histogram Fourier algorithm (BHF). Next, the quantum-based grasshopper computing algorithm (QGH) was utilized to select features and apply these selected features as inputs to classifiers. The article [35] presented a CAD based on two phases. The first phase's goal was to extract texture descriptors from the cytoplasm and nucleus together.
The Pap smear slides were segmented using a thresholding method. Then, to describe the local textural features, a texture descriptor called modified uniform local ternary patterns (MULTP) was proposed. Second, these descriptors were fed to an artificial neural network whose parameters were optimized using a genetic algorithm, attaining an accuracy of 98.9%.

DL-Based CADs for Cervical Cancer Diagnosis
ML-based CAD models are more efficient and have lower computational costs, but their accuracy is typically limited. Primary components and complementary clues may be neglected by extracting features and then selecting among them [36]. Given the difficult detection tasks of abnormal Pap smear slides [37], focusing solely on handcrafted features can be insufficient to capture the interconnections of cell attributes. DL-based CAD methods, unlike ML-based techniques, are not hampered by drawbacks in extracting features and selecting descriptors. CNN approaches are the most commonly used DL techniques for image analysis [38]. A CAD model was presented in [39]; it is divided into three sections: cervical cell segmentation, DL-based cervical cell identification, and envisioned human-assisted classification. Images are first segmented employing sped-up robust features (SURF) and Otsu thresholding methodologies to retrieve cell images. Such images are then forwarded to CompactVGG. Lastly, the envisioned human-assisted diagnosis layer accomplishes classification by incorporating the visualization performance and the classification results of all cell images. Conversely, using morphological operations, the study [40] segmented the image contents of cervical cancer slides. The segmented images were then subjected to the dual-tree complex wavelet transform (DTCWT). The DTCWT output was input to an altered ResNet-18 model, which achieved 97.98% accuracy. In contrast, the research article [41] presented an adapted firefly optimization technique with a DL algorithm. The suggested framework initially utilized a filtering method to eliminate noise. Then, the affected areas were identified via an entropy-based segmentation technique. The EfficientNet model was also employed to develop features. Finally, an image was classified using a stacked sparse denoising autoencoder (SSDA) method. Alternatively, the authors of [30] created a novel network for Pap smear image analysis based on an adaptive pruning deep transfer learning approach. The network was enhanced by altering the convolution layer and removing some convolution kernels that could interfere with the intended classification problem. The highest level of accuracy achieved was 98%.

Hybrid Based CADs for Cervical Cancer Diagnosis
Other studies have employed hybrid CNNs to perform the classification task. The CAD proposed in [42] obtained deep features from four CNNs, including ResNet-50, VGG-16, VGG-19, and Xception, and then concatenated them. The study [8] employed six distinct CNN structures and combined the predictions of the best three CNNs to form an ensemble classification system, achieving an AUC of 97%. Alternatively, the study [43] collected high-level features from ShuffleNet and a custom-designed network named Cervical Net. Then, canonical correlation analysis (CCA) was employed to combine these features, resulting in 544 attributes. Such attributes were then used as inputs to several classifiers, resulting in a 99.1% accuracy. On the other hand, the study [44] implemented a CAD named CVM-Cervix, which merged DL features of Xception with a visual transformer and used them to train a multilayer perceptron classifier, achieving an accuracy of 91.72%.
Most existing CADs rely on a single classification structure, either DL or handcrafted methods, which can suffer from heavy computational complexity or low accuracy. Few studies have combined DL features with handcrafted descriptors. Among these studies, the paper [45] combined automated features obtained by VGG-16 with handcrafted features involving geometric and texture features. The study employed a correlation feature selection method to decrease the feature dimension and then an SVM classifier, reaching an accuracy of 98.7%. Likewise, the study [46] merged 29 different attributes from several domains with deep features to improve classification results. The experimental results showed that merging features from multiple domains, including handcrafted and deep features, was capable of improving the F1-score by 3.2%.

Motivation
Although DL models produce competitive results, the extracted features lack clear interpretability and prior domain knowledge. The high reliance on massive data and manual labels increases the complexity of medical evaluation. Additionally, DL requires countless parameters to be updated and fine-tuned. Furthermore, current methods are often solitary models operating in a single domain; hybrid systems operating in multiple domains, which can accomplish the classification process more effectively, are not widely used in the cervical cancer diagnosis literature [45,47,48]. This inspired our work to develop a hybrid CAD based on DL and handcrafted descriptors for accurate cervical cancer diagnosis. Moreover, the majority of current methods rely on DL models that have many deep layers with a huge number of parameters, which need high computational ability and extremely long training durations. In contrast, the proposed CAD is based on lightweight CNN models with fewer layers and parameters. Furthermore, many existing CADs depend on obtaining deep features from a single CNN; however, retrieving deep features from CNNs of various architectures is superior. Thus, the proposed CAD extracts multiple DL features from three CNNs having different structures. In addition, it employs a feature reduction method to fuse features obtained from multiple domains (DL spatial features and handcrafted features), resulting in a lower number of features. In contrast to existing CADs, the proposed CAD does not depend on any image pre-segmentation or enhancement steps.

Mendeley LBC Pap Smear Slides Dataset
Among the popular cervical screening tools is LBC. The Mendeley LBC dataset [49] includes 963 images split into 4 classes that represent the different classes of pre-cancerous and cancerous tumors of cervical cancer according to The Bethesda System. The negative for intraepithelial malignancy, or normal, class provides 613 images, whereas the abnormal categories contain the remaining 350 images (as shown in Figure 1). These Pap smear slides were acquired from 460 infected cases at a 40× magnification factor and afterward gathered and arranged utilizing the LBC methodology.


Design of the Introduced CAD
The introduced CAD has a series of five steps involving Pap smear image preparation, DL feature extraction, handcrafted descriptor mining, multi-domain feature combination and reduction, and diagnosis. In the first step, Pap smear slides pass through several preparation phases, such as dimension alteration and augmentation. Subsequently, three compact pre-trained CNNs were constructed and retrained using these images. Spatial DL features were obtained from these CNNs. In the meantime, these images were also employed to retrieve handcrafted features, including several texture and statistical features, from multiple domains. Next, those handcrafted features were integrated along with the multiple DL features using PCA to lower their dimension in the multi-domain feature combination and reduction step. Ultimately, multiple SVM classifiers were applied to these reduced descriptors to perform the diagnosis procedure. The workflow of the introduced CAD is displayed in Figure 2.

Pap Smear Image Preparation
Pap smear training images were initially augmented to increase the amount of data available for the training procedure. Augmentation is an important procedure that is usually performed to improve the learning of DL models and prevent overfitting. There are numerous methods for augmentation; in the introduced CAD, flipping, rotation, scaling, and shearing techniques were employed. Next, the dimensions of the augmented images, as well as the original images of the entire Mendeley LBC dataset, were changed to fit the input layer size of the three CNNs, which is equal to 224 × 224 × 3. These CNNs included MobileNet, ShuffleNet, and ResNet-18. The effectiveness of CNN models stems from the complexity and depth of their structure. The quantity of parameters included in the model grows in proportion to the model's complexity [50]. During the learning phase, the presented CNN models involved huge amounts of hyperparameter adjustment. Nevertheless, a substantial number of parameters can lower the network's generalization performance and end up causing overfitting [51]. Reducing parameters and layers by using compact DL models is one way to prevent overfitting caused by model complexity [22,50]. As a result, three compact DL models were utilized in this study.
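As a rough sketch of this preparation step, the flipping, rotation, scaling, and shearing operations plus the resizing to 224 × 224 × 3 can be expressed as follows. The specific rotation angle, zoom factor, and shear value are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np
from scipy import ndimage

def augment(image, angle=15.0, zoom=1.1, shear=0.1):
    """Return flipped, rotated, scaled, and sheared variants of one
    pap smear image (H x W x 3 array); parameter values are illustrative."""
    shear_mat = np.array([[1.0, shear], [0.0, 1.0]])
    scaled = np.stack(
        [ndimage.zoom(image[..., c], zoom, order=1)[: image.shape[0], : image.shape[1]]
         for c in range(3)], axis=-1)          # scale up, then crop back to size
    sheared = np.stack(
        [ndimage.affine_transform(image[..., c], shear_mat, mode="nearest")
         for c in range(3)], axis=-1)
    return [
        np.fliplr(image),                      # horizontal flip
        np.flipud(image),                      # vertical flip
        ndimage.rotate(image, angle, axes=(0, 1), reshape=False, mode="nearest"),
        scaled,
        sheared,
    ]

def resize_to_input(image, size=(224, 224)):
    """Resize to the 224 x 224 x 3 input layer shared by MobileNet,
    ShuffleNet, and ResNet-18."""
    factors = (size[0] / image.shape[0], size[1] / image.shape[1], 1.0)
    return ndimage.zoom(image, factors, order=1)
```

In practice each augmented variant is also passed through `resize_to_input` before being fed to the CNNs.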
MobileNet was used in this study since it is a compact CNN with fewer parameters and layers that can accomplish accurate results despite being compact. The depth-wise separable convolution [52] is the fundamental basis of the MobileNet layout. The main benefit of depth-wise separable convolution compared to standard convolution is the requirement for less computation when dealing with huge and complex convolutional networks [53]. Similarly, ShuffleNet is a compact CNN; however, it employs channel shuffle and pointwise group convolution to decrease computation expense while retaining precision. When trained on the ImageNet dataset, ShuffleNet attained a lower top-1 error than MobileNet and obtained a 13× actual speedup over AlexNet while preserving similar performance [54]. On the other hand, ResNet-18's basic element is the residual module. These residual blocks use a skip linkage or "shortcut" between every two layers in addition to the direct links between all layers. This enables the network to take the activation of one layer and feed it into a deeper layer in the CNN, thereby easing the learning of parameters in deeper layers.
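The parameter savings that make MobileNet compact can be checked directly: a standard k × k convolution from C_in to C_out channels needs k·k·C_in·C_out weights, whereas a depth-wise separable convolution needs only k·k·C_in (depth-wise) plus C_in·C_out (pointwise). The layer sizes below are an arbitrary example, not taken from MobileNet itself.

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a depth-wise (k x k per channel) convolution
    followed by a pointwise (1 x 1) convolution, as in MobileNet."""
    return k * k * c_in + c_in * c_out

# Example: a 3 x 3 layer mapping 128 -> 256 channels.
standard = conv_params(3, 128, 256)                  # 294912 weights
separable = depthwise_separable_params(3, 128, 256)  # 33920 weights
ratio = standard / separable                         # roughly 8.7x fewer
```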

DL Feature Extraction
Building a CNN from scratch requires a huge amount of data and updating the network's enormous number of parameters, which increases the complexity of the training procedure. Instead, transfer learning (TL) can be used to solve these challenges. TL is a prevalent ML method that permits the reuse of an effective CNN model, constructed to deal with one problem using a huge dataset such as ImageNet, as a preliminary step for handling another classification issue in a relevant area. TL can significantly reduce the requirement for enormous computation power and model development time [55]. For this reason, TL was employed to reuse three CNNs that were formerly trained on ImageNet to tackle the problem of cervical cancer diagnosis. Initially, TL was utilized to alter the final fully connected (FC) layer of each of the MobileNet, ShuffleNet, and ResNet-18 compact CNNs so that its output size equals 4, corresponding to the categories of the Mendeley LBC dataset. Afterward, these CNNs were retrained with the slides of the Mendeley LBC dataset. When the retraining process was done, TL was further applied to extract DL features from the final FC layer of each DL structure. Each CNN was made up of several deep layers; preliminary layers discovered basic components of an image, while late deep layers learned high-level detailed characteristics of the image. As a result, the last FC layer before the softmax layer was chosen to retrieve feature representations. The length of the features obtained from every CNN was 4.
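The extraction step can be illustrated with a deliberately tiny stand-in model. The "backbone" below is a random fixed linear map playing the role of the pretrained convolutional layers, and the image size and embedding width are arbitrary; only the shape of the output (a length-4 vector from the last FC layer before the softmax) reflects the CAD described here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained backbone: a fixed linear map from a
# flattened 32 x 32 x 3 "image" to a 64-dim embedding. In the real CAD
# this role is played by the convolutional layers of MobileNet,
# ShuffleNet, or ResNet-18 pretrained on ImageNet.
EMBED_DIM, IMG_SHAPE = 64, (32, 32, 3)
W_backbone = rng.normal(size=(EMBED_DIM, 32 * 32 * 3)) * 0.01

# Transfer learning replaces only the final FC layer so that it outputs
# 4 scores, one per Mendeley LBC class; this layer is then retrained.
W_fc = rng.normal(size=(4, EMBED_DIM)) * 0.1

def extract_dl_features(image):
    """Activation of the last FC layer (before softmax): the length-4
    DL feature vector for one image."""
    embedding = np.maximum(W_backbone @ image.ravel(), 0.0)  # ReLU
    return W_fc @ embedding
```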

Handcrafted Descriptors Mining
Several textural descriptors were mined from Pap smear images in the spatial and time-frequency domains. Spatial domain features include GLCM, whereas time-frequency features involve DWT as well as GW. Additionally, numerous statistical features were obtained from the spatial domain. This section describes the feature extraction methods.

Spatial Statistical and Texture Features
A technique for extracting statistical descriptors from a signal or image is known as statistical feature extraction. Nine statistical features were computed: variance, root mean square (RMS), kurtosis, entropy, mean, skewness, inverse difference moment (IDM) [56], smoothness, and standard deviation (std). Moreover, four texture features were calculated from the spatial domain: contrast, energy, correlation, and homogeneity. These features are defined in Equations (1)–(17), where A(i,j) is the pixel value at location (i,j) in an image, µ is the mean, G is the number of grey levels, pr_g is the probability of a pixel having grey level g, and N and M are the length and width of the image.
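The nine statistical descriptors can be sketched as below, following their standard definitions. The IDM over horizontal pixel pairs is an assumption for illustration, since IDM is normally defined on a co-occurrence matrix, and the histogram bin count is likewise illustrative.

```python
import numpy as np

def spatial_statistical_features(gray):
    """Nine statistical descriptors of a grey-scale image (2-D array
    with values in [0, 255]); formulas follow the standard definitions."""
    x = gray.astype(float).ravel()
    mu = x.mean()
    var = x.var()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    skew = np.mean((x - mu) ** 3) / (std ** 3 + 1e-12)
    kurt = np.mean((x - mu) ** 4) / (var ** 2 + 1e-12)
    # Entropy over the grey-level histogram distribution.
    hist, _ = np.histogram(x, bins=256, range=(0, 256))
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    smoothness = 1.0 - 1.0 / (1.0 + var)   # R = 1 - 1/(1 + sigma^2)
    # IDM over horizontal neighbour pairs (illustrative variant).
    i, j = gray[:, :-1].astype(float), gray[:, 1:].astype(float)
    idm = np.mean(1.0 / (1.0 + (i - j) ** 2))
    return {"mean": mu, "variance": var, "std": std, "rms": rms,
            "skewness": skew, "kurtosis": kurt, "entropy": entropy,
            "smoothness": smoothness, "idm": idm}
```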

Grey Level Co-Occurrence Matrix Texture Features
The GLCM strategy is a second-order statistical method that calculates how often adjacent pixels in an image share identical grey levels and exploits the extra information acquired from spatial pixel relations [57]. A co-occurrence matrix was used to retrieve textural details about grey-level transitions between two pixels. This co-occurrence matrix describes the joint distribution of grey-level pairs of adjacent pixels given a spatial association defined between pixels in a texture. As a result, changing the spatial correlation yields matrices with different data (different directions or distances between pixels). Such matrices were used to extract descriptors. The dimension of the co-occurrence matrix is determined solely by the texture's grey levels and not by the image size [58]. The four orientations used in this study were 0°, 45°, 90°, and 135°, and the number of grey levels was 8. Four GLCM texture features were determined, involving contrast, correlation, energy, and homogeneity; the same statistical features were also calculated from the co-occurrence matrix of the GLCM.
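A minimal GLCM computation with 8 grey levels and one pixel offset per orientation can be sketched as follows; the offset encoding of the four angles is a common convention and assumed here, not taken from the paper.

```python
import numpy as np

def glcm(gray, dx, dy, levels=8):
    """Normalised grey-level co-occurrence matrix for one pixel offset,
    after quantising the image to `levels` grey levels (8 in this CAD)."""
    q = (gray.astype(float) / 256.0 * levels).astype(int).clip(0, levels - 1)
    h, w = q.shape
    mat = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            mat[q[y, x], q[y + dy, x + dx]] += 1
    return mat / mat.sum()

def glcm_features(p):
    """Contrast, correlation, energy, homogeneity of a normalised GLCM."""
    n = p.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    contrast = ((i - j) ** 2 * p).sum()
    energy = (p ** 2).sum()
    homogeneity = (p / (1.0 + np.abs(i - j))).sum()
    correlation = (((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j + 1e-12))
    return contrast, correlation, energy, homogeneity

# (dx, dy) offsets for the four orientations 0, 45, 90, 135 degrees.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]
```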
In the corresponding equations, P(i,j) is the marginal joint probability of the grey-level co-occurrence matrix, and x and y are two adjacent pixels.

Discrete Wavelet Transform Textural Features
The discrete wavelet transform (DWT), a popular image processing technique, analyses images in the time-frequency domain. The DWT utilizes filter banks made up of several filters to break down images into low-pass and high-pass elements [59]. The low-pass portion includes details about slowly changing image attributes, whereas the high-pass part provides details about dramatic shifts in image characteristics. The coefficients obtained by applying low-pass filtering to both the rows and columns of the image are called low-low (LL). Such coefficients reflect the overall energy of the image. If low-pass filtration is imposed on the rows and high-pass filtration is applied to the columns, the resulting coefficients are named high-low (HL), and these include the image's vertical information. The low-high (LH) coefficients are produced by high-pass filtering of rows and low-pass filtering of columns and encompass the image's horizontal description. Finally, high-pass filtration of both the rows and columns yields the HH coefficients, which hold the image's diagonal description. To obtain the next coarser scale of wavelet coefficients, decomposition was performed on the LL sub-band. In the current study, the "Haar" wavelet function was employed, and the number of decomposition levels was 4. The fourth LL sub-band was further analyzed using GLCM, and then the previous 13 features were calculated after this analysis (4 GLCM features and 9 statistical features).
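One level of the 2-D Haar decomposition, and the chain of four levels applied to the LL sub-band, can be sketched as below. The averaging normalisation (dividing pairs by 2 rather than the orthonormal √2 scaling) is an assumption for readability and does not change which sub-band carries which detail.

```python
import numpy as np

def haar_level(img):
    """One level of the 2-D Haar DWT: returns (LL, LH, HL, HH)."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    lo_r = (img[0::2, :] + img[1::2, :]) / 2.0   # low pass over row pairs
    hi_r = (img[0::2, :] - img[1::2, :]) / 2.0   # high pass over row pairs
    LL = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0   # low pass over column pairs
    LH = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0
    HL = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0
    HH = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def ll_after_levels(img, levels=4):
    """Repeatedly decompose the LL sub-band ('Haar', 4 levels as in
    this CAD); the 4th-level LL is then passed to the GLCM stage."""
    ll = img.astype(float)
    for _ in range(levels):
        ll = haar_level(ll)[0]
    return ll
```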

Gabor Wavelet Transform Textural Features
The Gabor wavelet (GW) transform is a widely known feature-extraction methodology. Its output contains both real and imaginary components. Because it can yield discriminative details at a variety of orientations and scales, the GW transform is commonly used in medical image analysis. Its main advantage is that convolving the Gabor kernels with the image produces GW sub-bands at different scales and directions [60,61]. Therefore, the GW transform was used in this study to analyse pap smear images and extract textural descriptors. The number of features obtained after the GW transform was 42.
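A minimal sketch of a Gabor filter bank follows. The kernel size, frequencies, and orientations here are illustrative assumptions — the text reports 42 features but does not give the bank configuration, so this example produces 24 (3 frequencies × 4 orientations × 2 statistics):

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Complex Gabor kernel: a Gaussian envelope modulating a complex
    sinusoid at spatial frequency `freq` and orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * freq * xr)   # real + imaginary parts
    return envelope * carrier

def gabor_features(img, freqs=(0.1, 0.2, 0.3), thetas=None):
    """Mean and standard deviation of the response magnitude per
    (frequency, orientation) sub-band; bank parameters are assumptions."""
    if thetas is None:
        thetas = [k * np.pi / 4 for k in range(4)]
    img = np.asarray(img, dtype=np.float64)
    F = np.fft.fft2(img)
    feats = []
    for f in freqs:
        for t in thetas:
            k = gabor_kernel(f, t)
            K = np.fft.fft2(k, s=img.shape)          # convolution via FFT
            resp = np.fft.ifft2(F * K)
            feats.extend([np.abs(resp).mean(), np.abs(resp).std()])
    return np.array(feats)
```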

Multi-Domain Feature Combination and Reduction
The multi-domain features obtained in the previous step were combined in three scenarios. First, all handcrafted descriptors were concatenated. Next, the combined handcrafted descriptors were concatenated independently with each DL feature set acquired from every CNN. Finally, the three DL feature sets retrieved from the three CNNs were fused with the merged handcrafted features using PCA. PCA is an unsupervised statistical method for extracting information from multivariate data sets. It does so by determining the principal components (PCs), which are linear combinations of the original attributes. The first principal component captures the greatest variance of the original multivariate data set, the second captures the largest variance in the remaining data, the third explains the largest variance in what is then left, and so on. In multidimensional data space, the eigenvectors defining the PCs are orthonormal to each other, based on the least-squares hypothesis. For this reason, PCA was employed to fuse the multi-domain features, generating a reduced feature set.
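The PCA fusion step can be sketched with an SVD of the centred, concatenated feature matrix. The feature blocks below are random placeholders for the real DL and handcrafted sets, and the default of 35 components matches the best quadratic-SVM setting reported later:

```python
import numpy as np

def pca_fuse(feature_blocks, n_components=35):
    """Concatenate feature blocks (one row per sample), then project
    onto the leading principal components via SVD of the centred data."""
    X = np.hstack(feature_blocks)            # samples x total features
    Xc = X - X.mean(axis=0)                  # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Vt[:n_components]                  # orthonormal PC directions
    return Xc @ pcs.T                        # reduced representation
```

Each successive column of the output explains a non-increasing share of the variance, which is the property the text relies on when truncating to a small number of PCs.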

Diagnosis
For the diagnosis step, several SVM classifiers with different kernels were adopted to classify pap smear images. These kernels included linear, quadratic, cubic, and gaussian.
Note that the diagnosis procedure was executed in three configurations. The first corresponded to using the individual and combined handcrafted features to train the SVMs. The second involved feeding the SVM with each DL feature set individually concatenated with the combined handcrafted features. In the final configuration, the SVMs were constructed with the PCA-fused features obtained after combining all DL features with the combined handcrafted features. A 5-fold cross-validation methodology was employed to validate the performance of the proposed CAD.
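The four SVM kernels with 5-fold cross-validation can be reproduced with scikit-learn. The synthetic four-class data below stands in for the real fused features, and mapping "quadratic"/"cubic" to polynomial kernels of degree 2 and 3 is the usual convention, assumed here:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Stand-in for the fused pap-smear features; four classes as in the dataset.
X, y = make_classification(n_samples=200, n_features=35, n_informative=10,
                           n_classes=4, random_state=0)

kernels = {
    "linear":    SVC(kernel="linear"),
    "quadratic": SVC(kernel="poly", degree=2),
    "cubic":     SVC(kernel="poly", degree=3),
    "gaussian":  SVC(kernel="rbf"),
}
# Mean 5-fold cross-validated accuracy per kernel.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in kernels.items()}
```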

Networks Hyper-Parameters
The three CNNs had several hyper-parameters that were adjusted before the re-training procedure. The mini-batch size was chosen to be 4, the number of epochs was 30, the learning rate was 0.0001, and the validation frequency was 169. The stochastic gradient descent with momentum approach was utilized to train the three CNNs. Reference [62] revealed that increasing the batch size diminishes a CNN model's effectiveness, as measured by the network's generalizability. In both testing and training, large batch sizes usually correspond to sharp minimizers, and sharp minima reduce the generalization of the results. Conversely, a very small batch size commonly converges to flat minimizers and generally attains the best generalization performance [63], so it was selected to be only 4. The learning rate specifies the step size taken at each iteration while moving toward the minimum of the error function. High learning rates allow the model to train quickly, but at the expense of an unsatisfactory final set of weights. Lower learning rates, on the other hand, may allow the model to reach a more optimal, or even globally optimal, set of weights, at the cost of a considerably longer training period. Moreover, very high learning rates lead to large weight updates, causing the model's performance to fluctuate markedly over the training phase, while very low rates may never converge or may become stuck at a suboptimal solution. As a result, the learning rate was set to 0.0001 in the experiments, a value that is neither too low nor too high. Furthermore, the validation frequency was set to 169 so that the validation error was determined only once at the end of each training epoch.
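The update rule of stochastic gradient descent with momentum can be illustrated on a toy quadratic loss. The learning rate matches the text (0.0001); the momentum value of 0.9 is an assumed default, as the paper does not state it:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=1e-4, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates a decaying
    sum of past gradients, smoothing the optimisation trajectory."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy loss L(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(1000):
    w, v = sgd_momentum_step(w, w, v)
# After 1000 steps, w has moved measurably toward the minimum at 0.
```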

Results
This section presents the three diagnostic configurations executed in the diagnosis step. First, the results of the SVM classifiers fed with the individual and combined handcrafted feature sets are demonstrated (Configuration I). Next, the performance of the SVM classifiers trained separately with each DL feature set obtained from every CNN is compared with their performance when those sets are concatenated with the combined handcrafted features (Configuration II). Finally, the DL feature sets of all three CNNs are fused with the combined handcrafted features using PCA to lower their dimension (Configuration III).

Configuration I SVM Classifiers Performance
Configuration I SVM classifiers' performance is illustrated in this section. Table 1 compares four SVM classifiers constructed with each handcrafted descriptor set independently and with the fusion of all handcrafted features obtained from the multiple domains. The table demonstrates that, among the individual handcrafted descriptors, the SVM classifiers trained with the statistical features obtained from the spatial domain achieved the highest accuracies of 85.7%, 92.8%, 91.0%, and 90.2% for the linear, quadratic, cubic, and gaussian SVMs, respectively. In contrast to the features extracted using DWT, GLCM, and GW, the statistics-based features are fewer but more relevant, less redundant, easier to interpret, and more distinctive [64], which explains their higher results. These accuracies were further boosted to 87.8%, 94.2%, 95.1%, and 91.3% for the same classifiers when trained with the combined handcrafted descriptors. The studies [33,34] showed that combining multiple handcrafted features can enhance classification performance. Thus, these results verify that fusing handcrafted features from multiple domains improves the results and is superior to using features from a single domain.

Configuration II SVM Classifiers Performance
The results of concatenating each DL feature set with the combined handcrafted features are shown in Table 2. They indicate that merging the DL features of each CNN with the combined handcrafted features enhanced the diagnostic performance. This is clear, as the accuracies attained after integrating both feature types were 99.1%, 99.2%, 99.2%, and 98.8% for the linear, quadratic, cubic, and gaussian SVMs, respectively (trained with the DL features of MobileNet and the combined handcrafted features), which are better than those attained before fusion. Similarly, when fusing the DL features of DarkNet-19 with the combined handcrafted features, the accuracies reached 99.2%, 99.6%, 99.6%, and 97.9% for the same classifiers. Additionally, when incorporating the DL features of ResNet-18 with the combined handcrafted features, both the quadratic and cubic SVMs achieved higher accuracies than those obtained using either group of features alone. Note that these accuracies were attained with 85 features. As indicated in studies [65][66][67], fusing the DL features of an individual CNN with multiple handcrafted features improves classification accuracy. The results in Table 2 therefore confirm that combining deep features obtained from one CNN with ensemble handcrafted features boosts classification performance.

Configuration III SVM Classifiers Performance
The performance of the SVMs trained with the fusion of the three DL feature sets and the combined handcrafted features using PCA is shown in this section. An ablation study relating the number of PCs to the diagnostic accuracy is displayed in Table 3. As illustrated in Table 3, for the linear SVM the highest accuracy of 99.7% was attained with 20 PCs, whereas for the quadratic SVM the peak accuracy of 100% was achieved with 35 PCs. For the cubic SVM, the maximum accuracy of 99.9% was accomplished with 40 PCs, and for the gaussian SVM a peak accuracy of 99.3% was reached with 10 PCs. These accuracies were greater than those obtained in the previous scenarios, except for the gaussian SVM. The studies [45,68] showed that combining multiple CNN features with multi-domain handcrafted features can enhance diagnostic performance; therefore, the results in Table 3 were better than those in Tables 1 and 2, which were based on either using handcrafted features only or merging deep features obtained from a single CNN with handcrafted features. This is because using handcrafted features alone may prevent the classification model from mining the interrelatedness of cervical cancer features. Furthermore, current CADs mostly rely on single models working in the time/spatial domain, while hybrid approaches operating in multiple domains can attain better performance [45]. On the other hand, although DL models produce promising results, depending on deep features alone has limitations: the obtained deep features lose physical significance and prior knowledge. Moreover, CNN structures have many parameters and rely heavily on massive data and manual labels, which adds to the challenges in medical applications [45].
The confusion matrices for the linear, quadratic, cubic, and gaussian SVMs trained with the fused DL features and the combined handcrafted features via PCA are shown in Figure 3. The receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) for the quadratic SVM classifier trained with the 35 PCs are shown in Figure 4, where the AUC is equal to 1.
Further evaluation indices were used to assess the performance of the presented CAD and are revealed in Table 4. These indices involve sensitivity, specificity, accuracy, F1-score, precision, and MCC. Table 4 demonstrates that all of these metrics were equal to 1 for the quadratic SVM. These results prove that the proposed system is reliable.
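The Table 4 indices follow directly from the four counts (TP, TN, FP, FN) via their standard definitions, which can be checked with a short helper; a perfect classifier yields 1 for every metric, matching the quadratic-SVM row:

```python
import numpy as np

def binary_metrics(tp, tn, fp, fn):
    """Standard definitions of the evaluation indices reported in Table 4,
    computed from the four counts defined in the text."""
    sens = tp / (tp + fn)                                  # sensitivity (recall)
    spec = tn / (tn + fp)                                  # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)                  # accuracy
    prec = tp / (tp + fp)                                  # precision
    f1 = 2 * prec * sens / (prec + sens)                   # F1-score
    mcc = ((tp * tn - fp * fn)
           / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return dict(sensitivity=sens, specificity=spec, accuracy=acc,
                precision=prec, f1=f1, mcc=mcc)
```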

Comparison between Configurations Results
This section compares the highest accuracy reached in each configuration along with the number of features used to train the SVMs. Figure 5 shows that the maximum accuracy attained in Configuration II was higher than that obtained in Configuration I. These results prove that fusing multiple handcrafted features acquired from several domains with a single DL feature set can improve diagnostic performance. In addition, the peak accuracy attained in Configuration III was greater than that achieved in Configuration II. This confirms that fusing numerous DL features with multiple handcrafted descriptors mined from several domains via PCA was superior to employing either one individual DL feature set of a single CNN or one set of handcrafted features. The performance of Configuration III also verifies that PCA can reduce the feature dimension while enhancing diagnostic accuracy.


Discussion
This study proposes a CAD for the automated diagnosis of cervical cancer from pap smear LBC data. The presented CAD system is built on multiple CNNs, as an alternative to utilizing an individual CNN, thereby benefiting from their distinct architectural advantages. Instead of using features from a single domain, the introduced CAD extracts several descriptors from multiple domains. It retrieves multiple DL features from the spatial domain. Furthermore, it obtains numerous statistical and textural handcrafted features from the spatial and time-frequency domains using DWT, the GW transform, and GLCM. The diagnosis procedure is carried out in three configurations. The first corresponds to training the SVMs with individual and combined handcrafted features. The second involves feeding the SVM with each DL feature set individually concatenated with the combined handcrafted features. In the final configuration, the SVMs are built with the fused PCA features obtained by combining all DL features with the combined handcrafted features.

Comparative Performance Analysis
To prove the efficiency of the introduced CAD and its competitive capacity, its performance was compared with state-of-the-art methods for cervical cancer diagnosis using the Mendeley LBC dataset. The methods described in Table 5 are summarized as follows. The study [69] cropped each photo in the database and used the clipped photos to train DarkNet-19 and DarkNet-53 individually. Because the attributes extracted from these CNNs were numerous, they were fed separately into neighborhood component analysis (NCA) to reduce their size. The reduced features were then utilized to construct an SVM classifier, which achieved accuracies of 98.26% and 99.47% on two datasets. In addition, the authors of reference [70] proposed a CAD with a two-step dimension-reduction strategy that applied PCA and the grey wolf optimizer (GWO) to features extracted from numerous CNN architectures. The reduced attributes were used to train an SVM classifier, which generated the final predictions. The reference [71] procured confidence values from the Inception, MobileNet, and InceptionResNet CNNs, and then aggregated these scores using a fuzzy distance-based hybrid approach with multiple distance measures. The study [72] utilized three CNNs, namely Inception V3, MobileNet V2, and Inception ResNet V2, with extra layers to discover data-specific attributes. The authors suggested a new ensemble methodology based on minimizing error values to combine the results of these models using three distance measures. To calculate the final predictions, the authors defuzzified these distance metrics using the product rule, reaching an accuracy of 99.23%. Conversely, in [73], a cervical cell image generation model (CCG-taming transformers) and a classification model using Tokens-to-Token Vision Transformers (T2T-ViT) with transfer learning were introduced, attaining 98.89% accuracy. Finally, the study [74] employed GoogleNet and ResNet individually to extract features, and a genetic algorithm was then utilized to select features. The 720 selected features were then used to train an SVM classifier, reaching an accuracy of 99.07%.
The results included in Table 5 confirm the superior capacity of the introduced CAD over the other state-of-the-art methods. It is obvious from the table that the proposed CAD achieved greater performance measures than all of the competing approaches. The superiority of the proposed CAD is due to the use of multiple compact CNNs to extract deep features, unlike the studies [69][70][71][72][73][74], which employed individual CNNs with more deep layers and huge numbers of parameters. Additionally, it combined features from multiple domains, including the time/spatial and time-frequency domains, in contrast to methods that relied on extracting features from a single domain [69][70][71][72][73][74]. In addition, it did not necessitate any pre-segmentation process to reach accurate results, unlike the study [69]. Furthermore, it accomplished better results than the studies [69,70,74], which used complicated feature-reduction methods such as genetic algorithms. These results were attained with a lower number of features (35 PCs) compared with the 1000, 796, and 730 used in [69,70,74]. Finally, the results show that merging multiple DL features from several CNNs with different handcrafted features from multiple domains, as done by the proposed CAD, is better than using features from a single domain as done by the previous studies [69][70][71][72][73][74].

Limitations and Upcoming Work
The described CAD has several limitations. It does not apply any fine-tuning or optimization to the CNNs. It is presently used only for the classification of pap smear images and has not been tested on other tasks. Furthermore, no correlation analysis was performed to determine the link between the deep-learned/handcrafted representations and the clinical findings in pap smear slides. Future work will tackle these issues, first by utilizing an optimization approach to fine-tune the DL models' hyper-parameters. Second, upcoming work will conduct a correlation analysis to find the association between the handcrafted descriptors/DL features and clinical interpretations. In addition, the presented CAD will be tested on new datasets of different imaging modalities and on other classification problems.

Conclusions
Cervical cancer is the second most common cancer in women globally. It is critical to detect it as early as possible using low-cost, high-accuracy smart health-monitoring systems, particularly in countries with limited medical resources. The complications of cervical cancer are extremely concerning, and healthcare experts have made considerable attempts to tackle this issue. Nevertheless, due to the growing population, it is critical to investigate CAD methods in order to reduce the likelihood of human error. Hence, this study proposed an automatic CAD to diagnose cervical cancer. The described CAD system was constructed using several CNNs rather than a single CNN, thereby gaining from their distinguishable architectural benefits. Rather than using attributes from a single domain, the CAD retrieves several descriptors from various domains. It obtains a large group of DL features from the spatial domain, and it also acquires numerous statistical and textural handcrafted features from the spatial and time-frequency domains via the DWT, GW transform, and GLCM. The diagnostic procedure is implemented in three configurations. The first is associated with training SVMs with the individual and combined handcrafted features. The second entails feeding the SVM with each DL feature set separately concatenated with the combined handcrafted features. In the final configuration, the SVMs are built with the PCA features created by merging all DL features with the combined handcrafted attributes. The accuracies obtained in Configuration II were superior to those achieved in Configuration I. These findings show that combining multiple handcrafted features from different domains with a single DL feature set can improve diagnostic performance. Furthermore, the performance attained in Configuration III was better than that in Configuration II. This demonstrates that incorporating multiple DL features with numerous handcrafted descriptors mined from multiple domains using PCA outperforms utilizing either one individual DL feature set of a single CNN or one set of handcrafted features. The result of Configuration III furthermore demonstrates that PCA is able to decrease the feature dimension while improving the diagnostic accuracy. These findings and comparisons indicate that the proposed CAD, based on multi-domain feature fusion, is effective for distinguishing between different kinds of cervical cancer images.

Figure 1 .
Figure 1. Instances of the slides available in the Mendeley LBC dataset for each class: (a) high squamous intraepithelial lesion, (b) low squamous intraepithelial lesion, (c) negative for intraepithelial malignancy, and (d) squamous cell carcinoma.


Figure 2 .
Figure 2. Introduced CAD's workflow for cervical cancer diagnosis via pap smear slides.

4.2.1. Pap Smear Image Preparation
Pap smear training images were initially augmented to increase the amount of data available for the training procedure. Augmentation is an important procedure that is usually performed to improve the learning of DL models and prevent overfitting. Among the numerous augmentation methods available, the introduced CAD employed flipping, rotation, scaling, and shearing. Next, the dimensions of the augmented images, as well as the original images of the entire Mendeley LBC dataset, were changed to fit the input layer size of the three CNNs, which is 224 × 224 × 3. These CNNs were MobileNet, ShuffleNet, and ResNet-18.
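The augmentation and resizing steps can be sketched with scipy.ndimage; the transform parameters (angle, zoom factor, shear amount) are illustrative assumptions, as the text does not state them:

```python
import numpy as np
from scipy import ndimage

def augment(img, angle=15.0, zoom=1.1, shear=0.1):
    """Flipped, rotated, scaled and sheared variants of one training image
    (H x W x C array), matching the four techniques listed in the text."""
    flipped = img[:, ::-1]                                   # horizontal flip
    rotated = ndimage.rotate(img, angle, axes=(0, 1), reshape=False, order=1)
    scaled = ndimage.zoom(img, (zoom, zoom) + (1,) * (img.ndim - 2), order=1)
    shear_mat = np.eye(img.ndim)
    shear_mat[0, 1] = shear                                  # shear rows by columns
    sheared = ndimage.affine_transform(img, shear_mat, order=1)
    return flipped, rotated, scaled, sheared

def resize_to_input(img, size=(224, 224)):
    """Resize to the 224 x 224 x 3 input of MobileNet/ShuffleNet/ResNet-18."""
    factors = (size[0] / img.shape[0], size[1] / img.shape[1])
    factors += (1,) * (img.ndim - 2)
    return ndimage.zoom(img, factors, order=1)
```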


4.3. Evaluation Indices and Networks Hyper-Parameters
4.3.1. Evaluation Indices
To assess the efficiency of the suggested CAD, four indices were computed: True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP). These indices describe the numbers of examples that are correctly or incorrectly recognized as positives or negatives. They are used to compute the evaluation metrics sensitivity, specificity, accuracy, F1-score, precision, and Matthews correlation coefficient (MCC), defined in the standard way as: Sensitivity = TP/(TP + FN), Specificity = TN/(TN + FP), Accuracy = (TP + TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), F1-score = 2 × Precision × Sensitivity/(Precision + Sensitivity), and MCC = (TP × TN − FP × FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

Figure 4 .
Figure 4. ROC curves of the quadratic SVM constructed with the PCs generated after fusing the entire DL features with the combined handcrafted features when the positive class is: (a) high squamous intraepithelial, (b) negative for intraepithelial, (c) low squamous intraepithelial, (d) squamous cell carcinoma.

Figure 5 .
Figure 5.A comparison between (a) the highest accuracy reached in each configuration and (b) the number of features used to train the SVMs.

Table 1 .
The diagnostic accuracy (%) of the SVM classifiers trained with the individual and combined handcrafted feature sets from multiple domains.

Table 2 .
The diagnostic accuracy (%) of the SVM classifiers trained with the independent DL feature sets, alone and concatenated with the combined handcrafted feature set from multiple domains.

Table 3 .
The number of PCs versus the diagnostic accuracy for the linear, quadratic, cubic, and gaussian SVMs trained with the fused DL features and the combined handcrafted features via PCA.

Table 4 .
Evaluation indices values calculated from the SVMs in configuration III.


Table 5 .
Comparative analysis between state-of-the-art methods and the introduced CAD constructed with the Mendeley LBC dataset.