Chaotic Sparrow Search Algorithm with Deep Transfer Learning Enabled Breast Cancer Classification on Histopathological Images

Simple Summary: Cancer is considered one of the most significant public health issues and severely threatens people's health. The incidence and mortality of breast cancer have been growing consistently. Early and precise diagnosis is a primary factor in improving the survival rate of patients. Although there are several means of identifying breast cancer, histopathological diagnosis is now considered the gold standard. However, the complexity of histopathological images and the rapid rise in workload make this process time-consuming, and the outcomes may be affected by pathologists' subjectivity. Hence, the development of a precise and automatic histopathological image analysis method is essential for the field. Recently, deep learning methods for breast cancer pathological image classification have made significant progress and have become mainstream in this field. Therefore, in this work, we focus on the design of a breast cancer classification process that combines metaheuristics with deep learning. The proposed model is found to be an effective tool for assisting physicians in the decision-making process.

Abstract: Breast cancer is a major cause of death among women worldwide. Although there are several means of identifying breast cancer, histopathological diagnosis is now considered the gold standard. However, the complexity of histopathological images and the rapid rise in workload make this process time-consuming, and the outcomes may be affected by pathologists' subjectivity. Hence, the development of a precise and automatic histopathological image analysis method is essential for the field. Recently, deep learning methods for breast cancer pathological image classification have made significant progress and have become mainstream in this field. This study introduces a novel chaotic sparrow search algorithm with a deep transfer learning-enabled breast cancer classification (CSSADTL-BCC) model on histopathological images. The presented CSSADTL-BCC model mainly focuses on the recognition and classification of breast cancer. To accomplish this, the CSSADTL-BCC model first applies the Gaussian filtering (GF) approach to remove noise. In addition, a MixNet-based feature extraction model is employed to generate a useful set of feature vectors. Moreover, a stacked gated recurrent unit (SGRU) classification approach is used to assign class labels. Furthermore, CSSA is applied to optimally tune the hyperparameters of the SGRU model. None of the earlier works have utilized a hyperparameter-tuned SGRU model for breast cancer classification on histopathological images (HIs). The design of the CSSA for optimal hyperparameter tuning of the SGRU model demonstrates the novelty of the work. The performance of the CSSADTL-BCC model is validated on a benchmark dataset, and the results show that it outperforms recent state-of-the-art approaches.

Introduction
Cancer is considered one of the most significant public health issues and severely threatens people's health. The incidence and mortality of breast cancer (BC) have been growing consistently. Early and precise diagnosis is a primary factor in improving the survival rate of patients [1]. Mammography is usually the first step of early diagnosis; however, it is hard to detect cancer in the denser breasts of younger women, and exposure to X-ray radiation is a concern for the health of patients and radiologists [2]. Pathological examination remains the gold standard for BC diagnosis. Pathological examinations generally obtain tumor samples via excision, puncture, etc. [3]. In staining, hematoxylin binds to deoxyribonucleic acid (DNA), and eosin binds to proteins. A precise diagnosis of BC demands proficient histopathologists and requires considerable time and effort. Moreover, the diagnoses made by different histopathologists can differ and depend heavily on each histopathologist's prior experience [4].
Currently, BC diagnosis based on histopathological images faces three major difficulties. First, there is a shortage of proficient histopathologists across the globe, particularly in less developed regions and small hospitals [5]. Second, the diagnosis made by histopathologists is subjective rather than objective; whether it is right depends wholly on the histopathologist's prior knowledge [6]. Lastly, diagnosing BC from histopathological images is time-consuming, highly complex, and labor-intensive, which makes it inefficient in the era of big data. Given these issues, an objective and effective BC diagnosis technique is essential for reducing the workload of histopathologists [7]. With its rapid advancement, computer-aided diagnosis (CAD) has gradually been adopted in the clinical domain. A CAD system does not act as a substitute for the physician; rather, it can be used as a "second reader" to assist the physician in recognizing diseases [8]. However, false-positive areas identified by the computer cost the physician time in evaluating the computer's output, which again reduces efficiency and accuracy. Thus, methods that improve the sensitivity of computer-aided tumor identification while greatly reducing the false-positive rate and improving efficiency constitute a promising research area [9].
Currently, deep learning (DL) methods have become popular in computer vision (CV), particularly in biomedical image processing. These methods can learn complex, high-level features from images automatically. Accordingly, they have attracted the attention of many authors who use such techniques to classify BC histopathology images [10]. In particular, convolutional neural networks (CNNs) are widely used in image-based tasks because of their ability to share parameters efficiently across several layers of a DL model.
This study introduces a novel chaotic sparrow search algorithm with a deep transfer learning-enabled breast cancer classification (CSSADTL-BCC) model applied to histopathological images. The presented CSSADTL-BCC model applies the Gaussian filtering (GF) approach to remove noise. In addition, a MixNet-based feature extraction model is employed to generate a useful set of feature vectors. Furthermore, a stacked gated recurrent unit (SGRU) classifier, whose hyperparameters are tuned by the CSSA, is used to assign class labels. To the best of our knowledge, the CSSADTL-BCC model has not been reported in the literature. The design of the CSSA for optimal hyperparameter tuning of the SGRU model demonstrates the novelty of the work. The performance of the CSSADTL-BCC model was validated on a benchmark dataset, and the outcomes were inspected under different evaluation measures.
The remaining sections of the paper are organized as follows. Section 2 reviews the existing works related to BC classification. Next, Section 3 elaborates the proposed model, and Section 4 offers the performance validation. Finally, Section 5 draws the conclusions.

Literature Review
In [11], the authors proposed a transfer learning method based on real-time data augmentation to resolve existing limitations. Two popular and well-established image classification frameworks, namely Xception and InceptionV3, were trained on a freely accessible BC histopathological image dataset named BreakHis. Alom et al. [12] presented a technique for classifying BC using the Inception Recurrent Residual Convolution Neural Network (IRRCNN) framework. The proposed method is an effective deep convolutional neural network (DCNN) system that integrates the strengths of the Recurrent Convolution Neural Network (RCNN), the Inception Network (Inception-v4), and the Residual Network (ResNet). The experimental results show better performance than RCNN, Inception Network, and ResNet for object-detection tasks.
Vo et al. [13] presented a technique that employs a DL method with convolution layers for extracting visual features for BC classification. It was found that the DL model extracts more useful features than handcrafted feature extraction approaches. In [14], the authors proposed a BC histopathological image classification method based on deep feature fusion and enhanced routing (FE-BkCapsNet) to exploit both CapsNet and CNN models. First, a new architecture with two channels simultaneously extracts capsule and convolutional features and incorporates spatial and semantic features into the new capsule to obtain discriminative information.
The researchers in [15] proposed a patch-based DL method named Pa-DBN-BC for detecting and classifying BC on histopathology images with a Deep Belief Network (DBN). The features are extracted through an unsupervised pre-training phase and a supervised fine-tuning phase. The network extracts features automatically from image patches. Logistic regression is utilized for classifying the patches from histopathology images. In [16], the authors proposed a robust and novel technique for automatically identifying BC that combines a convolution-LSTM (CLSTM) learning method, a pre-processing method with an optimized SVM classifier, and the marker-controlled watershed segmentation algorithm (MWSA). Saxena et al. [17] presented a hybrid ML method for solving class imbalance problems. The presented method uses the kernelized weighted ELM and a pre-trained ResNet50 for CAD of BC using histopathology.
Several automated breast cancer classification models are available in the literature. However, these models still face challenging problems. Because models continue to grow deeper, the number of parameters of DL models also increases quickly, which results in model overfitting. At the same time, different hyperparameters, particularly the learning rate, have a significant impact on the efficiency of the CNN model. Tuning the learning rate is therefore required to obtain better performance. Accordingly, in this study, we employ the CSSA technique for the hyperparameter tuning of the SGRU model.

The Proposed Model
In this study, a new CSSADTL-BCC model was developed to classify BC on histopathological images. The presented CSSADTL-BCC model mainly focuses on the recognition and classification of BC. In the first stage, the CSSADTL-BCC model employs the GF technique to remove noise. A MixNet-based feature extraction model is then used to produce a useful set of feature vectors. Finally, the CSSA-SGRU classifier is used to assign class labels. Figure 1 illustrates the overall process of the CSSADTL-BCC technique.

Image Pre-Processing
In the first stage, the CSSADTL-BCC model employs the GF technique to remove noise. GF is a bandpass filter that is effectively applied in machine vision and image processing applications [18]. A two-dimensional Gabor function is an oriented sinusoidal grating modulated by a two-dimensional Gaussian envelope. In the two-dimensional coordinate system (a, b), the GF consists of a real part and an imaginary part, which are expressed as follows, together with the auxiliary terms used in them.
Here, θ is the orientation angle of the Gabor kernel, and δ is the wavelength of the sinusoidal factor. Notably, it is sufficient to consider θ in the range [0°, 180°], because symmetry makes the other directions redundant. ψ denotes the phase offset, σ indicates the standard deviation of the Gaussian envelope, and γ represents the spatial aspect ratio, which determines the ellipticity of the Gabor function. ψ = 0 and ψ = π/2 return the real and imaginary parts of the GF. The variable σ can be determined from δ and the spatial frequency bandwidth bw, as given by the following.
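For reference, a two-dimensional Gabor function consistent with the symbols used in this section (θ, δ, ψ, σ, γ) can be written in its textbook form as below. This is a sketch of the standard formulation, given as an assumption rather than the exact expressions of the cited work; the primed coordinates a′ and b′ are the rotated coordinates.

```latex
\begin{align*}
G_{\mathrm{re}}(a,b) &= \exp\!\left(-\frac{a'^{2}+\gamma^{2}b'^{2}}{2\sigma^{2}}\right)
                        \cos\!\left(\frac{2\pi a'}{\delta}+\psi\right),\\
G_{\mathrm{im}}(a,b) &= \exp\!\left(-\frac{a'^{2}+\gamma^{2}b'^{2}}{2\sigma^{2}}\right)
                        \sin\!\left(\frac{2\pi a'}{\delta}+\psi\right),\\
a' &= a\cos\theta + b\sin\theta,\\
b' &= -a\sin\theta + b\cos\theta.
\end{align*}
```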
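The relation between σ, δ, and bw that is commonly used for Gabor kernels is σ = (δ/π)·sqrt(ln 2 / 2)·(2^bw + 1)/(2^bw − 1). The following minimal Python sketch assumes that relation and the textbook cosine form of the kernel; the function names `gabor_sigma` and `gabor_kernel` are hypothetical and are not taken from the source.

```python
import math


def gabor_sigma(wavelength, bandwidth):
    """Assumed relation: sigma from wavelength (delta) and bandwidth bw in octaves.

    bandwidth must be > 0, otherwise the denominator is zero.
    """
    k = 2.0 ** bandwidth
    return (wavelength / math.pi) * math.sqrt(math.log(2.0) / 2.0) * (k + 1.0) / (k - 1.0)


def gabor_kernel(size, theta, wavelength, psi, sigma, gamma):
    """Sample the cosine-form Gabor function on a size x size integer grid.

    size is expected to be odd so that the kernel has a central sample.
    """
    half = size // 2
    kernel = []
    for b in range(-half, half + 1):
        row = []
        for a in range(-half, half + 1):
            # Rotate the coordinates by the orientation angle theta.
            a_rot = a * math.cos(theta) + b * math.sin(theta)
            b_rot = -a * math.sin(theta) + b * math.cos(theta)
            # Gaussian envelope with aspect ratio gamma.
            envelope = math.exp(-(a_rot ** 2 + (gamma * b_rot) ** 2) / (2.0 * sigma ** 2))
            # Sinusoidal carrier with wavelength delta and phase offset psi.
            row.append(envelope * math.cos(2.0 * math.pi * a_rot / wavelength + psi))
        kernel.append(row)
    return kernel
```

In this cosine form, ψ = 0 gives the even-symmetric kernel and ψ = π/2 gives the odd-symmetric kernel (up to sign). Filtering an image would then amount to convolving it with such a kernel.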

MixNet-Based Feature Extractor
Next, after image pre-processing, a MixNet-based feature extraction model is employed to generate a useful set of feature vectors. A CNN built from traditional convolutional operations is difficult to deploy on mobile terminals because of its heavy computation and large number of parameters. To improve efficiency on mobile terminals while preserving model accuracy, a series of lightweight convolutional operators has been proposed. Among them, one of the most commonly used is the depthwise separable convolution layer, which splits the convolution into a depthwise convolution and a pointwise convolution. In the first phase, it convolves a single channel at a time using convolutional kernels of size 3 × 3. In the second phase, it combines the resulting feature maps using 1 × 1 convolutional kernels. Assume that N convolutional kernels of size $D_k \times D_k$ with a stride of 1 are used to convolve a feature map of dimensions $D_F \times D_F \times M$, producing an output feature map of dimensions $D_F \times D_F \times N$. The parameter count of the traditional convolutional operation is provided as follows.
The parameter count of the depthwise separable convolutional operation is provided below.
The computation involved in traditional convolutional operation is provided as follows.
The computation involved in depthwise separable convolutional operation is defined in Equation (8).
The ratio of the two operations is provided as follows.
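In the standard formulation of depthwise separable convolution, the five quantities referred to above take the following form. The symbols $P$ and $F$ for parameter and computation counts are introduced here for illustration, and the numbering (5)-(9) is inferred from the reference to Equation (8); both should be read as assumptions.

```latex
\begin{align}
P_{\mathrm{std}} &= D_k \cdot D_k \cdot M \cdot N, \tag{5}\\
P_{\mathrm{dws}} &= D_k \cdot D_k \cdot M + M \cdot N, \tag{6}\\
F_{\mathrm{std}} &= D_k \cdot D_k \cdot M \cdot N \cdot D_F \cdot D_F, \tag{7}\\
F_{\mathrm{dws}} &= D_k \cdot D_k \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F, \tag{8}\\
\frac{F_{\mathrm{dws}}}{F_{\mathrm{std}}} &= \frac{1}{N} + \frac{1}{D_k^{2}}. \tag{9}
\end{align}
```

For example, with $D_k = 3$ and $N = 64$, the ratio is $1/64 + 1/9 \approx 0.127$, so the depthwise separable layer needs roughly one eighth of the computation.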
A depthwise separable convolutional layer typically uses a single 3 × 3 kernel size throughout the computation; networks with larger convolutional kernels of 5 × 5 or 7 × 7 indicate that a larger kernel can improve the efficiency and accuracy of the model. However, experiments show that a larger kernel is not always better; an excessively large kernel reduces the model's accuracy. The mixed depthwise convolution (MDConv) used in MixNet therefore splits the M input channels into C groups and convolves each group with a different kernel size. By contrast, the standard depthwise convolution splits the M input channels into M groups and convolves every group with the same kernel size.
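The counts discussed in this section can be checked numerically. The sketch below assumes the standard parameter and computation formulas for traditional and depthwise separable convolution, and it assumes that MDConv divides channels into nearly equal groups; all function names are hypothetical.

```python
def standard_conv_params(dk, m, n):
    """Parameters of a traditional convolution: N kernels of size Dk x Dk x M."""
    return dk * dk * m * n


def depthwise_separable_params(dk, m, n):
    """Depthwise part (Dk x Dk per input channel) plus 1 x 1 pointwise part."""
    return dk * dk * m + m * n


def standard_conv_flops(dk, m, n, df):
    """Multiply-accumulate count of a traditional convolution on a Df x Df map."""
    return dk * dk * m * n * df * df


def depthwise_separable_flops(dk, m, n, df):
    """Multiply-accumulate count of depthwise plus pointwise convolution."""
    return dk * dk * m * df * df + m * n * df * df


def split_channels(m, groups):
    """Split m channels into `groups` nearly equal parts (equal-split assumption)."""
    base, extra = divmod(m, groups)
    return [base + (1 if g < extra else 0) for g in range(groups)]


def mixed_depthwise_params(m, kernel_sizes):
    """Depthwise parameters when each channel group uses its own kernel size."""
    group_sizes = split_channels(m, len(kernel_sizes))
    return sum(c * k * k for c, k in zip(group_sizes, kernel_sizes))
```

With 32 input channels and kernel sizes 3, 5, and 7, `split_channels` yields groups of 11, 11, and 10 channels, and the mixed depthwise stage has 11·9 + 11·25 + 10·49 = 864 parameters, compared with 288 for a plain 3 × 3 depthwise stage.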

Image Classification Using SGRU Model
At this stage, the generated feature vectors are passed into the SGRU classifier to assign class labels. The SGRU is made up of several stacked GRU units. For a time series of length t, the input series $\{e_1, e_2, \ldots, e_t\}$ first enters the first hidden layer $h_1^1, h_2^1, \ldots, h_t^1$, which captures the information from the previous time steps. Next, the upper hidden layer takes the output of the lower hidden layer at the same time step as its input for extracting features [19]. In particular, the second hidden layer is $h_1^2, h_2^2, \ldots, h_t^2$. For every layer i, the hidden state $h_t^i$ is given by Equation (13), which relies on Equations (10)-(12) for the candidate value, update gate, and reset gate. It should be noted that Equations (10)-(12) use the embedding vector $e_t$ in the first layer. From the second layer upward, the hidden state of the previous layer at the current time step, $h_t^{i-1}$, is used in place of $e_t$ in Equations (10)-(12). Figure 2 depicts the framework of the SGRU.
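A conventional GRU formulation that matches this description is sketched below. The weight names $W$, $U$, and $b$, the layer input symbol $x_t^i$, and the assignment of the numbers (10)-(12) to the individual gates are assumptions made for illustration; the cited work may order the gates differently or use the complementary convention in Equation (13).

```latex
\begin{align}
z_t^{i} &= \sigma\!\left(W_z^{i} x_t^{i} + U_z^{i} h_{t-1}^{i} + b_z^{i}\right), \tag{10}\\
r_t^{i} &= \sigma\!\left(W_r^{i} x_t^{i} + U_r^{i} h_{t-1}^{i} + b_r^{i}\right), \tag{11}\\
\tilde{h}_t^{i} &= \tanh\!\left(W_h^{i} x_t^{i} + U_h^{i}\left(r_t^{i} \odot h_{t-1}^{i}\right) + b_h^{i}\right), \tag{12}\\
h_t^{i} &= \left(1 - z_t^{i}\right) \odot h_{t-1}^{i} + z_t^{i} \odot \tilde{h}_t^{i}, \tag{13}
\end{align}
% Layer input: x_t^{1} = e_t for the first layer, and x_t^{i} = h_t^{i-1} for i >= 2.
```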

Hyperparameter Optimization
Finally, CSSA is applied to optimally tune the hyperparameters of the SGRU model. SSA attains the best possible solution by mimicking certain behaviors of sparrows [20]. First, a discoverer-joiner model of the sparrow population is established, and some sparrows are then randomly chosen as guards (vigilantes). The discoverer is responsible for providing foraging directions and areas for the sparrow population. The joiner observes and follows the discoverer to obtain food and may snatch food from it. Once a vigilante detects a threat, the population immediately engages in anti-predation behavior. Lastly, by iteratively updating the locations of the discoverers and joiners, the optimal position for the entire population can be found. The sparrow population lies in a space of size N × D, where N is the total number of sparrows and D is the spatial dimension. The location of the i-th sparrow is represented as $X_i = [x_{i1}, \ldots, x_{id}, \ldots, x_{iD}]$, where $x_{id}$ is the location of the i-th sparrow in the d-th dimension. The position update equation of the discoverer is given in Equation (14).
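The discoverer update that is standard in the SSA literature, and that matches the symbols defined for Equation (14), can be written as follows. It is given here as an assumed reconstruction rather than as the exact expression of the source.

```latex
\begin{equation}
x_{id}^{t+1} =
\begin{cases}
x_{id}^{t} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot T}\right), & R_2 < ST,\\[6pt]
x_{id}^{t} + Q \cdot L, & R_2 \ge ST.
\end{cases}
\tag{14}
\end{equation}
```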
In the equation, t is the current iteration number; T is the maximum number of iterations; α is a random value within [0, 1]; Q is a random value drawn from the standard normal distribution; L is a 1 × d matrix whose elements are all 1; $R_2 \in [0, 1]$ is the warning value; and $ST \in [0.5, 1]$ is the safety value. If $R_2 < ST$, the population is not at risk and the discoverer continues searching. If $R_2 \ge ST$, a vigilante has detected a predator and immediately alerts the others; the sparrow population then engages in anti-predation behavior and flies to a safer region to forage. The position update equation of the joiner is given in Equation (15).
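A component-wise form of the joiner update that is common in the SSA literature, and that uses the same worst and best locations as the text, is sketched below. The summation index j and the notation rand{−1, 1} (a value chosen at random from −1 and 1) are assumptions; the original SSA paper expresses the second branch with a matrix term instead.

```latex
\begin{equation}
x_{id}^{t+1} =
\begin{cases}
Q \cdot \exp\!\left(\dfrac{x_{\mathrm{worst},d}^{t} - x_{id}^{t}}{i^{2}}\right), & i > N/2,\\[8pt]
x_{\mathrm{best},d}^{t+1} + \dfrac{1}{D}\displaystyle\sum_{j=1}^{D}
\operatorname{rand}\{-1,1\} \cdot \left|x_{ij}^{t} - x_{\mathrm{best},j}^{t+1}\right|, & i \le N/2.
\end{cases}
\tag{15}
\end{equation}
```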
Here, $x_{worst,d}^{t}$ is the global worst location at the t-th iteration, and $x_{best,d}^{t+1}$ is the global optimal location at the (t+1)-th iteration. If $i > N/2$, the i-th joiner has not obtained food and needs to fly to another location to search for food. If $i \le N/2$, the i-th joiner is close to the global best location and forages randomly around it. The vigilante location update equation is given in Equation (16), where β is the step length control variable, a random value drawn from a normal distribution with mean 0 and variance 1; K denotes the movement direction of the sparrow and is a random value within [−1, 1]; ε is a small constant; $f_i$ is the fitness of the i-th sparrow; $f_g$ is the best fitness of the current population; and $f_w$ is the worst fitness of the current population. If $f_i \ne f_g$, the i-th sparrow is at the edge of the population and can be attacked easily by a predator. If $f_i = f_g$, the i-th sparrow is in the center of the population and is aware of the danger; it moves closer to other sparrows in order to reduce the risk of being caught.
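The vigilante update in the form commonly given in the SSA literature, consistent with the symbols β, K, $f_i$, $f_g$, and $f_w$ defined above, is sketched below. The small constant is written ε, and the equation number is inferred from the sequence; both are assumptions.

```latex
\begin{equation}
x_{id}^{t+1} =
\begin{cases}
x_{\mathrm{best},d}^{t} + \beta \left(x_{id}^{t} - x_{\mathrm{best},d}^{t}\right), & f_i \ne f_g,\\[6pt]
x_{id}^{t} + K \cdot \dfrac{x_{id}^{t} - x_{\mathrm{worst},d}^{t}}{\left|f_i - f_w\right| + \varepsilon}, & f_i = f_g.
\end{cases}
\tag{16}
\end{equation}
```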
Searching the neighborhood of the global optimum sparrow in every iteration can enhance the search ability of SSA and can help the sparrow group reach the best location during the search process. A chaotic local search technique can be embedded in the iterations of SSA to improve its exploitation capability and to maintain a better balance among the core search processes. Moreover, the logistic chaotic map is employed to construct the chaotic SSA. It is defined as follows.
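The logistic map is usually written as $\rho_{i+1} = \mu \rho_i (1 - \rho_i)$. A minimal Python sketch of this recurrence is given below, assuming that form; the function name `logistic_sequence` and its argument names are hypothetical.

```python
def logistic_sequence(rho1, length, mu=4.0):
    """Generate `length` values of the logistic map starting from rho1.

    rho1 must lie in (0, 1) and must avoid 0.25, 0.5, and 0.75, because with
    mu = 4 those seeds collapse onto a fixed point instead of behaving chaotically.
    """
    if not (0.0 < rho1 < 1.0) or rho1 in (0.25, 0.5, 0.75):
        raise ValueError("rho1 must be in (0, 1) and not 0.25, 0.5, or 0.75")
    sequence = [rho1]
    for _ in range(length - 1):
        previous = sequence[-1]
        sequence.append(mu * previous * (1.0 - previous))
    return sequence
```

For instance, a seed of 0.5 maps to 1 and then to 0, where the sequence stays, which is why such seeds are excluded.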
Here, $\rho_1 \in (0, 1)$ and $\rho_1 \ne 0.25, 0.5, 0.75$; when the control parameter µ is set to 4, the logistic map enters a chaotic state. The chaotic local search function is then given below.
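In chaotic local search schemes of this kind, the chaotic parameter is typically mapped linearly onto the search interval. The expression below is such a mapping, stated as an assumption about the omitted formula, with a and b denoting the lower and upper bounds of the search space.

```latex
\begin{equation*}
P_i = a + \rho_i \,(b - a)
\end{equation*}
```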
Here, [a, b] is the search space, and the chaotic function maps the chaotic parameters $\rho_i$ into the chaotic vector $P_i$. The chaotic vector $P_i$ is then linearly combined with the target position TP to generate the candidate location CL, which is expressed as follows.
The CSSA approach derives a fitness function (FF) for obtaining higher classification performance. The FF returns a positive value that indicates the quality of a candidate solution. Here, the classifier error rate, which is to be minimized, is taken as the FF, as given in Equation (21).
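A linear combination of this kind is often written as CL = (1 − λ)·TP + λ·P_i, where λ is a shrinking factor such as λ = (T − t + 1)/T. Both the symbol λ and this schedule are assumptions borrowed from the general chaotic local search literature, not from the source. A minimal Python sketch under those assumptions, with hypothetical function names:

```python
def chaotic_vector(rho, lower, upper):
    """Map chaotic parameters in [0, 1] linearly onto the search interval [lower, upper]."""
    return [lower + r * (upper - lower) for r in rho]


def shrink_factor(t, max_iter):
    """Assumed schedule: weight of the chaotic vector decreases as iterations progress."""
    return (max_iter - t + 1) / max_iter


def candidate_location(target, chaos, shrink):
    """Linearly combine the target position with the chaotic vector, component by component."""
    return [(1.0 - shrink) * tp + shrink * p for tp, p in zip(target, chaos)]
```

With this schedule, early iterations lean on the chaotic vector (wide local exploration), while late iterations stay close to the target position.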
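A classifier error rate is normally the percentage of misclassified samples, that is, fitness = (number of misclassified samples / total number of samples) × 100. The sketch below assumes that definition for the fitness function; `classifier_error_rate` is a hypothetical name, and in practice the predictions would come from an SGRU trained with the candidate hyperparameters.

```python
def classifier_error_rate(y_true, y_pred):
    """Return the percentage of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    misclassified = sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)
    return 100.0 * misclassified / len(y_true)
```

A lower value indicates a better candidate solution, so the optimizer keeps the hyperparameter set with the smallest error rate.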

Performance Validation
In this section, the CSSADTL-BCC model is validated experimentally using a benchmark dataset [21], and the details are provided in Table 1. The CSSADTL-BCC model is simulated using the Python 3.6.5 tool. The parameter settings are as follows: learning rate, 0.01; dropout, 0.5; batch size, 5; epoch count, 50; activation, ReLU. A few sample images are shown in Figure 3.
Figure 5 highlights the overall classification outcomes of the CSSADTL-BCC model under distinct epochs and class labels. The experimental outcomes imply that the CSSADTL-BCC model achieves effectual outcomes in terms of different measures such as accuracy ($accu_y$), precision ($prec_n$), recall ($reca_l$), specificity ($spec_y$), F-score ($F_{score}$), MCC, and G-mean ($G_{mean}$). For instance, with 500 epochs, the CSSADTL-BCC model provided average $accu_y$, $prec_n$, $reca_l$, $spec_y$, $F_{score}$, MCC, and $G_{mean}$ values of 95.62%, 78.78%, 73.25%, 97.09%, 75.71%, 73.18%, and 84.01%, respectively. Moreover, with 1000 epochs, the CSSADTL-BCC method obtained average $accu_y$, $prec_n$, $reca_l$, $spec_y$, $F_{score}$, MCC, and $G_{mean}$ values of 97.10%, 85.21%, 82.09%, 98.16%, 83.52%, 81.84%, and 89.62%, respectively. In addition, with 1500 epochs, the CSSADTL-BCC methodology provided average $accu_y$, $prec_n$, $reca_l$, $spec_y$, $F_{score}$, MCC, and $G_{mean}$ values of 98.61%, 92.80%, 91.48%, 99.14%, 92.10%, 91.29%, and 95.19%, respectively. Finally, with 2000 epochs, the CSSADTL-BCC technique obtained average $accu_y$, $prec_n$, $reca_l$, $spec_y$, $F_{score}$, MCC, and $G_{mean}$ values of 98.54%, 92.58%, 90.87%, 99.08%, 91.66%, 90.82%, and 94.84%, respectively.
The training accuracy (TA) and validation accuracy (VA) attained by the CSSADTL-BCC model on the test dataset are shown in Figure 6. The experimental outcomes imply that the CSSADTL-BCC model attained high values of TA and VA. In particular, VA appeared to be higher than TA.
The training loss (TL) and validation loss (VL) achieved by the CSSADTL-BCC method on the test dataset are shown in Figure 7. The experimental outcomes indicate that the CSSADTL-BCC model obtained low values of TL and VL. In particular, VL appeared to be lower than TL. Next, a brief precision-recall examination of the CSSADTL-BCC method on the test dataset is displayed in Figure 8. The figure shows that the CSSADTL-BCC approach achieved maximal precision-recall performance under all classes.
To highlight the enhanced outcomes of the CSSADTL-BCC model, a brief comparison study with recent models is shown in Table 3 [22]. Figure 11 presents a detailed $accu_y$ and $F_{score}$ analysis of the CSSADTL-BCC model against existing models. The results indicate that the GLCM-KNN and GLCM-NB models obtained lower values of $accu_y$ and $F_{score}$. At the same time, the GLCM-discrete transform, GLCM-SVM, and Deep learning-IRV2 models attained moderately close values of $accu_y$ and $F_{score}$. Next, the GLCM-DL and Deep learning-INV3 models produced reasonable $accu_y$ and $F_{score}$ values. However, the CSSADTL-BCC model achieved the best outcome, with maximum $accu_y$ and $F_{score}$ values of 98.61% and 92.10%, respectively. Figure 12 presents a detailed $prec_n$ and $reca_l$ examination of the CSSADTL-BCC model against existing techniques. The outcomes show that the GLCM-KNN and GLCM-NB approaches obtained lower values of $prec_n$ and $reca_l$. Moreover, the GLCM-discrete transform, GLCM-SVM, and Deep learning-IRV2 algorithms attained moderately close values of $prec_n$ and $reca_l$. In addition, the GLCM-DL and Deep learning-INV3 approaches produced reasonable $prec_n$ and $reca_l$ values. However, the CSSADTL-BCC technique achieved the best outcome, with maximum $prec_n$ and $reca_l$ values of 92.80% and 91.48%, respectively.
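The measures named above can be computed from confusion-matrix counts. The following sketch assumes the usual definitions, with G-mean taken as the square root of recall times specificity, and assumes that per-class values are obtained in a one-vs-rest manner before averaging; the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification measures from one-vs-rest confusion counts.

    Assumes every denominator is non-zero (each class is both present and predicted).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f_score = 2.0 * precision * recall / (precision + recall)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
        "g_mean": g_mean,
    }
```

Averaging each entry over all classes would give figures comparable in kind to the per-epoch averages reported in this section.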
From the above results and discussion, it is apparent that the CSSADTL-BCC model outperforms the other methods. The enhanced performance of the CSSADTL-BCC model is attributed to the effective hyperparameter tuning of the SGRU classifier. Thus, the proposed model can be applied to assist physicians in the disease diagnosis process.

Conclusions
In this study, a new CSSADTL-BCC method was developed for classifying BC on histopathological images. The presented CSSADTL-BCC model mainly focuses on the recognition and classification of BC. In the first stage, the CSSADTL-BCC model employs the GF technique to remove noise. Moreover, a MixNet-based feature extraction model is employed to produce a useful collection of feature vectors. Then, the SGRU classifier is used to assign class labels. Furthermore, CSSA is applied to optimally tune the hyperparameters of the SGRU model. The performance of the CSSADTL-BCC model was validated using a benchmark dataset, and the results showed the superior efficiency of the CSSADTL-BCC method over existing approaches, with a maximum accuracy of 98.61%. In the future, deep instance segmentation approaches can be included to enhance classification performance. In addition, the classifier's results can be boosted by designing deep fusion-based ensemble models.