Intelligent Hybrid Deep Learning Model for Breast Cancer Detection

Abstract: Breast cancer (BC) is a tumor that develops in breast cells and is one of the most common cancers in women. It is the second most life-threatening disease for women after lung cancer, so the early diagnosis and classification of BC are very important. Furthermore, manual detection is time-consuming, laborious work that leaves room for pathologist error and incorrect classification. To address these issues, this paper presents a hybrid deep learning (CNN-GRU) model for the automatic detection of BC-IDC (+,−) using whole slide images (WSIs) from the well-known PCam Kaggle dataset. The proposed model combines different layers of CNN and GRU architectures to detect breast IDC (+,−) cancer. The validation tests for quantitative results were carried out using the performance measures accuracy (Acc), precision (Prec), sensitivity (Sens), specificity (Spec), AUC, and F1-score. The proposed model shows the best performance measures (accuracy 86.21%, precision 85.50%, sensitivity 85.60%, specificity 84.71%, F1-score 88%, and AUC 0.89), which helps overcome pathologist error and the misclassification problem. Additionally, the efficiency of the proposed hybrid model was tested and compared with CNN-BiLSTM, CNN-LSTM, and current machine learning and deep learning (ML/DL) models, which indicated that the proposed hybrid model is more robust than recent ML/DL approaches.


Introduction
Breast cancer arises from abnormally developing tissue in breast cells and is considered one of the most common cancers in women worldwide after lung cancer [1]. In America, approximately 30% of new cancer cases diagnosed in women each year are breast cancer, and the death rate is 190 per 100,000 women every year [2]. The two most common types of BC are ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC) [3]. DCIS accounts for a very small percentage, only 2%, of BC patients. IDC is more dangerous because it spreads through the full breast tissue; this category includes 80% of BC patients, with a death rate of 10 per 100 [4].

• In this research, a new hybrid DL (CNN-GRU) model is presented that automatically extracts BC-IDC (+,−) features and classifies them into IDC (+) and IDC (−) from histopathology images to reduce pathologist error.

• The hybrid DL (CNN-GRU) model is proposed to efficiently classify IDC breast cancer in clinical research.

• In the evaluation process of the proposed CNN-GRU model, we compared the key performance measures (Acc (%), Prec (%), Sens (%), Spec (%), F1-score, and AUC) with current ML/DL models implemented on the same Kaggle dataset in order to assess the classification performance of the hybrid models. The proposed hybrid model shows impressive classification outcomes compared to other hybrid DL models.
Furthermore, the structure and organization of the paper are as follows: a comprehensive explanation of the BC-IDC dataset is presented in Section 1, while Section 2 provides a full discussion of the proposed model structure and the data pre-processing. The experimental study of the models is shown in Section 3, and the comparative results are presented in Section 4, including the discussion, conclusion, and future work.
Medical diagnosis research is not limited to CNN models for feature extraction from imaging; it also includes other types of models [33]. Wahab et al. [34] introduced a multi-fused CNN (MF-CNN) for BC detection.
The results demonstrate that suitable color and textural qualities can help identify ROIs based on the mitotic count at a lower spatial resolution. CNNs open up previously unthinkable possibilities in domains where it is difficult for specialists to build effective imaging features. The research of Gravina et al. [35] found that CNNs are less effective when cancer images are higher-dimensional than simple images; breast cancer cues such as lesion segmentations were presented as useful sources of information that can be used to extract shape-related features and pinpoint specific locations in mammography images. Tsochatzidis et al. [36] examined the accuracy of detecting BC in mammography images. They used different mammographic mass datasets, such as DDSM-400 and CBIS-DDSM, obtaining accuracies of 70% and 73%, respectively; the segmentation maps were also compared with one another to check the performance of the proposed model. Malathi et al. [37] adapted a computer-aided diagnostic (CAD) system for mammograms to enable early detection, assessment, and diagnosis of breast cancer during the screening process. They discussed the possibility of developing a breast CAD framework based on a distinctive fusion of CNN and other deep learning (DL) techniques. The outcome demonstrates that the random forest algorithm (RFA) had the best accuracy, 78%, with less error than the CNN model. The abnormality of breast tissue was explored using a deep belief network (DBN) by Desai et al. [38], who experimented with each network's design and operation. The analysis was then carried out on the accuracy metric (79%) to determine whether one network surpasses the others in diagnosing and categorizing BC. When it comes to identifying BC-IDC, the CNN model shows greater accuracy than the MLP in certain cases. Wahab et al. [34] conducted a previous study investigating the automated identification of the BC-IDC type using CNNs. Several researchers employed automatic ML-based identification approaches for the same task, acquiring accurate findings and reducing the number of errors made during the diagnostic process. Using the provided dataset, the research of D. Abdelhafiz [39] revealed that augmentation approaches with a DL model accurately classify BC. Another study [40] used max pooling in its deepest CNN layers to accurately classify mitosis images of breast cancer.
The networks managed and organized the proposed pixel-by-pixel method to classify and examine the IDC tissue zones. Murtaza et al. [41] used DL methods to accurately detect cancer. Hossain et al. [13] proposed context-aware stacked CNNs for detecting IDC and DCIS using whole slide images (WSIs). They attained an area under the curve of 0.72 when categorizing nonmalignant and malignant slides, and the system achieved a three-class accuracy of up to 76.2% for WSI classification, suggesting its potential in routine diagnostics. Alamid [42] and Qian et al. [43] described various approaches for identifying BC in their respective studies. Their experiments demonstrated that the amplitude and phase of the shearlet coefficients can be used to improve detection performance and generalizability. Some earlier research [1,33,34,41] advocated using artificial intelligence (AI) and CNNs for cancer image identification and healthcare monitoring. However, the accuracy for a medical-grade solution was too low [44,45], roughly 60% for full class detection and 75% for mass-only class detection. The accuracy of all approaches may be refined further to obtain a more favorable outcome [46,47]. This study aimed to improve the precision of breast cancer diagnosis.
In DL, the CNN is the most popular model because it can extract a rich set of features by applying various filters in its convolutional layers, along with fully connected (FC) and pooling layers [48]. However, a CNN is unable to retain the memory of prior time-series patterns; as a result, it has a hard time directly learning the BC-IDC (+,−) features that are considered the most significant and indicative of the disease [49]. Hence, a GRU network layer is concatenated with the CNN model to address these issues, which improves the classification performance for BC-IDC (+,−) and also retains previous patterns in the data. This research aims to reduce pathologist errors in the diagnosis process and to automate the detection of BC-IDC (+,−) tissue [49][50][51][52][53][54][55]. Table 1 presents various existing works on BC detection using DL models.

The Framework of Predicting BC-IDC Detection
The whole process of BC-IDC (+,−) detection using the proposed CNN-GRU model is described as follows. Two key phases are required to perform the breast cancer (IDC tissue) detection process: the first is data collection and pre-processing (labeling and resizing), as shown in Figure 1, while the other is analyzing the data with the proposed CNN-GRU model for detection.


Data Collection and Class Label
In this study, a publicly accessible dataset was obtained from the well-known Kaggle website (http://Kaggle.com (accessed on 10 March 2020)) [56]. The full dataset comes from the research in [57], covering 162 women diagnosed with IDC at the Hospital of the University of Pennsylvania. The dataset contains high-resolution pathology images (2040 × 1536 pixels). To maintain consistency, each slide was scanned at a resolution of 0.25 microns per pixel, and 277,524 small images were extracted from the original dataset. Of the 277,524 images obtained, 78,786 were IDC (+) samples, labeled 0, and 198,738 were non-IDC (−), labeled 1, as given in Figure 2.
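As a concrete illustration, the class label can be read off each patch's filename. The sketch below assumes the Kaggle naming convention in which each patch name ends in "class0" or "class1" (the example filenames used in testing are hypothetical), and follows the paper's mapping of IDC (+) to label 0 and non-IDC (−) to label 1; it is not the authors' code.

```python
import os

def label_from_filename(path):
    """Assign a class label to an IDC patch from its filename.

    Assumes the Kaggle naming convention in which each patch name ends
    in 'class0' (non-IDC) or 'class1' (IDC positive); the paper maps
    IDC (+) patches to label 0 and non-IDC patches to label 1.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    if stem.endswith("class1"):   # IDC (+) patch -> label 0
        return 0
    if stem.endswith("class0"):   # non-IDC (-) patch -> label 1
        return 1
    raise ValueError(f"unrecognized patch name: {path}")
```

Iterating this function over the extracted patch directory yields the 78,786 / 198,738 label split described above.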
Electronics 2022, 11, x FOR PEER REVIEW 5 of



Data Pre-Processing
Pre-processing is the most important step for obtaining the best classification results. It is often performed on data before classification to ensure that the required results are obtained. Pre-processing strategies for the breast cancer dataset were investigated to improve the detection model's accuracy, reduce computational time, and speed up the training process. Additionally, by normalizing the data to a mean (µ) of 0 and a standard deviation (σ) of 1, the optimizer can converge faster. The Kaggle data were split into test data (20% of the total images) and training data (the remaining 80%); to avoid overfitting, the process also needs a validation set. Another issue is the unequal distribution of the dataset classes: the quantity of data of the benign type is around 3 times that of the malignant category, affecting the CNN's performance. The oversampling approach SMOTE (synthetic minority oversampling technique) is used to balance the samples and decrease overfitting issues. Random cropping, one of the important pre-processing steps, was also used in this research. Figure 3 presents the IDC class distribution.
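The normalization and 80/20 split described above can be sketched in a few lines of NumPy. This is an illustration only: the image tensor here is random stand-in data rather than the Kaggle patches, and the SMOTE balancing step (typically performed with the imbalanced-learn library) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 50x50x3 patch tensor (random data, not the real set).
images = rng.random((1000, 50, 50, 3)).astype(np.float32)
labels = rng.integers(0, 2, size=1000)

# Normalize so the optimizer sees mean (mu) = 0 and std (sigma) = 1.
mu, sigma = images.mean(), images.std()
images = (images - mu) / sigma

# Shuffle, then take 80% for training and 20% for testing.
idx = rng.permutation(len(images))
split = int(0.8 * len(images))
train_idx, test_idx = idx[:split], idx[split:]
x_train, y_train = images[train_idx], labels[train_idx]
x_test, y_test = images[test_idx], labels[test_idx]
# (SMOTE oversampling of the minority class would follow here.)
```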

Random Cropping
To handle the BC dataset, another pre-processing approach, random cropping, was used in conjunction with the convolutional neural networks. This technique arbitrarily crops different areas of large images to maximize the amount of data available to the CNNs; random cropping is illustrated in Figure 4. After these pre-processing techniques, the data are delivered to the proposed approach (CNN-GRU) described in the following section, in which the CNN and GRU models for IDC (+,−) breast cancer detection are discussed briefly.
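The random-cropping operation can be sketched as follows; this is a minimal illustration of the technique, not the authors' implementation.

```python
import numpy as np

def random_crop(image, crop_h, crop_w, rng=None):
    """Crop a random (crop_h, crop_w) window from an H x W x C image."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    # Pick a random top-left corner so the window fits inside the image.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return image[top:top + crop_h, left:left + crop_w]
```

Calling this repeatedly on one large image yields many distinct training patches, which is how cropping increases the amount of available data.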



Convolutional Neural Networks (CNN)
CNNs are used to find patterns in images; with the first several layers of a CNN, the network can detect lines and corners. These patterns are then passed down through the network, which identifies more distinctive features as it progresses deeper [48]. The CNN model is extremely efficient for image feature extraction, and according to the researchers, the proposed CNN model efficiently identifies BC from breast tissue images. The structure of a CNN consists of three main layer types: pooling layers, convolutional layers (CLs), and fully connected layers (FCs). The CLs are responsible for calculating the output of neurons connected to local regions, determined by taking the dot product of the weights and the region. For input images, typical filters cover a small area (3 × 3 to 8 × 8 pixels). Such filters scan the image by sliding a window over it, automatically capturing the recurrent patterns that appear in any image region during the scanning process. The stride is the distance the filter moves between applications; if the stride is smaller than a filter dimension, successive windows overlap. Figure 5 presents the main architecture of the CNNs.
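The sliding-window computation described above can be sketched as a naive single-filter convolution. This is an illustration of the dot-product-per-window idea only; real CNN layers use optimized, batched kernels.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide one filter over a 2-D image (valid padding, given stride)."""
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            window = image[i * stride:i * stride + kh,
                           j * stride:j * stride + kw]
            # Dot product of the filter weights and the local region.
            out[i, j] = np.sum(window * kernel)
    return out
```

With stride 1 and a 3 × 3 kernel, a 5 × 5 input produces a 3 × 3 output map, matching the "valid" shrinking described in the text.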


Gated Recurrent Unit Network (GRU)
Within the RNN family, the GRU network model is implemented in most research articles to handle the vanishing gradient problem [58]; it is presented in Figure 6. The GRU is more efficient than the LSTM because it uses fewer gates and contains no internal cell state. Within the GRU, the information is kept in the hidden state. The new and previous information are offered jointly to the update gate (z), while previous information is controlled by the reset gate (r).

The current memory gate takes advantage of the reset gate to save and maintain the essential information present in the prior state. It is possible to incorporate nonlinearity into the input by using the input modulation gate while simultaneously giving it zero-mean properties. The basic GRU gates can be expressed mathematically as

z_t = σ(K_xz x_t + K_hz h_{t−1} + C_z)
r_t = σ(K_xr x_t + K_hr h_{t−1} + C_r)

where K_xr and K_xz are the input weight parameters, K_hr and K_hz the corresponding recurrent weights, and C_r, C_z the biases.
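The gate arithmetic can be illustrated with a single NumPy GRU step. The parameter names extend the paper's K/C notation; the recurrent weights (K_hz, K_hr, K_hh) and the candidate-state parameters (K_xh, C_h) are assumptions following the standard GRU formulation, not symbols given in the text.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, params):
    """One GRU step following the standard gate equations.

    params = (K_xz, K_xr, K_xh, K_hz, K_hr, K_hh, C_z, C_r, C_h):
    input weights (hidden x input), recurrent weights (hidden x hidden),
    and biases (hidden,).
    """
    K_xz, K_xr, K_xh, K_hz, K_hr, K_hh, C_z, C_r, C_h = params
    z = sigmoid(K_xz @ x + K_hz @ h_prev + C_z)              # update gate
    r = sigmoid(K_xr @ x + K_hr @ h_prev + C_r)              # reset gate
    h_tilde = np.tanh(K_xh @ x + K_hh @ (r * h_prev) + C_h)  # candidate state
    # Blend previous state and candidate state via the update gate.
    return (1.0 - z) * h_prev + z * h_tilde
```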

CNN-GRU
The CNN-GRU model consists of 4 convolutional layers (CLs), 3 max-pooling layers, and 3 fully connected (FC) layers. The rectified linear unit (ReLU) activation function was used because it does not activate every neuron simultaneously, which allows the model to perform better and learn faster. Initially, the input images were fed to the CLs with dimensions (50, 50, 3); that is, the image height and width were 50 pixels and there were 3 channels. The CNN-GRU model extracts features by passing the input through CL1; in this instance, the feature map output depth was 128. In addition, the stride was set to 1 and the kernel size of CL1 was 3 × 3. ReLU was applied after CL1 to introduce nonlinearity. After CL1, the output was 128 feature maps of size (50, 50), and the pooling layer then decreased the feature maps to (48, 48). To prevent overfitting, the (48, 48, 128) output was passed through a dropout layer after the pooling layer.
Initially, the dropout after the convolutional layer was 0.3. An additional dropout of 0.9 was applied in the first two fully connected layers to overcome the problem of overfitting. After each max-pooling layer and CL, the number of training parameters dropped dramatically, followed by ReLU and dropout. After the convolutional stages, the data are flattened into a 1-D array to be used as input for the FC layers; flattening produced a feature map of 512 with training parameters of size (32, 32). After the 2-D convolutional layers were complete, dropout was employed and 256 feature maps were generated. A GRU layer fed by an FC layer of 512 neurons was used to tackle the vanishing gradient issue, after which two further FC layers were utilized. Finally, SoftMax performed the binary classification, as presented in Table 2 and Figure 7. Table 2 gives a comprehensive summary of the parameters used for the proposed model.
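Since the paper does not list exact code, the Keras sketch below is one plausible reading of the stack described above (4 convolutional layers, 3 max-pooling layers, a GRU, 3 dense layers, and a softmax output); the filter counts beyond CL1 and the exact pooling and reshape choices are assumptions where the text is ambiguous.

```python
from tensorflow.keras import layers, models

def build_cnn_gru(input_shape=(50, 50, 3), num_classes=2):
    """One plausible reading of the CNN-GRU stack described above."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        # CL1: 128 filters, 3x3 kernel, stride 1; 'same' keeps (50, 50).
        layers.Conv2D(128, 3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=1),   # (50, 50) -> (48, 48)
        layers.Dropout(0.3),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(256, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(256, 3, activation="relu"),
        # Collapse the spatial grid into a sequence for the GRU:
        # each spatial position becomes one time step of 256 features.
        layers.Reshape((-1, 256)),
        layers.GRU(512),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.9),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.9),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Compiling with a cross-entropy loss and fitting on the pre-processed patches would follow the usual Keras workflow.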


Experimental Setup
For this experiment, we utilized an Intel Core i7 CPU and an NVIDIA graphics processing unit (GPU). The recommended model was also trained by Keras and Python 3.7 programming environments. Table 3 provides details of the software and hardware specifications.


Performance Metrics
The following performance metrics were considered and computed to test whether the CNN-GRU model properly classifies BC-IDC (+,−) tissue: accuracy (Acc), precision (Prec), sensitivity (Sens), specificity (Spec), F1 score, and, most importantly, Matthew's correlation coefficient (MCC) and AUC, which are widely used as performance indicators for detecting breast IDC cancer.
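The indicators listed above can be computed directly from confusion-matrix counts; the helper below is a minimal illustration using the standard formulas, not the paper's code.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the standard indicators from confusion-matrix counts."""
    acc  = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    sens = tp / (tp + fn)          # sensitivity / TPR / recall
    spec = tn / (tn + fp)          # specificity / TNR
    f1   = 2 * prec * sens / (prec + sens)
    mcc  = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"Acc": acc, "Prec": prec, "Sens": sens,
            "Spec": spec, "F1": f1, "MCC": mcc}
```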

Result and Discussion
The experimental study was conducted with three hybrid DL models: CNN-LSTM, CNN-BiLSTM, and the proposed CNN-GRU model. The models' results were compared using a testing dataset.
Analysis of Performance Measures (Acc, Prec, Sens, Spec, F1 Score, and AUC)
When evaluating the effectiveness of a classifier, accuracy is one of the most important factors to take into account. Precision (Prec) quantifies the degree of correctness of the positive predictions. The F1 score, often reported alongside the TPR, has been investigated in many IDC scenarios in previous literature; it is a reasonable metric that reveals the robustness of an IDC breast cancer architecture. Furthermore, the AUC presents the model's ability to distinguish between the classes. Based on the aforementioned performance indicators, the proposed model was tested and compared with CNN-BiLSTM and CNN-LSTM for BC-IDC (+,−) detection. The CNN-GRU model performed better because a GRU can be modified easily and does not need memory units, so there are fewer parameters to train.
The proposed method attained an Acc of 86%, Prec of 85%, Sens of 85%, an F1 score of 86%, and an AUC of 0.89. Figure 9 contains the analysis of all performance indicators measured during prediction.


Confusion Matrix
To check the classification performance of the model, we used the confusion matrix, which tabulates how the model classifies BC-IDC (+,−). The CNN-GRU was evaluated on this classification scale and compared with the CNN-LSTM and CNN-BiLSTM models. The performance of the CNN-GRU is superior to the other hybrid models, and it accurately classifies BC-IDC (+,−), as presented in Figure 10.


ROC Curve Analysis
The receiver operating characteristic (ROC) curve is a graph that presents the classification performance of a model across all classification thresholds. The ROC curve plots the true positive rate (TPR) on the Y-axis against the false positive rate (FPR) on the X-axis. Figure 11 presents the ROC of the CNN-GRU along with the CNN-BiLSTM and CNN-LSTM models, showing that the proposed method performs better classification than the other hybrid models.
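The threshold sweep behind an ROC curve can be sketched in a few lines of NumPy; this is an illustration of the standard construction (and a trapezoidal AUC), not the paper's implementation.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """TPR/FPR pairs obtained by sweeping the decision threshold."""
    order = np.argsort(-scores)          # highest score first
    y = np.asarray(y_true)[order]
    tps = np.cumsum(y)                   # true positives at each cut
    fps = np.cumsum(1 - y)               # false positives at each cut
    tpr = tps / max(int(y.sum()), 1)
    fpr = fps / max(int((1 - y).sum()), 1)
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A perfectly ranked classifier yields an AUC of 1.0; a random one hovers around 0.5.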


Evaluation of TNR, TPR and MCC
To evaluate the performance of the proposed hybrid model, a confusion matrix technique is implemented to identify the TNR, TPR, and MCC values. Figure 13 presents the TPR, TNR, and MCC, which are 86%, 84%, and 85.5%, respectively. The proposed CNN-GRU model has the best outcomes compared to the other hybrid models.



Model Efficiency
The time complexity (ms) measures the training time of the model during the classification process. The fact that most of the training was completed offline was not considered during the experiment. As shown in Figure 14, the proposed CNN-GRU has a training time of 4.4 ms, which is much less than the training times of CNN-BiLSTM and CNN-LSTM, at 6.4 and 7.4 ms, respectively.
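Per-step training time of this kind can be measured with a simple wall-clock helper; the function below is an illustrative sketch (the paper does not describe its timing code), and `step_fn` is a hypothetical stand-in for one training step.

```python
import time

def time_per_training_step(step_fn, n_steps=100):
    """Average wall-clock time per training step, in milliseconds."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()                      # one forward/backward pass
    return (time.perf_counter() - start) * 1000.0 / n_steps
```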

Comparative Analysis of the Proposed Hybrid Algorithm with Existing ML/DL Models
To further investigate the classification performance of the proposed hybrid model for IDC (+,−), we compared it with the best DL models, i.e., LSTM, CNN, DNN, and BiLSTM, using the key performance measures (Acc, Prec, Sens, Spec, and F1 score). The CNN-GRU model achieved phenomenal classification measures compared with these models, while the LSTM has the lowest key performance metrics for IDC (+,−) detection, as presented in Figure 15. Furthermore, to widen the validation scope, a complete performance comparison was made between the CNN-GRU and several existing ML/DL frameworks from the research literature for BC-IDC (+,−) tissue classification. The CNN-GRU attained outstanding performance on all of the metrics listed above, surpassing the existing literature; a comparative summary can be found in Table 4. The proposed hybrid model nevertheless has some disadvantages: during training, it requires high computing resources and specialized hardware, namely a good GPU.

Conclusions and Future Work
The aim of automatic detection of BC-IDC (+,−) tissue is to improve the treatment of patients whose disease is very difficult to diagnose at an early stage. A CNN-GRU method is proposed in the present work, which examines the BC-IDC tissue areas in WSIs for automated detection and classification. In this research study, the proposed model automatically applies different layer architectures to detect breast cancer (IDC tissue). The validation tests for quantitative results were carried out using each methodology's key performance indicators (Acc (%), Prec (%), Sens (%), Spec (%), AUC, and F1-score (%)). The proposed system achieved an Acc of 86.21%, Prec of 85.90%, Sens of 85.71%, Spec of 84.51%, F1 score of 88%, and AUC of 0.89, which can reduce pathologists' errors and efforts during the clinical process. Furthermore, the results of the proposed model were compared with CNN-BiLSTM, CNN-LSTM, and other existing ML/DL models, indicating that the CNN-GRU has 4 to 5% higher accuracy, as well as higher Prec (%), Sens (%), Spec (%), AUC, and F1-score (%), and lower time complexity (ms). The fundamental constraint of this research is the use of a secondary database, such as Kaggle. Future studies should use primary data to improve the accuracy of findings linked to BC detection.