Breast Cancer Mammograms Classification Using Deep Neural Network and Entropy-Controlled Whale Optimization Algorithm

Breast cancer affects many women worldwide. Many computer-aided diagnosis (CAD) systems have been established to detect and classify breast cancer, because inspection of mammogram images by a radiologist is a difficult and time-consuming task. Numerous CAD systems have been developed to diagnose the disease early and provide better treatment, yet existing systems still need improvement through new methods and technologies in order to deliver more precise results. This paper aims to investigate ways to prevent the disease and to provide new classification methods that reduce the risk breast cancer poses to women's lives. The best feature optimization is performed to classify the results accurately, and the CAD system's accuracy is improved by reducing false-positive rates. The Modified Entropy Whale Optimization Algorithm (MEWOA) is proposed, based on fusion, for deep feature extraction and classification. In the proposed method, fine-tuned MobilenetV2 and Nasnet Mobile models are applied for simulation. Features are extracted and optimized, the optimized features are fused, and the fused features are optimized again using MEWOA. Finally, machine learning classifiers are applied to the optimized deep features to classify the breast cancer images. Three publicly available datasets are used to extract features and perform classification: INbreast, MIAS, and CBIS-DDSM. The maximum accuracy achieved is 99.7% on INbreast, 99.8% on MIAS, and 93.8% on CBIS-DDSM. Finally, a comparison with other existing methods demonstrates that the proposed algorithm outperforms these approaches.


Introduction
Cancer is a fatal disease, with an estimated ten million deaths and 19.3 million new cancer cases reported in 2020 [1]. Breast cancer is the second most common cancer after lung cancer [2], and the fifth leading cause of death in women [2,3]. In 2020, 684,996 deaths occurred from breast cancer and 2.3 million new cases were diagnosed in women (https://gco.iarc.fr/today/data/factsheets/cancers/20-Breast-fact-sheet.pdf (accessed on 20 October 2021)) [1]. In less developed countries, breast cancer is the foremost cause of death [4,5]. The cells in the breast tissues change and divide into multiple cells, causing a mass or lump. Cancer begins in the ducts or lobules that are connected to the nipples (https://www.cancer.org/content/dam/cancer-org/research (accessed on 11 November 2021)) [3]. Most masses in the breast are benign, that is, noncancerous, and cause fibroids, tenderness, area thickening, or lumps [3]. Mostly, breast tumors show no signs while small in size and can be easily treated (https://www.cancer.org/content/dam/cancer-org/research (accessed on 11 November 2021)). A painless mass is a sign of abnormal cells. Family history, reproductive factors, personal characteristics, excess body weight, diet, alcohol, tobacco, environmental factors, and other risk factors, such as night-shift work, all contribute to breast cancer. In its primary phase, breast cancer spreads slowly, but with the passage of time it

• Data augmentation is performed using three mathematical operations: horizontal shift, vertical shift, and rotation by 90 degrees.

• Two deep learning pre-trained models, Nasnet Mobile and MobilenetV2, are fine-tuned, and deep features are extracted from the middle layer (average pool) instead of the FC layer.

• We propose a Modified Entropy-controlled Whale Optimization Algorithm for optimal feature selection and reduced computational cost.

• We fuse the optimal deep learning features using a serial-based threshold approach.

Literature Review
Many models have been proposed to perform feature extraction and classification [18]. A CAD system was developed to extract features and classify mammogram images into malignant and benign by deploying a Deep Convolutional Neural Network (DCNN) and the AlexNet model [19]. An SVM is connected to the last, fully connected layer to achieve good accuracy, and fine-tuning is also performed; an AUC of 0.94 and an accuracy of 87.2% are achieved [20]. A DCNN model is used for mammogram image detection, and the model is fine-tuned [21]. A CAD system is proposed to classify malignant and non-malignant images using a K-clustering technique and an SVM classifier; a sensitivity and specificity of 96% are achieved on the DDSM dataset [22]. Other researchers took the INbreast and CBIS-DDSM datasets in PNG form and resized the images; VGG and ResNet methods are used for classification, achieving an AUC of 0.90 on CBIS-DDSM and 0.98 on INbreast [23]. The deep learning models VGG, ResNet, and Xception are applied to the CBIS-DDSM dataset; transfer learning and fine-tuning are used to address the overfitting problem, and an AUC of 0.84 is achieved on CBIS-DDSM [24]. Researchers proposed a Multi-View Feature Fusion (MVFF) method for classification on the mini-MIAS and CBIS-DDSM datasets, achieving an AUC of 0.932 [25]. Other researchers used the MobilenetV2 model with transfer learning on the CBIS-DDSM dataset for classification, achieving 74.5% accuracy; data resizing and augmentation are performed [26]. Multi-level thresholding and radial region-growing methods are applied to the DDSM dataset with an accuracy of 83.30% and an AUC of 0.92, reducing false-positive rates [27]. A CAD system is proposed using the DDSM and mini-MIAS datasets, in which histogram regions are used for segmentation and classification and K-means analysis segments the images. Shape and texture features are extracted, and an SVM classifier performs the classification. The classification accuracy on mini-MIAS is 94.2% with an AUC of 0.95, and on CBIS-DDSM an accuracy of 90.44% with an AUC of 0.90 is achieved [28].
A CAD system is proposed to classify the INbreast dataset using a deep CNN model; an accuracy of 95.64%, an AUC of 94.78%, and an F1-score of 96.84% are achieved [30]. In another study, a Modified VGG (MVGG) model is used to classify data from the CBIS-DDSM dataset, with a hybrid transfer learning fusion approach applied to the MVGG and ImageNet models. The modified MVGG achieves 89.8% accuracy, while MVGG and ImageNet combined by the fusion method achieve 94.3% accuracy [31]. In another study, the researchers extract features using a Maximum Response (MR) filter bank convolved by the CNN to perform classification, and a fusion approach is applied to address the mass features. On the CBIS-DDSM dataset, after the fusion reduction approach, an accuracy of 94.3%, an AUC of 0.97, and a specificity of 97.19% are achieved [32]. An ensemble transfer learning approach is used to extract features, and neural networks perform the classification; 88% accuracy and an AUC of 0.88 are achieved on CBIS-DDSM [33]. A CAD system is proposed to generate ROIs and classify the INbreast dataset. Deep learning techniques such as a Gaussian mixture model and a deep belief network are proposed, and a cascade deep learning method is used to reduce false-positive results. Bayesian optimization is performed to learn and segment the ROIs. Finally, a deep learning classifier is used to classify the INbreast images, achieving an accuracy of 91% and an AUC of 0.76 [34].
The transfer learning [13] approach is used to improve the efficiency of the training models used for classification. This approach makes learning faster and easier and is helpful when data is not available in a large amount. Transfer learning with fine-tuning is usually faster, and training is easier when the weights are initialized; transfer features can be learned quickly from a small number of samples [33][34][35][36]. The transfer learning approach with CNNs has been used to classify different types of images, such as histological cancer images, digital mammograms, and chest X-ray images [37].
To classify the INbreast and DDSM datasets, the deep learning models CNN, ResNet-50, and Inception-ResNetV2 were used to classify mammogram images as benign or malignant. On the INbreast dataset, accuracies of 88.74%, 92.55%, and 95.32% were achieved, respectively [38]. In another study, Faster-RCNN is used for detection and classification on the INbreast and CBIS-DDSM datasets; an AUC of 0.95 is achieved on INbreast [39]. A large amount of data is required to train deep learning models, so augmentation on the mini-MIAS dataset is performed using rotation and flipping. The 450,000 MIAS images obtained after augmentation are resized to 192 × 192, and the images are classified into three categories, normal, benign, and malignant, using a multiscale convolutional neural network (MCNN); the AUC is 0.99 and the sensitivity is 96% [40]. A random forest (RF) on a CNN with a pre-training approach is used together with hand-crafted features extracted from the INbreast dataset, and 91.0% accuracy is achieved [41]. The authors of [42] used a physics-informed neural network (PINN) with adaptive activation functions in regression to predict smooth and discontinuous functions and to solve linear and non-linear differential equations. The nonlinear Klein-Gordon equation is solved to provide a smooth solution, while the non-linear Burgers equation and the Helmholtz equation, in particular, are used for high-gradient solutions. To achieve the network's best performance, the activation function's hyperparameter is optimized by changing the topology of the loss function participating in the optimization process. The adaptive activation function outperforms in learning capability, improving both the convergence rate during initial training and the solution accuracy; efficiency can be increased with this method [42]. To improve the performance of PINNs, adaptive activation functions use layer-wise and neuron-wise approaches.
To complete the local adaptation of the activation function, a scalable parameter is initialized in each layer (layer-wise) and each neuron (neuron-wise), and the optimization update is performed using the stochastic gradient descent algorithm; a slope-based activation with the loss function is applied to increase the training speed [43]. Adaptive activation functions are also utilized to propose Kronecker neural networks (KNNs). KNNs reduce the number of parameters in large networks by using the Kronecker product and induce faster loss decay than feed-forward networks. For KNNs, the global convergence of gradient descent is established. The Rowdy activation function removes the saturation region from the training parameters by using sinusoidal fluctuations [44].

Methods and Materials
This section illustrates the proposed methodology, which involves six steps. In the first step, data augmentation is applied to increase the number of training samples. In the second step, fine-tuning is performed on two selected deep models: MobilenetV2 and Nasnet Mobile. The fine-tuned models are used to extract features from the global average pool layer. In the third step, a Modified Entropy Whale Optimization Algorithm (MEWOA) is applied to the extracted deep features. In the fourth step, features are fused using a serial-based non-redundant approach. In the fifth step, MEWOA is applied again to reduce the computational time, and finally, classification is performed using machine learning classifiers. Figure 1 shows the detailed architecture of the proposed method. The details of each step are given below.


Datasets
In this work, three publicly available mammography datasets are utilized for the experimental process: CBIS-DDSM [45], INbreast [46], and MIAS (http://peipa.essex.ac.uk/info/mias.html (accessed on 10 October 2019)). For evaluation of the proposed framework, a 50:50 approach is adopted, which means 50% of the images of each dataset are consumed for training and the remainder for testing. A few sample images of each dataset are illustrated in the figures. Each dataset's description is given below.
CBIS-DDSM: The Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) is an improved and standardized form of DDSM, curated by a trained mammographer. The images are in DICOM form, and ROI annotations of the images are also provided. Two views, craniocaudal (CC) and mediolateral oblique (MLO), are available, along with 1696 mass images with pathological information for training and testing [45]. Figure 2 shows a few example images from this dataset.

INbreast [46]: The 108 mass mammogram images are taken for the experiment. Figure 3 represents a few sample images.
mini-MIAS (Mammographic Image Analysis Society): This is a publicly available dataset of 322 images. MIAS images have been condensed to 200-micron pixel edges, and every image is 1024 × 1024 pixels. Benign, malignant, and normal images are given, and complete information about the dataset regarding normal, benign, and malignant images is available (http://peipa.essex.ac.uk/info/mias.html (accessed on 10 October 2019)). The images are available in portable gray map (PGM) format. The 300 images without calcification cases are taken for the experiment. Figure 4 represents sample images of this dataset.
These three datasets, CBIS-DDSM, INbreast, and mini-MIAS, are converted into portable network graphic (PNG) format [47]. The images are resized to 256 × 256 using the nearest-neighbor interpolation method.


Data Augmentation
To enable the deep learning models, the number of sample images is increased using data augmentation [47], since deep learning models give promising results on a large amount of data. In this work, three mathematical operations are implemented: flip left to right, flip up to down, and rotation by 90 degrees. Algorithm 1 for data augmentation is presented below.

Algorithm 1: Data Augmentation
While (i = 1 to target object)
    Step 1: Read input image
    Step 2: Flip left to right
    Step 3: Flip up to down
    Step 4: Rotate image by 90°
    Step 5: Write image of Step 2
    Step 6: Write image of Step 3
    Step 7: Write image of Step 4
End

In Figure 5, data augmentation of the CBIS-DDSM [45] images is presented. The augmentation of the data is performed by flipping left to right, flipping up to down, and rotating by 90°. Table 1 shows the detailed information of the three datasets CBIS-DDSM, INbreast, and MIAS; the details of the original images and data augmentation are given below.
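The three operations in Algorithm 1 can be sketched with NumPy (a minimal illustration on a dummy patch; the paper's actual file reading and writing pipeline is not specified):

```python
import numpy as np

def augment(image: np.ndarray) -> dict:
    """Apply the three augmentation operations from Algorithm 1."""
    return {
        "flip_lr": np.fliplr(image),   # Step 2: flip left to right
        "flip_ud": np.flipud(image),   # Step 3: flip up to down
        "rot90":   np.rot90(image),    # Step 4: rotate by 90 degrees
    }

# Example on a dummy 4 x 4 "mammogram" patch
img = np.arange(16).reshape(4, 4)
aug = augment(img)  # three augmented copies, written to disk in Steps 5-7
```

Each call produces a view with the same shape, so one source image yields three additional training samples.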

Convolutional Neural Network
There are several layers in the CNN model, including an input layer, convolutional layers, batch normalization layers, pooling, ReLU, softmax layers, and one output layer. The input layer consists of the dimensions a × b × c of the input image, where the number of channels is described by c. The convolutional layer, which is the main and first layer, utilizes the three inputs a, b, and c. Feature mapping is performed in the convolutional layer; these features are used for visualization and in the activation layer.

Fine-Tuned MobilenetV2
The MobilenetV2 model is a portable custom-based model in computer vision. This model sustains the same accuracy while decreasing the number of operations and consuming a small amount of memory. In this model, an inverted residual layer with a linear bottleneck is included: a compressed low-dimensional input representation is used and expanded to a high dimension by using lightweight depth-wise convolution filters [48].
MobilenetV2 performs efficiently in any framework. It reduces the need for main memory in many embedded hardware designs while providing a small amount of cache memory, which increases the speed and efficiency of the system. MobilenetV2 performs best in object detection, semantic segmentation, and classification tasks; it uses depth-wise convolution, a linear bottleneck, inverted residuals, and information-flow interpretation [49]. The depth-wise separable convolution blocks achieve good performance. In MobilenetV2, the standard convolution layers are replaced with two other layers. The depth-wise convolution, the first layer, uses a single convolution filter per input channel to perform lightweight filtering. The pointwise convolution, the second layer, generates new features by computing linear combinations of the input channels. In the residual bottleneck, the information in the deep convolutional layer is encoded in a manifold that resides in a low-dimensional subspace. This can be captured by reducing the layer dimensionality and the operating space dimensionality.
The manifold expands the space, allowing us to reduce the activation space dimensionality. The deep convolutional neural network has ReLU, a nonlinear per-coordinate transformation that breaks down this intuition. If the volume of the manifold of interest remains non-zero after the ReLU transformation, a linear transformation is formed; ReLU retains complete information about the input manifold if the manifold remains in a low-dimensional subspace of the input space. The inverted residuals built this way are more memory efficient [49].
In the fine-tuned MobilenetV2, the last three layers are replaced by new layers according to the target datasets, which are based on mini-MIAS, CBIS-DDSM, and INbreast. The transfer learning approach is used to train the fine-tuned model. In the training process, 100 epochs, a learning rate of 0.00001, and a batch size of 8 are set. The Single Shot Multibox Detector (SSD) [50] and the Adam optimizer are utilized for the learning method. To quantize the bounding box space, the SSD uses default anchor boxes with different fractions and measures, and adds different feature layers at the network's end [2]. Finally, deep features are extracted for further processing from the fine-tuned model's global average pool (GAP) layer. The output vector size of this layer is N × 1280.
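The global average pool layer used for feature extraction simply averages each channel's feature map to a single value; a minimal NumPy sketch (the 7 × 7 × 1280 shape matches MobilenetV2's final activation volume for a 224 × 224 input, but the input here is random, not a real mammogram):

```python
import numpy as np

def global_average_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Reduce an H x W x C activation volume to a C-dimensional feature vector."""
    return feature_maps.mean(axis=(0, 1))

# MobilenetV2's last activation volume is 7 x 7 x 1280 for a 224 x 224 input
activations = np.random.rand(7, 7, 1280)
features = global_average_pool(activations)  # shape (1280,), one value per channel
```

Stacking one such vector per image gives the N × 1280 feature matrix the section describes.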

Fine-Tuned Nasnet Mobile
The Nasnet Mobile is a neural architecture search network. Using a small dataset, the architectural building block is searched for and then transferred to a large dataset. The best cells or convolutional layers are searched for and applied to ImageNet by making more copies of the convolutional layers. A new regularization method, ScheduledDropPath, is proposed that improves the generalization of the Nasnet models [51]. Sampled child networks of different architectures are added to a recurrent neural network (RNN) to propose NAS [52]. The child networks are trained to obtain accuracies, and the resulting accuracies are used to update the controller so that it generates better architectures; the controller weights are updated using the gradient. The RNN controller only returns the structure of the normal and reduction cells [51]. The Nasnet search space is familiar from CNN architecture engineering because it identifies motifs such as convolutional filter bank combinations, nonlinearities, and prudent selection of connections [53][54][55].
The above-mentioned studies suggest predicting generic convolutional cells that are utilized to express motifs for the controller RNN. The cells are stacked in series to control the filter depth and spatial dimensions of the input. The Nasnet convolutional nets, rather than being manually designed architectures, are built up from convolutional cells repeated several times, using different weights but the same architecture [51].
In the Nasnet proposed by [51], a reinforcement learning search method is used to search for the blocks. The number of initial convolutional filters and the number of motif repetitions N are free parameters used in scaling. Feature maps of the same dimensions are returned by the convolutional cells in normal-cell form; in reduction cells, feature maps are returned with their height and width reduced by a factor of two.
The Nasnet model learns scalable convolutional cells from data that can be transferred to other image classification tasks. The parameters and computational cost of the architecture are quite flexible, and this model can be used for many different problems. The search space is used to minimize architectural complexity independently of network depth; the search achieves good architectures on small datasets and shifts the learned architecture to classification.
During the fine-tuning phase, the last three layers of the Nasnet are replaced by new layers based on the target dataset, which consists of mini-MIAS, CBIS-DDSM, and INbreast. The transfer learning approach is used to train the fine-tuned models. The number of epochs used for training is 100, the learning rate is 0.00001, and the batch size is 8. The Adam optimizer and SSD are used for learning [50]. To quantize the bounding box space, the SSD uses default anchor boxes with different fractions and measures, and adds different feature layers at the network's end [2]. Finally, deep features are extracted for further processing from the fine-tuned model's Global Average Pool (GAP) layer. This layer's output vector size is N × 1056.
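With both backbones fine-tuned, the two deep feature vectors can be fused serially, i.e., concatenated per image; a simple sketch (the dimensions follow the two GAP outputs above; the paper's non-redundancy threshold is not fully specified here, so plain concatenation is shown as an assumption):

```python
import numpy as np

def serial_fuse(f_mobilenet: np.ndarray, f_nasnet: np.ndarray) -> np.ndarray:
    """Serially fuse two deep feature vectors by concatenation."""
    return np.concatenate([f_mobilenet, f_nasnet])

f1 = np.random.rand(1280)    # fine-tuned MobilenetV2 GAP features
f2 = np.random.rand(1056)    # fine-tuned Nasnet Mobile GAP features
fused = serial_fuse(f1, f2)  # one (1280 + 1056)-dimensional vector per image
```

The fused vector is what MEWOA subsequently optimizes before classification.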

Transfer Learning
Transfer learning makes use of an already trained model, reused as the foundation for a new task and model. A model used for one task can be repurposed for other tasks as an optimization to improve performance. By applying transfer learning, the model can be trained with a small volume of data, which helps save time and achieve good results [56,57].
In the transfer learning approach, we transfer knowledge from the source mammogram input images I_s to the target domain mammogram mass images I_T. The target classifier T_c(M_t) is trained from the input mammogram image I_s to the target image I_T to get the classifier prediction BMN_Ti, which stands for benign, malignant, and normal. The transfer layer is used to extract the features. The top layer of the classifier is retrained on the new target classes while the other layers are kept frozen.
To extract the features from MobilenetV2 and Nasnet the transfer learning approach is used. In Figure 6, multiple classes of knowledge have been utilized into two classes.
BMN_Ti = T_c(M_t)

Whale Optimization Algorithm (WOA)
To explore the feasible solution to the problems in the search space, whale individuals are used in the community. There are three functions performed by WOA: encircling, shrinking, and hunting. In the exploitation phase, the encircling and shrinking operations are used, while in the exploration phase, the hunting function is used [58].

To solve a D_O-dimensional optimization problem, the procedure of the ith individual in the cth generation is used to find the best solution.
The WOA procedures are as follows.

Encircling operation:
D_1 = |2·rd·ESH*(c) − ESH_i(c)|,  ESH_i(c + 1) = ESH*(c) − B·D_1    (1)

Shrinking (spiral) operation:
D_2 = |ESH*(c) − ESH_i(c)|,  ESH_i(c + 1) = D_2·e^(e·t)·cos(2πt) + ESH*(c)    (2)

Hunting operation:
D_3 = |2·rd·ESH_K(c) − ESH_i(c)|,  ESH_i(c + 1) = ESH_K(c) − B·D_3    (3)

The arbitrary number in the range [0, 1] is described by rd, the present iteration number is represented by c, the maximum number of iterations is described by c_max, and the position vector of the best solution is represented by ESH*(c). The constant e is used to define the logarithmic spiral shape, and a random number in [−1, 1] is represented by t. The arbitrary position vector ESH_K(c) is selected from the present population, and D_1, D_2, and D_3 are the three distances above. According to the probability p_rob, Equations (1)-(3) are executed by WOA: the whale individuals are updated by Equation (1) when p_rob < 0.5 and |B| < 1, corrected by Equation (3) when p_rob < 0.5 and |B| ≥ 1, and updated by Equation (2) when p_rob ≥ 0.5.
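The three update rules can be sketched as one WOA loop in NumPy (a generic illustration minimizing a sphere function, not the paper's exact implementation; the spiral constant e is taken as 1 and the bounds, population size, and iteration count are assumed for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy fitness: sum of squares, minimized at the origin."""
    return float(np.sum(x ** 2))

def woa_minimize(fitness, dim=5, pop=20, c_max=100, lb=-5.0, ub=5.0):
    X = rng.uniform(lb, ub, (pop, dim))
    best = min(X, key=fitness).copy()
    for c in range(c_max):
        q = 2 * (1 - c / c_max)              # control coefficient shrinks from 2 to 0
        for i in range(pop):
            p_rob = rng.random()
            B = q * (2 * rng.random() - 1)   # B = q * Lambda, Lambda in [-1, 1]
            if p_rob < 0.5:
                if abs(B) < 1:               # Eq. (1): encircle the best solution
                    D = np.abs(2 * rng.random() * best - X[i])
                    X[i] = best - B * D
                else:                        # Eq. (3): hunt toward a random whale
                    K = X[rng.integers(pop)]
                    D = np.abs(2 * rng.random() * K - X[i])
                    X[i] = K - B * D
            else:                            # Eq. (2): spiral (shrinking) update
                t = rng.uniform(-1, 1)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(t) * np.cos(2 * np.pi * t) + best
            X[i] = np.clip(X[i], lb, ub)
            if fitness(X[i]) < fitness(best):
                best = X[i].copy()
    return best, fitness(best)

best, val = woa_minimize(sphere)
```

Tracking `best` greedily makes the best fitness non-increasing across generations, which is the elitism the algorithm relies on.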

Modified Entropy Whale Optimization Algorithm (MEWOA)
The WOA learns from the best current solution in the exploitation phase, which easily succumbs to local optima and reduces population diversity. The random individual learning operation has some blindness and does not perform any effective exchange of information between groups in the exploration phase, which disrupts the algorithm's convergence rate. The WOA needs improvement to reduce these issues, so the new algorithm MEWOA is proposed. The control parameter B is used to balance the WOA's exploration and exploitation functions, yet the exploration probability in WOA is only 0.1535 during the iterative process, so WOA has limited exploration ability. The development and exploration process in the MEWOA is instead controlled by a linearly increasing probability. Individual quality in a large animal group improves when individuals learn from the elite and other members of the group. Individual neighborhoods are formed through adaptive social learning procedures that use the individual's social position, social influence, and social network formation. The adaptive social networking approach is used to build the whales' adaptive community, to improve interaction between groups, and to improve the MEWOA's calculation accuracy; this neighborhood-based approach also increases population diversity. The MEWOA's convergence speed increases when the population jumps out of a local optimum by introducing the wavelet mutation strategy, whereas the algorithm exhibits premature convergence when the population falls into a local optimum [58].

Linear Increasing Probability
The control parameter |B| ∈ [0, 2] in the WOA, and the algorithm performs global exploration when |B| ≥ 1. As presented in Equation (4), when c ≥ (1/2)·c_max, |B| < 1 is always true, so the algorithm has weak exploration ability in the second half of the iterations.
Let q = 2(1 − c/c_max) and Λ = 2rd − 1; then B = q·Λ throughout the iterations, and the probability of |B| ≥ 1, averaged over the whole run, is (1/2)·∫₁² (1 − 1/q) dq = 0.5 × 0.307 = 0.1535. The WOA performs exploitation operations when p_rob ≥ 0.5, and the exploration probability is only 0.1535 across the iterations. The search ability of the MEWOA cannot be maintained by |B| because of this weak exploration ability, so exploitation and exploration are handled by a probability P_i that increases linearly with the number of iterations to conduct global exploration.
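The 0.1535 figure can be checked numerically: sample the iteration fraction and rd uniformly and estimate the probability that |B| = |q·Λ| ≥ 1 (a quick Monte Carlo sanity check of the stated value):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

c_frac = rng.random(n)               # c / c_max, uniform over the run
q = 2 * (1 - c_frac)                 # q shrinks linearly from 2 to 0
lam = 2 * rng.random(n) - 1          # Lambda = 2*rd - 1, uniform in [-1, 1]
B = q * lam

p_explore = np.mean(np.abs(B) >= 1)  # close to 0.5 * (1 - ln 2) = 0.1534...
```

The analytic value 0.5 × (1 − ln 2) ≈ 0.1534 agrees with the 0.1535 quoted in the text.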
Here r_no is an arbitrary number in [0, 1]. The exploitation operation is performed when r_no < P_i; otherwise, an exploration operation is performed by the algorithm. Global exploration keeps a possibility of 0.1 when the coefficient c/c_max < 0.5, and even in the last iterations, which raises the algorithm's capacity to jump out of local optima.
According to Equation (6), the average exploration probability when c_max ≥ 2 will be P̄_i ≥ 0.2 > 0.1535. Exploitation and exploration are controlled by the linearly increasing probability P_i, which increases the algorithm's search ability.

Adaptive Social Learning Strategy
In social behavior, each whale can build a neighborhood membership relationship and can change its behavior by imitating the current best solution. The algorithm (MEWOA) moves away from local optimal solutions by improving and enhancing information sharing between groups. For the current population, the fitness value is computed and ordered from small to large to obtain the sorted population G_1(c) = ESH_(1)(c), ... . Social impact is represented by S_if, with S_if ≤ 0.4. Equations (8) and (9) define that when the social impact is greater, the social ranking is also greater, denoting a better individual, and the influence is bounded by the specific limit on S_if. For the population G_1(c), the social network is constructed according to social influence. The relationship between ESH_i(c) and ESH_j(c) is defined in Equation (10), where rd_1 is a random number in [0, 1]. When the social influence is greater, the individual has the strongest connection with other individuals, as shown in Equation (10), which enhances the likelihood t_(j)(c); when there is less social influence, the likelihood t_i(c) of the relationship between the individual and other individuals increases. More individuals can adopt the behavior of the best individual: the greater an individual's social influence, the more interaction there is between individuals. The adaptive neighborhood of an individual ESH_i(c) is built up from the relationships between individuals. In the algorithm, the exploitation stage centers on the best search solution, and the exploration ability is supplied by interaction between group members. The new search strategy of a whale is realized using the community adaptive strategy and the linearly increasing probability; the new strategy is described here.
If prob_1 < p_i, the jth dimension of the ith individual, ESH(i)(c) in population G_1(c), updates its position as follows, where the adaptive neighborhood procedure is used by the algorithm to explore; this process is described in Equation (13). Here, prob_1, prob_2, rd_1, and rd_2 are random numbers in [0, 1], and p_i is the linearly increasing probability defined in Equation (6). By updating individuals with Equations (12) and (13), the algorithm fully utilizes the most recent best solution and the individual's adaptive neighborhood information while effectively increasing population diversity.

Morlet Wavelet Mutation
MEWOA holds the key to escaping local optima in optimization problems with densely distributed extreme points. In biological evolution, mutation is the main driver of change. The mutation space is adjusted dynamically, which increases the solution frequency. By fixing the wavelet function, extending its parameters, and confining the mutation space to a specific limit over the iterations, the amplitude function can be reduced and the mutation operation gains a fine-tuning effect. Wavelet mutation is incorporated into the WOA to improve the algorithm's convergence speed and accuracy and to enhance its ability to escape local optima. The purpose of mutation in the exploration phase is to find the best solution among all candidate solutions.
Suppose prob_m is the mutation probability and rd is a random number in [0, 1]. When prob_1 ≥ p_i and rd ≥ prob_m, the modified wavelet mutation updates the whale's position according to prob_m.
As mentioned above, prob_1, prob_2, and rd are random numbers in [0, 1]. The upper and lower bounds of the jth dimension are denoted y_j^max and y_j^min. The Morlet wavelet mutation function is ψ(ESH) = e^(−ESH²/2)·cos(5·ESH); since 99% of the function's energy is contained in [−2.5, 2.5], ∅_J is a random number in [−2.5v, 2.5v].
As the iterations increase, the scaling parameter v also increases, which makes it possible for the algorithm to perform fine searches near the best solution at the end of the iterations.
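The wavelet mutation described above can be sketched in Python. The exact update of Equation (14) is not reproduced in the text, so the scheme below follows the standard wavelet-mutation formulation (scaled Morlet amplitude pushing a coordinate toward its upper or lower bound) and should be read as an assumption.

```python
import math, random

def morlet(x):
    """Morlet wavelet psi(x) = exp(-x^2 / 2) * cos(5 x); about 99% of its
    energy lies in [-2.5, 2.5], as noted in the text."""
    return math.exp(-x ** 2 / 2) * math.cos(5 * x)

def wavelet_mutate(esh, lower, upper, v):
    """Hedged sketch of the Morlet wavelet mutation: draw phi uniformly
    from [-2.5 v, 2.5 v], compute the scaled wavelet amplitude sigma, and
    push the coordinate toward its upper bound when sigma > 0 and toward
    its lower bound otherwise. The update form is an assumption about
    Equation (14), following the standard wavelet-mutation scheme."""
    phi = random.uniform(-2.5 * v, 2.5 * v)
    sigma = (1 / math.sqrt(v)) * morlet(phi / v)
    if sigma > 0:
        return esh + sigma * (upper - esh)
    return esh + sigma * (esh - lower)

random.seed(1)
x = wavelet_mutate(0.3, lower=0.0, upper=1.0, v=1.0)
print(0.0 <= x <= 1.0)  # the mutated value stays inside the bounds
```

Because sigma carries a 1/√v factor, growing v over the iterations shrinks the mutation amplitude, which matches the text's point that larger v late in the run enables fine search near the best solution.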
The constant number is represented by a. The proposed MEWOA is presented below as Algorithm 2.

Start
    Initialize the MEWOA parameters: a, c_max, e, t, PN, S_if, prob_m
    Randomly generate the initial population
    Update p_i according to Equation (6)
    For each search individual ESH(i)(c), compute the whale's neighborhood PN(i)(c) according to Equations (8)-(11)
    If (prob_1 < p_i)
        Update the whale individual using Equation (12)
    Else if (prob_2 < 0.5)
        Use Equation (13)
    Else
        Use Equation (14)
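The pseudocode above can be sketched as a Python skeleton. Equations (12)-(14) are not reproduced in the text, so the three branch bodies below are labeled placeholders (move toward the best solution, neighbourhood-guided move, shrinking random perturbation), not the paper's exact update rules.

```python
import random

def mewoa(objective, dim, bounds, pop_size=20, c_max=50, seed=0):
    """Skeleton of the MEWOA loop in Algorithm 2. The three branches
    mirror the pseudocode; their bodies are simplified stand-ins for
    Equations (12)-(14)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = list(min(pop, key=objective))
    for c in range(1, c_max + 1):
        p_i = c / c_max                      # linearly increasing probability
        for ind in pop:
            for j in range(dim):
                if rng.random() < p_i:       # stand-in for Eq. (12)
                    ind[j] += rng.random() * (best[j] - ind[j])
                elif rng.random() < 0.5:     # stand-in for Eq. (13)
                    k = rng.randrange(pop_size)
                    ind[j] += rng.random() * (pop[k][j] - ind[j])
                else:                        # stand-in for Eq. (14) mutation
                    ind[j] += rng.uniform(-1, 1) * (hi - lo) / (c + 1)
                ind[j] = min(hi, max(lo, ind[j]))
        cand = min(pop, key=objective)
        if objective(cand) < objective(best):
            best = list(cand)
    return best

sphere = lambda x: sum(v * v for v in x)
best = mewoa(sphere, dim=3, bounds=(-5.0, 5.0))
print(round(sphere(best), 4))
```

The skeleton only shows the control flow: the best-so-far solution is frozen within a generation, candidates are clamped to the bounds, and the mutation amplitude shrinks as c grows, echoing the fine-tuning effect of the wavelet mutation.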

Results
The experimental results are presented in this section using three datasets: CBIS-DDSM, Mini-MIAS, and INbreast. Details of the datasets are given in Section 2.1. The results on each dataset are measured by applying the deep learning models from different perspectives. For validation, several machine learning classifiers are applied using 10-fold cross-validation. In the 10-fold cross-validation test, the provided learning set is divided into ten distinct subsets of comparable size.
The number of subsets created is referred to as the fold. These subsets are then used for training and testing, and the loop is repeated until the model has been trained and tested on every subset; in our experiments, 10-fold cross-validation performed better than other choices of k.
As a result, the 10-fold cross-validation method is used to validate the models in order to avoid over- and under-fitting during training. Different measures, such as Sensitivity, Precision, F1-Score, AUC, FPR, Accuracy, and Time, are computed to evaluate the performance of the proposed method. All training is conducted in MATLAB R2020a on a personal computer with 16 GB RAM and a 4 GB graphics card.
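The fold construction described above can be sketched as follows: the sample indices are shuffled once, split into ten disjoint subsets of comparable size, and each subset serves once as the test set while the other nine form the training set.

```python
def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index pairs for k-fold cross-validation:
    k disjoint folds of comparable size, each used once as the test set.
    A fixed shuffle seed keeps the split reproducible."""
    import random
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(105, k=10))
print(len(splits))                                              # 10 train/test pairs
print(sorted(splits[0][0] + splits[0][1]) == list(range(105)))  # exact partition
```

For 105 samples the folds have 10 or 11 elements each, which is what "subsets of comparable size" means in practice; in each iteration every sample appears exactly once, either in the training or the test set.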

Several experiments are conducted to validate the proposed method, including classification using the deep features of the fine-tuned models, classification using the MEWOA-optimized features, classification using the serial-based non-redundant fusion approach, and classification using MEWOA on the fused features.
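The serial-based non-redundant fusion mentioned above concatenates the two models' per-sample feature vectors and then discards redundant columns. The paper's exact redundancy criterion is not given here, so the sketch below uses exact-duplicate-column removal as an illustrative stand-in.

```python
def serial_fuse(features_a, features_b):
    """Hedged sketch of serial-based non-redundant fusion: concatenate
    the per-sample feature vectors of two models, then drop any column
    that duplicates an earlier column. The duplicate-column test is an
    illustrative stand-in for the paper's redundancy criterion."""
    fused = [ra + rb for ra, rb in zip(features_a, features_b)]
    seen, keep = set(), []
    for c in range(len(fused[0])):
        col = tuple(row[c] for row in fused)
        if col not in seen:            # keep only the first copy of a column
            seen.add(col)
            keep.append(c)
    return [[row[c] for c in keep] for row in fused]

a = [[1.0, 2.0], [3.0, 4.0]]   # e.g. features from the first model (2 samples)
b = [[2.0, 5.0], [4.0, 6.0]]   # second model; its first column repeats a's second
fused = serial_fuse(a, b)
print(len(fused[0]))  # 3 columns survive after dropping the duplicate
```

Serial (concatenation-based) fusion grows the feature dimension, which is why the fused experiments in the results take longer; removing redundant columns and then applying MEWOA counteracts that growth.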

Classification Results
The classification results are conducted on the three datasets, and several classifiers are applied to compute them. In each of the following experiments, the deep features of the dataset are extracted from the GAP layer and fed to the classifiers.

In Table 2, the Fine-tuned MobilenetV2 model is applied to the CBIS-DDSM dataset. The highest accuracy is 90.3%, achieved by the Cubic SVM classifier in 260.13 s. The minimum time, 20.432 s, is taken by Gaussian Naïve Bayes, but its accuracy of 71.5% is lower than that of Cubic SVM. The second highest accuracy, 88.2%, is achieved by Weighted KNN in 72.92 s. The sensitivity rate of each classifier is also calculated, and the best-noted value is 90.20% for Cubic SVM, which can be confirmed by the confusion matrix in Figure 7.

In Table 3, the Fine-tuned Nasnet model is applied to the CBIS-DDSM dataset. The highest accuracy is 93.9%, achieved by the Cubic SVM classifier in 112.96 s. The minimum time, 16.91 s, is taken by Fine Tree, but its accuracy of 89.8% is lower than that of Cubic SVM. The second highest accuracy, 93.6%, is achieved by WKNN in 59.909 s. The sensitivity rate of each classifier is also computed; Cubic SVM achieves the best-noted value of 94%, which can be confirmed by the confusion matrix in Figure 8.

In Table 4, MEWOA on MobilenetV2 is applied to the CBIS-DDSM dataset. The highest accuracy is 90.0%, achieved by Cubic SVM in 132.98 s. The minimum time, 8.70 s, is taken by GN-Bayes, but its accuracy of 70.5% is lower than that of Cubic SVM. The second highest accuracy, 87.4%, is achieved by WKNN in 37.385 s. The sensitivity rate of each classifier is also calculated; Cubic SVM achieves the best-noted value of 89.95%, which can be confirmed by the confusion matrix in Figure 9.

In Table 5, MEWOA on Nasnet is applied to the CBIS-DDSM dataset. The highest accuracy is 93.50%, achieved by Cubic SVM in 73.24 s. The minimum time, 11.57 s, is taken by GN-Bayes, but its accuracy of 83.7% is lower than that of Cubic SVM. The second highest accuracy, 92.30%, is achieved by QSVM in 75.26 s. The sensitivity rate of each classifier is also computed; Cubic SVM achieves the best-noted value of 93.50%, as illustrated by the confusion matrix in Figure 10.

In Table 6, serial fusion of the MobilenetV2 and Nasnet deep features is applied to the CBIS-DDSM dataset. The highest accuracy is 94.1%, achieved by Cubic SVM in 314.97 s. The minimum time, 55.161 s, is taken by GN-Bayes, but its accuracy of 85.5% is lower than that of Cubic SVM. The second highest accuracy, 93.0%, is achieved by QSVM in 265.45 s. The sensitivity rate of each classifier is also calculated; Cubic SVM has the best-noted value of 94.1%, which can be confirmed by the confusion matrix in Figure 11.

In Table 7, MEWOA on the fused features is applied to the CBIS-DDSM dataset. The highest accuracy is 93.8%, achieved by Cubic SVM in 255.84 s. The minimum time, 42.42 s, is taken by Fine Tree, but its accuracy of 88% is lower than that of Cubic SVM. The second highest accuracy, 93.0%, is achieved by QSVM in 227.28 s. The sensitivity rate of each classifier is also calculated; Cubic SVM has the best-noted value of 93.75%, which can be verified by the confusion matrix in Figure 12.
Figure 13 shows the time comparison graph of the deep learning models with the machine learning classifiers. Fine Tree utilized the maximum time with MEWOA on the Fine-tuned Nasnet model, the FG-SVM classifier utilized the maximum time in serial fusion, and GN-Bayes utilized the minimum time.

In Table 8, the Fine-tuned MobilenetV2 model is applied to the MIAS dataset. The GAP layer is used to extract the deep features, which are fed to the classifiers. The highest accuracy is 99.4%, achieved by the Cubic SVM classifier in 85.29 s. The minimum time, 22.82 s, is taken by Fine Tree, but its accuracy of 88.9% is lower than that of Cubic SVM. The second highest accuracy, 99.3%, is achieved by QSVM in 79.88 s. The sensitivity of each classifier is computed; Cubic SVM has the best-noted value of 98.73%, which can be verified by the confusion matrix presented in Figure 14.
In Table 9, the Fine-tuned Nasnet model is applied to the MIAS dataset. The highest accuracy is 99.7%, achieved by the WKNN classifier in 81.459 s. The minimum time, 35.187 s, is taken by Fine Tree, but its accuracy of 99.1% is lower than that of WKNN. The second highest accuracy, 99.6%, is achieved by Cubic SVM in 267.43 s. The sensitivity rate of each classifier is computed; WKNN has the best sensitivity rate of 99.2%, which can be confirmed by the confusion matrix in Figure 15.

In Table 10, MEWOA on the Fine-tuned MobilenetV2 model is applied to the MIAS dataset. The highest accuracy is 99.4%, achieved by the Cubic SVM classifier in 75.49 s. The minimum time, 20.09 s, is taken by Fine Tree, but its accuracy of 89.1% is lower than that of Cubic SVM. The second highest accuracy, 99.3%, is achieved by FKNN in 65.81 s. The sensitivity rate of each classifier is calculated; Cubic SVM has the best-noted value of 98.87%, which can be confirmed by the confusion matrix in Figure 16.

In Table 11, MEWOA on the Fine-tuned Nasnet model is applied to the MIAS dataset. The highest accuracy is 99.7%, achieved by the WKNN classifier in 24.70 s. The minimum time, 9.40 s, is taken by Fine Tree, but its accuracy of 98.9% is lower than that of WKNN. The second highest accuracy, 99.6%, is achieved by Cubic SVM in 18.35 s. The sensitivity rate of each classifier is calculated; WKNN has the best value of 99%, which can be confirmed by the confusion matrix in Figure 17.

In Table 12, fusion of the Fine-tuned MobilenetV2 and Nasnet models is applied to the MIAS dataset. The highest accuracy is 99.8%, achieved by the Cubic SVM classifier in 133.46 s. The minimum time, 60.069 s, is taken by GN-Bayes, but its accuracy of 96.4% is lower than that of Cubic SVM. The second highest accuracy, 99.6%, is achieved by Linear SVM in 115.46 s. The sensitivity rate of each classifier is calculated; Cubic SVM has the best value of 99.66%, which can be verified by the confusion matrix in Figure 18.

In Table 13, MEWOA on the fused features is applied to the MIAS dataset. The highest accuracy is 99.8%, achieved by the Cubic SVM classifier in 63.287 s. The minimum time, 7.9 s, is taken by GN-Bayes, but its accuracy of 95.7% is lower than that of Cubic SVM. The second highest accuracy, 99.7%, is achieved by QSVM in 15.37 s. The sensitivity rate of each classifier is also computed; the best-noted value for Cubic SVM is 99%, which can be verified by the confusion matrix in Figure 19.
In Table 14, the Fine-tuned MobilenetV2 model is applied to the INbreast dataset. The highest accuracy is 98.3%, achieved by the LSVM classifier in 18.80 s. The minimum time, 13.53 s, is taken by GN-Bayes, but its accuracy of 94.8% is lower than that of Linear SVM. The second highest accuracy, 98.2%, is achieved by QSVM in 16.11 s. The sensitivity rate of each classifier is also computed, and the best-noted value is 98.35% for LSVM, as illustrated by the confusion matrix in Figure 21.

In Table 15, the Fine-tuned Nasnet model is applied to the INbreast dataset. The highest accuracy is 98.6%, achieved by the Cubic SVM classifier in 10.85 s. The minimum time, 9.549 s, is taken by QSVM, also with an accuracy of 98.6%. The second highest accuracy, 98.4%, is achieved by GN-Bayes in 13.07 s. The sensitivity rate of each classifier is calculated; QSVM has the best-noted value of 98.5%, which can be verified by the confusion matrix in Figure 22.

In Table 16, MEWOA is applied to Fine-tuned MobilenetV2 on the INbreast dataset. The highest accuracy is 98.3%, achieved by the Fine KNN classifier in 35.41 s. The minimum time, 8.4453 s, is taken by GN-Bayes, with an accuracy of 94.0%. The second highest accuracy, 98.2%, is achieved by QSVM in 8.86 s. The sensitivity rate of each classifier is also computed, and the best-noted value is 98% for Cubic SVM, which can be verified by the confusion matrix in Figure 23.

In Table 17, MEWOA is applied to Fine-tuned Nasnet on the INbreast dataset. The highest accuracy is 98.6%, achieved by the Cubic SVM classifier in 6.24 s. The minimum time, 4.55 s, is taken by QSVM, also with an accuracy of 98.6%. The second highest accuracy, 98.5%, is achieved by WKNN in 10.47 s. The sensitivity rate of each classifier is also computed; Cubic SVM has the best-noted value of 98.5%, which can be verified by the confusion matrix in Figure 24.

In Table 18, the highest accuracy is 99.9%, achieved by the Cubic SVM classifier in 23.04 s. The minimum time, 17.63 s, is taken by Fine Tree, with an accuracy of 98.8%. The second highest accuracy, 99.8%, is achieved by QSVM in 23.68 s. The sensitivity rate of each classifier is also computed, and the best-noted value is 99.9% for Cubic SVM. The confusion matrix illustrated in Figure 25 verifies these results.

In Table 19, MEWOA on the fused features is applied to the INbreast dataset. The highest accuracy is 99.7%, achieved by the WKNN classifier in 6.57 s. The minimum time, 1.6178 s, is taken by Cubic SVM, with an accuracy of 99.1%. The second highest accuracy, 99.6%, is achieved by Quadratic SVM in 1.9933 s. The sensitivity rate of each classifier is also computed; Cubic SVM has the best-noted value of 99%, which can be verified by the confusion matrix in Figure 26.

Figure 27 presents the time comparison graph of the deep learning models with the machine learning classifiers. FG-SVM utilized the maximum time in the fusion model, the second highest time was utilized by the Fine KNN classifier in Fine-tuned MobilenetV2, and QSVM utilized the minimum time.

Table 20 compares the CBIS-DDSM classification results with other classification studies; the number of images, methods, sensitivity, precision, F1-score, AUC, and accuracy are listed in the table, and the proposed method shows better results than the other studies. Table 21 presents a comparative analysis of the MIAS classification results, comparing the number of images, methods, sensitivity, precision, F1-score, AUC, and accuracy with other studies; the proposed method shows good results. Table 22 compares the classification of the INbreast images with other classification studies; the number of images, methodology, sensitivity, F1-score, precision, AUC, and accuracy are listed in the table, and the proposed method shows good results compared to the other studies.

Discussion
Breast cancer is a fatal disease for women all over the world. A woman's life can be saved if the cancer is detected at an initial phase. Classifying mammogram images on the basis of features is difficult because extracting optimal features from mammograms is a challenging task. Three publicly available mammogram datasets, CBIS-DDSM, INbreast, and mini-MIAS, are used to extract features and perform classification. Data augmentation is performed to increase the volume of data, since deep learning models achieve their best performance when trained on large datasets. The datasets are processed using the Fine-tuned MobilenetV2 and Nasnet Mobile approaches. To improve model efficiency, the deep features are extracted from the middle layer and fed into MEWOA. The optimal features are selected using MEWOA, which reduces the computational cost. Serial fusion is performed on the MEWOA-optimized MobilenetV2 and Nasnet Mobile features, and MEWOA is then applied to the fused features to select the best optimized features, after which the machine learning classifiers are applied. To estimate the performance of the system, different measures are computed, such as Sensitivity, Precision, F1-Score, AUC, FPR, Accuracy, and Time. All computation is performed in MATLAB R2020a on a personal computer with 16 GB RAM and a 4 GB graphics card. Time comparison graphs are presented in the figures to compare the different classifiers.
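The performance measures named above can all be derived from the entries of a binary confusion matrix; a minimal sketch with illustrative counts (not taken from the paper's confusion matrices):

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the evaluation measures used in this work from a binary
    confusion matrix: sensitivity (recall), precision, F1-score,
    false-positive rate, and accuracy."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "precision": precision,
            "f1": f1, "fpr": fpr, "accuracy": accuracy}

# Illustrative counts only.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print(round(m["accuracy"], 3))  # 0.875
```

This is also why reducing the false-positive rate directly improves the reported accuracy and precision: fp appears in the denominators of both.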
A limitation of the proposed approach is that it requires a large volume of data. Data augmentation is needed to increase the size of the datasets, and the results improve when the data size is large. However, deep learning training on large datasets also takes more time. The transfer learning approach is used to increase the efficiency of the system.

Conclusions
In the medical imaging field, extracting features and classifying images on the basis of optimized features is a main application domain for deep learning procedures, and machine learning classifiers are applied to generate more productive results. This work employed Fine-tuned MobilenetV2 and Nasnet Mobile models to train on three imbalanced datasets. The average pool layer is used to extract the deep features. Transfer learning and the Adam optimization approach are utilized, and MEWOA is applied to the deep features of the fine-tuned models. The extracted deep features of the two optimized models are fused using non-redundant serial fusion, and the fused deep features are again optimized using MEWOA. Finally, classification results are established by applying the machine learning classifiers. The fusion step increases the accuracy of the results but increases the running time of the system; MEWOA is therefore applied to optimize the features and reduce the computation time. By using these techniques, the false-negative and false-positive rates decreased. This methodology can help radiologists as a second opinion by addressing the problems of optimal feature extraction and classification on the basis of optimal features.