Multimodal Brain Tumor Classification Using Deep Learning and Robust Feature Selection: A Machine Learning Application for Radiologists

Manual identification of brain tumors is an error-prone and tedious process for radiologists; therefore, it is crucial to adopt an automated system. Binary classification, such as malignant versus benign, is relatively straightforward, whereas multimodal brain tumor classification (T1, T2, T1CE, and Flair) is a challenging task for radiologists. Here, we present an automated multimodal classification method using deep learning for brain tumor type classification. The proposed method consists of five core steps. In the first step, linear contrast stretching is employed using edge-based histogram equalization and the discrete cosine transform (DCT). In the second step, deep learning feature extraction is performed: utilizing transfer learning, two pre-trained convolutional neural network (CNN) models, VGG16 and VGG19, are used for feature extraction. In the third step, a correntropy-based joint learning approach is implemented along with the extreme learning machine (ELM) to select the best features. In the fourth step, the partial least square (PLS)-based robust covariant features are fused into one matrix. The combined matrix is fed to the ELM for final classification. The proposed method was validated on the BraTS datasets, achieving accuracies of 97.8%, 96.9%, and 92.5% for BraTS2015, BraTS2017, and BraTS2018, respectively.


Introduction
A brain tumor is an abnormal growth of brain cells in an uncontrollable way [1,2]. Brain tumors can be cancerous or noncancerous. As a tumor grows, the pressure inside the skull can rise; in the worst case, it can cause brain damage, which can be life-threatening [15].
In this work, we propose a deep learning scheme for multimodal brain tumor classification. To handle the problem of shallow contrast, we implemented a linear contrast enhancement technique, which was further refined through histogram equalization. Transfer learning was used for feature extraction from two different CNN models, and the extracted features were fused. The motivation behind fusing two CNN models was to obtain a new feature vector carrying more information. Although this process improved accuracy, it increased the computational time. To improve efficiency, we proposed a feature selection technique. The robust features obtained with this technique were later classified through the Extreme Learning Machine (ELM).

The major contributions of this work are summarized as follows:

• We divided each image into two clusters using the K-Means clustering algorithm and applied edge-based histogram equalization to each cluster. Further, the discrete cosine transform (DCT) was utilized for local information enhancement.

• Deep learning features were extracted from two pre-trained CNN models through transfer learning (TL). The last FC layer of both models was used for feature extraction.

• The Partial Least Square (PLS)-based features of both CNN models were fused into one matrix.

• The robust features were selected using correntropy-based joint group learning and finally classified using the ELM classifier.

• Three datasets, BRATS 2015, BRATS 2017, and BRATS 2018, were used for the experiments and the statistical analysis, to examine the scalability of the proposed classification scheme.

Related Work
Classification of multimodal brain tumors (i.e., T1, T2, T1CE, and Flair) requires the determination of altered features, such as shape and texture, in the MRI image [16]. A popular method for diagnosing these tumors, widely adopted among computer vision researchers, is the computer-aided diagnosis (CAD) system [1,17]. A CAD system involves two main stages: first, tumor preprocessing and detection, and second, classification of the tumor into a relevant category. In this work, we focused on the classification task of multimodal brain tumors. For classification, we used the BRATS series, drawing on a few top submissions [18][19][20][21]. Amin et al. [22] introduced a CNN framework for brain tumor classification. In the presented method, a DWT fusion process was performed to improve the original MRI scan, and a partial diffusion filter was then employed for noise removal. Later on, they used a global thresholding algorithm for tumor extraction, and the result was passed to the CNN model for classification of tumors into the related categories. Several BRATS datasets (2012, 2013, 2015, and 2018) were used, and the fusion approach showed improved performance. Sajjad et al. [23] presented a CNN-based multimodal tumor classification system. They initially segmented the tumor regions in the MRI scans using a CNN. Then, they performed extensive data augmentation to train a good CNN model. Later on, they fine-tuned the pre-trained CNN model using the augmented brain data. The last layer was used for tumor classification in the presented method, and the augmented data gave better results on the selected datasets.
Sharif et al. [24] presented an active deep learning system for the segmentation and classification of brain tumors. They initially performed contrast enhancement, and the resultant image was passed to the Saliency-based Deep Learning (SbDL) method for the construction of a saliency map. Thresholding was applied in the next step, and the resultant images were used to fine-tune the pre-trained CNN model Inception V3. Further, they also extracted dominant rotated local binary pattern (DRLBP) features, which were fused with the CNN features. Later on, a PSO-based optimization was performed, and the optimal vector was passed to the Softmax classifier for final classification. They used the BRATS 2015, 2017, and 2018 datasets for evaluation and achieved improved classification accuracy. In [25], the authors presented a CNN-based scheme for the classification of brain tumors. They considered the problem of the structural variability of the tumor around the adjacent regions. For this purpose, they designed small kernels to keep the weights of each neuron very small. Taking advantage of these weights, they achieved an accuracy of 97.5%.
Vijh et al. [26] presented an adaptive particle swarm optimization (PSO) with the Otsu method to find the optimal threshold value. Later, they applied anisotropic diffusion (AD) filtering on brain MRI images to remove noise and improve image quality. Features were extracted from the enhanced images and used both for training the CNN and for performing the classification. Other methods were also introduced in the literature for brain tumor classification, such as a generative adversarial network (GAN)-based approach [19], artificial neural network (ANN)-based learning [27], ELM-based learning [28], residual networks [29], standard-features-based classification [30,31], adaptive independent subspace analysis [32], transfer-learning-based tumor classification [33], and Excitation DNN [34]. In addition, Togaçar et al. [35] proposed a hybrid method based on CNN and feature selection for the classification of brain tumors, achieving an accuracy above 90%. The above techniques did not report computational time; however, computational efficiency is essential for any automated system in the current era. More recently, Muhammad et al. [36] presented a detailed review of multi-grade brain tumor classification. They described the main brain tumor classification (BTC) steps, such as tumor preprocessing, deep learning features, and classification, discussed the limitations and achievements of existing deep learning techniques for BTC, and highlighted the importance of transfer learning for deep learning feature extraction.

Proposed Methodology
In this section, the proposed methodology for multimodal brain tumor classification using deep learning is presented. The proposed method consists of five core steps: linear contrast stretching; deep learning feature extraction using transfer learning; a correntropy-based joint learning approach along with the ELM for best feature selection; the PLS-based fusion of the selected features; and finally, the ELM-based classification. The testing of the proposed method was performed on the BRATS datasets. The performance of the approach was evaluated using standard measures such as accuracy and false negative rate (FNR). Furthermore, the execution time of the proposed work was also reported. A detailed flow of the proposed methodology is illustrated in Figure 2. In the following, the technical description of each step is provided.

Linear Contrast Enhancement
Improving the visual features of an image is the primary objective of contrast enhancement. It is a preprocessing step used in many applications, such as biomedical imaging and the diagnosis of agricultural infections [37][38][39][40][41][42]. Low-contrast images are not useful for feature extraction, as the tumors are barely visible and error-prone to delineate. Therefore, in this step, we improved the linear contrast of an image, which mainly affected the tumor region. For this purpose, we implemented a hybrid technique. Initially, we split the image into two parts using the K-Means clustering algorithm. Then, edge-based texture histogram equalization (HE) was applied. Later on, the DCT was applied to combine both clusters into one image. The resulting image had enhanced contrast compared to the original one. The mathematical formulation of this method is as follows. Consider a dataset ∆ = {τ_1, τ_2, τ_3, ..., τ_N}, τ_N ∈ R^d, and let τ(x, y) be an MRI image of dimension N × M rows and columns, where N = 256 and M = 256. Let τ̄_i denote the mean of cluster K_i; the K-Means criterion function is then defined as:

S = Σ_{i=1}^{K} Σ_{τ ∈ K_i} ||τ − τ̄_i||²,    (1)

where S denotes the sum of squared errors over all pixels, τ the input pixels, and K the number of clusters initialized in this work. In K-Means, the Euclidean distance was used as the assignment criterion, defined as:

d(τ_i, y_i) = sqrt(Σ_k (τ_{i,k} − y_{i,k})²),    (2)

where τ_i and y_i are two vectors. This formulation produced two clusters. On the resultant image τ_1(x, y) ∈ S, we employed edge-based texture HE. For τ_1(x, y), the gradient magnitude was computed as:

G(x, y) = sqrt(G_x² + G_y²),    (3)

where G_x and G_y denote the x and y derivatives of τ_1(x, y), respectively. Later, the edge map was constructed using a threshold function:

E(x, y) = 1 if G(x, y) > T, and 0 otherwise,    (4)

where only the pixels with values higher than the threshold (T = 0.55) were considered.
These pixels were used for texture histogram computation (HC). Later on, α and β were calculated, where α denotes the minimum and β the maximum pixel value; the grey levels whose values lay between α and β were represented in HC. Finally, the cumulative distribution function (CDF) and the transfer function were applied to obtain an enhanced image, as defined by Equations (5) and (6):

CDF(i) = Σ_{j=α}^{i} HC(j) / N_p,    (5)

F_τ(i) = α + (β − α) · CDF(i),    (6)

where N_p is the number of edge pixels. The resultant image τ_2(x, y) ∈ CDF(i) & F_τ was passed to the DCT method to refine the local contrast of the tumor region. Mathematically, the DCT basis matrix τ_xy was computed as:

τ_xy = sqrt(2/N) · cos(π(2y + 1)x / 2N), with τ_0y = sqrt(1/N),    (7)

Hence, using τ_xy, the DCT was applied to the image τ_2(x, y) as:

τ_3(x, y) = τ_xy · τ_2(x, y) · τ_xy^t.    (8)

As τ_xy is a real orthogonal matrix, its inverse could be computed as:

τ_xy^{-1} = τ_xy^t,    (9)

where t denotes the transpose. Hence, the final DCT-enhanced image τ_3(x, y) is depicted in Figure 3. In this figure, the sample enhancement results are presented for each step (top to bottom).
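As a sanity check on the DCT step, the orthonormal DCT basis can be built directly and its key property (the inverse equals the transpose) verified. This is an illustrative numpy sketch, with a random 8 × 8 array standing in for τ_2(x, y):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; its inverse is its transpose."""
    d = np.array([[np.cos(np.pi * (2 * y + 1) * x / (2 * n)) for y in range(n)]
                  for x in range(n)])
    d *= np.sqrt(2.0 / n)
    d[0, :] = np.sqrt(1.0 / n)  # first row carries the 1/sqrt(n) scaling
    return d

def dct2(img):
    """2-D DCT of a square image via the basis matrix: D @ img @ D.T."""
    d = dct_matrix(img.shape[0])
    return d @ img @ d.T

def idct2(coeffs):
    """Inverse 2-D DCT; uses orthogonality (D^-1 = D^T)."""
    d = dct_matrix(coeffs.shape[0])
    return d.T @ coeffs @ d

img = np.random.rand(8, 8)      # stand-in for the equalized image
rec = idct2(dct2(img))
print(np.allclose(rec, img))    # True: the round trip recovers the image
```

The orthogonality is what makes the inverse transform in Equation (9) a simple transpose rather than an explicit matrix inversion.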

Deep Learning Features
The deep learning features were extracted using two pre-trained deep CNN models, VGG16 and VGG19. The visual representation of both models is shown in Figures 4 and 5, respectively.
The VGG16 model consisted of 13 convolution layers, 15 ReLU activation layers, five max-pooling layers, three fully connected (FC) layers, and one Softmax layer as a classification layer. The input layer size was 224 × 224 × 3. The number of filters in the first convolution layer was 64, and the filter size was 3 × 3 × 3, along with a stride of 1 × 1. In the next convolution layer, the number of filters was unchanged, but the filter size was updated to 3 × 3 × 64. Accordingly, the dimension of the learnable weights was 3 × 3 × 64 × 64, compared to 3 × 3 × 3 × 64 in the first convolution layer. The learnable weights of each convolution layer were updated according to the number of filters and the filter size. In the first max-pooling layer, a 2 × 2 filter size was used along with a stride of 2 × 2. After the convolution layers, three FC layers were added. The learnable weights dimension of the first FC layer was 4096 × 25088. After a 50% dropout, the weights matrix size of the second FC layer was 4096 × 4096. Another dropout layer with a ratio of 50% was added. The resultant matrix, used as the input of the third FC layer (denoted as FC8), returned a weight matrix of dimension 1000 × 4096. Finally, the Softmax function and the classification layer were added for the final classification.
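The FC dimensions quoted above follow from the standard VGG16 layout, in which the last pooling layer outputs a 7 × 7 × 512 feature map. A few lines of Python make the bookkeeping explicit (the parameter total covers the three FC layers only and assumes the standard architecture, not a modified variant):

```python
# VGG16 fully connected stage: the 4096 x 25088 weight matrix quoted in
# the text follows from flattening the 7 x 7 x 512 output of the last
# max-pooling layer.
conv_out = 7 * 7 * 512          # flattened input to FC6
fc6 = (4096, conv_out)          # learnable weights 4096 x 25088
fc7 = (4096, 4096)              # second FC layer
fc8 = (1000, 4096)              # FC8 yields the 1 x 1000 feature vector

print(conv_out)                 # 25088
params = sum(o * i + o for o, i in (fc6, fc7, fc8))  # weights + biases
print(params)                   # ~123.6 million FC parameters
```

This also explains why FC8 features (length 1000) are so much cheaper to select over than FC7 features (length 4096), a point revisited in the Discussion.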

The VGG19 model consisted of 16 convolution layers, 18 ReLU activation layers, five max-pooling layers, three FC layers, and one Softmax layer as a classification layer. The input layer size was 224 × 224 × 3. The number of filters in the first convolution layer was 64, and the filter size was 3 × 3 × 3; this filter size was updated according to the number of filters. In the first max-pooling layer, a 2 × 2 filter size was used along with the same stride. After the convolution layers, three FC layers were added. The weights dimension of the first FC layer was 4096 × 25088. After a 50% dropout, the weights matrix size of the second FC layer was 4096 × 4096. The resultant matrix, used as the input of the third FC layer (denoted as FC8), returned a weight matrix of dimension 1000 × 4096.

Network Modification for Transfer Learning
Using domain adaptation transfer learning (TL) [43], we retrained both models (VGG16 and VGG19) on the BRATS datasets without changing the original layer parameters. In the tuning process, we first loaded the brain datasets and set the training/testing ratio to 60:40; the labels of each image were also defined. Then, we set the input and output layers for training. This process was conducted for both deep learning models. For the VGG16 model, the input convolution layer (conv_1) was employed, where the number of filters was 64 and the filter size was 3 × 3 × 64. The selected output layer was FC8. We then performed activation on this layer and trained a new modified CNN network that included only the brain image features. The last two layers, namely, the classification and Softmax layers, were removed. In the output, the resultant learnable weights matrix was of dimension 4 × 4096, and the feature length was 1 × 1000. Hence, for N images, the feature vector length was N × 1000, denoted by η_i. Similarly, for the VGG19 model, the last two layers were removed. The convolution layer (conv_1) was employed as the input, with 64 filters of size 3 × 3 × 64. The selected output layer was FC8, on which the activation was performed, and a new modified CNN network including only the brain image features was trained. The dimension of the learnable weight matrix was 4 × 4096, and the length of the extracted feature vector was 1 × 1000. For N brain images, the feature vector length was N × 1000, denoted by η_j.

Feature Selection
The main motivation of the feature selection step was to remove redundancy among the features and select only those that were robust for correct classification. The second motivation was to minimize the number of predictors, which sped up the testing process. Inspired by these two essential functionalities, we implemented a technique named correntropy via mutual learning and ELM (CML-ELM). The working of this method is presented in Algorithm 1.

Algorithm 1: Proposed feature selection method using CML-ELM.

In the above algorithm, η_i denotes the original feature vector of the VGG16 deep learning model, S_w(i) the selected feature vector, LR the regularization parameter, b_i a selected parameter, and A_i an affine combination of S_w(i) and S_w(i − 1). MSER denotes the mean squared error, computed by Equation (10), and the features S_w(i + 1) are updated by Equation (11):

MSER = (1/N) Σ_{i=1}^{N} (LR_i − L̂R_i)²,    (10)

where LR_i denotes the observed features and L̂R_i the predicted features. Each time, the MSER was calculated; if its value was greater than or equal to 0.1, the features were updated, and this process iterated up to 1000 times. If the target was not achieved, the features of the last iteration were selected for classification. Finally, a robust vector of dimension X_1 × K was obtained, denoted by η_Sw(1), where K is the number of selected features and X_1 the total number of images. The same selection process was performed on the VGG19 feature vector η_j, yielding a robust feature vector of dimension X_2 × K, denoted by η_Sw(2), where X_2 is the number of observations and K the number of selected features.
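The selection loop can be sketched as follows. This is a simplified stand-in, not the exact CML-ELM update of Equations (10) and (11): a regularized least-squares fit replaces the ELM step, candidate subsets are drawn at random, and only the MSER target (0.1) and iteration cap (1000) from the text are kept. All data here are synthetic:

```python
import numpy as np

def select_features(X, y, mser_target=0.1, max_iter=1000, k=100, seed=0):
    """Sketch of the MSER-driven selection loop: keep proposing candidate
    feature subsets until a regularized fit on the subset reaches the
    target mean squared error, or the iteration budget runs out."""
    rng = np.random.default_rng(seed)
    best_idx, best_err = None, np.inf
    for _ in range(max_iter):
        idx = rng.choice(X.shape[1], size=k, replace=False)
        Xs = X[:, idx]
        # regularized least squares as a cheap stand-in for the ELM fit
        w = np.linalg.solve(Xs.T @ Xs + 1e-3 * np.eye(k), Xs.T @ y)
        err = np.mean((Xs @ w - y) ** 2)
        if err < best_err:
            best_idx, best_err = idx, err
        if best_err < mser_target:      # MSER target met: stop updating
            break
    return best_idx, best_err           # last/best subset if target missed

X = np.random.default_rng(1).normal(size=(200, 1000))  # 1000-dim features
y = X[:, :5].sum(axis=1)                # only a few columns are informative
idx, err = select_features(X, y)
print(len(idx), err < np.mean((y - y.mean()) ** 2))
```

The point of the sketch is the control flow: features keep being updated while the error stays at or above the threshold, and the final subset is what gets passed on to fusion and classification.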

Feature Fusion and Classification
Finally, the selected feature vectors were fused into one matrix using the PLS-based fusion approach. Consider η_Sw(1) and η_Sw(2), two selected feature vectors of dimension X_1 × K and X_2 × K, and let η_Sw(j) represent the fused vector of dimension X_3 × K. Further, we assumed that the variables were mean-centered. Using PLS, a pair of directions u_i and v_i was found that maximized the covariance between the projected features. These pairs were combined in one matrix, and a resultant vector of dimension X_3 × K was obtained, represented by η_Sw(j). Later on, this vector was passed to the ELM [44] for the final classification. The formulation of the ELM is as follows. For L hidden layer nodes with activation function g(x):

Σ_{i=1}^{L} β_i g(u_i · u_j + B_i) = O_j,    (19)

where L denotes the hidden layer (initialized as one in this work), β_i the output weight vector, u_i the input weight vector coming to the hidden layer, B_i the offset value, H the output hidden layer matrix, u_i · u_j the inner product of u_i and u_j, and O the expected output. Equation (19) was solved as:

β̂ = H† O,    (20)

where H† is the Moore–Penrose pseudo-inverse of H. To further improve the stability of the ELM, we defined a minimization function:

min (1/2)||β||² + (c/2) Σ_i ξ_i²,  subject to  h(u_i)β = t_i − ξ_i,    (21)

where ξ_i denotes the training error, t_i the label corresponding to the sample u_i, and c the penalty parameter. The labeled results of the proposed architecture are given in Figure 6.
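A minimal ELM in the spirit of Equations (19) and (20) can be written in a few lines of numpy: the hidden weights u_i and offsets B_i are drawn at random and never trained, and only the output weights β are solved for, with the penalty parameter c providing stabilization. The four-class data below are synthetic:

```python
import numpy as np

def elm_train(X, T, L=256, c=1.0, seed=0):
    """Minimal ELM sketch: random hidden layer, then a regularized
    least-squares solve for the output weights beta."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(X.shape[1], L)) / np.sqrt(X.shape[1])  # input weights u_i
    B = rng.normal(size=L)                                      # offsets B_i
    H = np.tanh(X @ U + B)                                      # hidden outputs g(x)
    # beta = (H^T H + I/c)^-1 H^T T : stabilized pseudo-inverse solve
    beta = np.linalg.solve(H.T @ H + np.eye(L) / c, H.T @ T)
    return U, B, beta

def elm_predict(X, U, B, beta):
    return np.tanh(X @ U + B) @ beta

# toy 4-class problem with one-hot targets (stand-in for T1/T2/T1CE/Flair)
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 20))
y = rng.integers(0, 4, size=400)
X[np.arange(400), y] += 3.0             # make the classes separable
T = np.eye(4)[y]                        # one-hot expected output O
U, B, beta = elm_train(X, T)
pred = elm_predict(X, U, B, beta).argmax(axis=1)
print((pred == y).mean() > 0.9)         # high training accuracy
```

Because only β is learned, training reduces to one linear solve, which is what makes the ELM attractive for the fast testing times reported below.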

Experimental Results and Analysis
We present the classification results for the proposed ELM classifier using three datasets, namely, BraTS 2015, BraTS 2017, and BraTS 2018. For all datasets, a 60:40 split ratio was used along with 10-fold cross-validation. The results are provided for two different pipeline procedures: (i) feature extraction from the FC7 layer, followed by the feature selection approach, feature fusion, and classification, and (ii) the full proposed architecture, as given in Figure 2. For the sake of comparison, we also provide the results for four well-known classifiers, namely, Naïve Bayes, Multiclass Support Vector Machine (MSVM), Softmax, and Ensemble Tree, as baselines. The performance of all classifiers was validated using accuracy and FNR. Furthermore, the clock time taken by each classifier was also reported to give the reader an idea of the classification time during the testing process. All simulations of the proposed technique were conducted in MATLAB 2019b (MathWorks, Natick, MA, USA) on a desktop computer with 16 GB RAM and a 128 GB SSD. A graphics processing unit (GPU) was also utilized for feature extraction and classification, which significantly helped in improving the classification time. The execution time was also noted during the testing process; however, it was not fully stable, as it depended on the execution platform. Table 1 presents the classification results for the BraTS 2015 dataset. The results are provided for the proposed classifier, as well as the existing well-known classifiers, for the two experimental pipeline procedures mentioned above. Apart from the validation measures in terms of accuracy and FNR, the classification time in seconds is also provided. The entries in bold represent the best results.
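For reference, the accuracy and per-class FNR reported in the tables can be computed from a confusion matrix as follows; the 4 × 4 matrix below is a hypothetical example, not taken from the paper's results:

```python
import numpy as np

def accuracy_fnr(conf):
    """Accuracy and per-class false negative rate from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    conf = np.asarray(conf, dtype=float)
    acc = np.trace(conf) / conf.sum()             # correct / total
    fnr = 1.0 - np.diag(conf) / conf.sum(axis=1)  # misses per true class
    return acc, fnr

# hypothetical 4-class (T1, T2, T1CE, Flair) confusion matrix
conf = [[96, 2, 1, 1],
        [3, 95, 1, 1],
        [1, 1, 97, 1],
        [0, 1, 2, 97]]
acc, fnr = accuracy_fnr(conf)
print(round(acc, 4))        # 0.9625
print(np.round(fnr, 2))     # per-class FNR
```

Note that with balanced classes, the overall FNR quoted alongside an accuracy is simply its complement (e.g., 97.26% accuracy and 2.74% FNR).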
It can be seen from Table 1 that the minimum accuracy achieved was 91.48% for Softmax. The maximum accuracy of 98.16% (FNR = 1.74%) was achieved by the ELM classifier, which used the proposed method. Pro-FC7 defines feature extraction from the FC7 layer and performed feature selection, as well as fusion, and 'Proposed' denotes the proposed classifier architecture, as given in Figure 2. The best values are shown in bold.

Results for the BraTS 2015 Dataset
The proposed selection scheme also reduced the classification time during the testing process. In Table 1, the time is given for all classifiers, which clearly shows that the time for the proposed method was lower than that of Pro-FC7. The classification time for Softmax was the minimum (81.02 s) using the proposed method. Though the classification time for the proposed classifier was not the minimum (87.41 s), it was still quite close to Softmax and considerably lower than that of the rest of the classifiers.
The results of the proposed method on the ELM classifier were also verified by the confusion matrix values presented in Table 2. The diagonal values show the correct classification rate of each tumor class. The maximum achieved accuracy of Pro-FC7 was 96.02% for ELM (Table 1), which can also be verified by the confusion matrix in Table 3. Table 4 presents the classification results for the BraTS 2017 dataset. Results are provided for the proposed method along with the other well-known classifiers, for the two experimental pipeline procedures mentioned above; the classification time in seconds is also given. It can be clearly seen from Table 4 that the ELM classifier, using the proposed method, achieved an accuracy of 97.26% with an FNR of 2.74%. The minimum achieved accuracy was 90.09%, for Softmax. The best values are shown in bold.

Results for the BraTS 2017 Dataset
The proposed selection scheme also reduced the classification time during the testing process, as is evident from the last column of Table 4. The classification time for ELM was the minimum (89.64 s) using the proposed method, which clearly shows the improved efficiency of the ELM classifier.
The results of the proposed method on the ELM classifier could also be verified by the confusion matrix in Table 5. The diagonal values showed the correct classification rate of each tumor class, which were 96.24%, 98.66%, 97.20%, and 97% for the T1, T1CE, T2, and Flair tumors. The maximum achieved accuracy of Pro-FC7 was 95.82% for ELM, which could be further verified by the results in Table 6.   Table 7 presents the classification results for the BraTS 2018 dataset. Results were provided for the proposed method, as well as for other well-known classifiers, such as Naïve Bayes, MSVM, Softmax, and Ensemble Tree. These results were provided for two experimental pipeline procedures, as discussed earlier in Section 3. It can be seen from this table that the maximum achieved accuracy was 93.40% for the ELM classifier, using the proposed method. The noted FNR rate was 6.60%. The minimum achieved accuracy was 89.49%, using the proposed method for the Naïve Bayes classifier. The best values are shown in bold.

Results of the BraTS 2018 Dataset
The classification accuracy was also computed for Pro-FC7 to analyze the proposed results. For Pro-FC7, the maximum achieved accuracy was 91.69% for the ELM classifier. The accuracy of ELM using the proposed method and Pro-FC7 was further verified through Tables 8 and 9. In both these tables, the diagonal values represent the correct predicted rate of each tumor class, such as T1, T2, T1CE, and Flair.
Time was measured for each classifier during the testing process and is presented in Table 7. We used MATLAB's tic/toc commands to compute the testing time of the proposed method. The table shows that the best execution time (63.83 s) was obtained by the ELM classifier using the proposed method. However, this time depended on the execution platform (GPU, system RAM, etc.). Based on the presented accuracy and testing execution time, the effectiveness of the proposed method for accurate and efficient brain tumor type classification is apparent.

Statistical Analysis of Results
To examine the stability of the proposed method's results, a detailed statistical analysis was conducted in terms of variance, standard deviation, and standard error of the mean (SEM). The reported values were obtained after 1000 iterations. The detailed analysis of the proposed method for the BraTS2015 dataset is given in Table 10. In this table, the accuracy of ELM had low variability, with an SEM of 0.1862, better than the other methods. Table 11 shows the detailed analysis for the BraTS2017 dataset; the accuracy of ELM was better than that of the other listed classifiers (SEM of 0.0754). Table 12 illustrates the analysis results for the BraTS2018 dataset, where the SEM of the proposed method was 0.2875, better than the other classifiers, and the results were stable after the selected iterations. Overall, the results of the proposed method were the most stable among all listed classifiers. Moreover, we also plotted the confidence intervals of ELM at different confidence levels (CL), such as 90%, 95%, and 99%, as shown in Figures 8-10. As shown in Figure 8, at a 95% CL, the margin of error was 97.763 ± 0.365 (±0.37%). Similarly, in Figures 9 and 10, the margins of error at a 95% CL were 97.1 ± 0.148 (±0.15%) and 92.79 ± 0.564 (±0.61%), respectively. Based on these values, our method performed significantly better than the other classifiers.
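The SEM and confidence-interval margins reported above can be computed as follows; the accuracy samples below are hypothetical, and z = 1.96 gives the 95% level used in Figures 8-10:

```python
import numpy as np

def sem_and_ci(samples, z=1.96):
    """Standard error of the mean and a normal-approximation confidence
    interval (z = 1.96 corresponds to a 95% confidence level)."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    sem = samples.std(ddof=1) / np.sqrt(len(samples))  # sample SD / sqrt(n)
    return mean, sem, (mean - z * sem, mean + z * sem)

# hypothetical accuracies (%) over repeated runs
acc = [97.5, 98.1, 97.9, 97.6, 98.0, 97.8]
mean, sem, (lo, hi) = sem_and_ci(acc)
print(round(mean, 3), round(sem, 3))
print(round(hi - lo, 3))    # full width of the 95% interval
```

The margin of error quoted in the text (e.g., 97.763 ± 0.365) is exactly z · SEM around the mean accuracy.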

The best values are shown in bold. Min, Avg, and Max are the minimum, average, and maximum accuracy, respectively; SEM, standard error of the mean.

Discussion
We now discuss the results of the proposed method from a critical point of view. The labeled results are illustrated in Figure 6. Three BraTS datasets were used for the validation of the proposed method, and the numerical results are presented in Table 1, Table 4, and Table 7. The results in these tables were validated through the two pipeline procedures mentioned in Section 3 and show that the accuracy of Pro-FC7 was lower than that of the proposed architecture. The main reason for this degradation in classification accuracy was the number of features: in the VGG19 architecture, the feature length at FC7 is 4096, whereas at FC8 it is 1000; therefore, during the selection process, the target MSER could not be met. Moreover, due to the higher number of features, the execution time was also higher for Pro-FC7 than for the proposed method.
To give the reader an idea of the comparison with existing techniques, we briefly mention some published results. In [24], the authors presented a deep-learning-based system and used the BraTS dataset series for the experimental process, achieving accuracies of 97.8%, 96.9%, and 92.5% for BraTS2015, BraTS2017, and BraTS2018, respectively. Sajjad et al. [23] presented a deep learning model evaluated on two datasets, Brain tumor and Radiopaedia, achieving accuracies of 94.58% and 90.67%, respectively. Togaçar et al. [35] achieved an average accuracy of 96.77% for the classification of healthy and tumor MRI images. The proposed method achieved accuracies of 98.16%, 97.26%, and 93.40%, better than those reported for the state-of-the-art techniques. Additionally, the worst-case time complexity of our algorithm was O(n³ + k + C), where k represents the number of iterations and C is a constant term.
In addition, we also calculated the Matthews correlation coefficient (MCC) for the ELM classifier; the results are given in Table 13. The MCC values were close to 1, indicating the good prediction performance of the proposed scheme. The better values are shown in bold.
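The (binary, one-vs-rest) MCC can be computed directly from the confusion counts; the counts below are hypothetical, illustrating a value close to 1 for a well-predicted class:

```python
import math

def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient in [-1, 1]."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# hypothetical one-vs-rest counts for a single tumor class
print(round(mcc(tp=96, tn=290, fp=6, fn=8), 3))
```

Unlike accuracy, the MCC stays informative under class imbalance, which is why values near 1 are strong evidence of good per-class prediction.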

Conclusions
This paper presents a fully automated deep learning system, along with contrast enhancement, for multimodal brain tumor classification. The strength of this work lies in three steps. First, in the preprocessing step, contrast stretching using edge-based texture HE was employed to increase the local contrast of the tumor region. Second, robust deep learning features were selected by implementing correntropy via mutual learning and ELM (CML-ELM); the robust features computed with CML-ELM were fused through the PLS-based approach at a later stage. Third, the ELM classifier was implemented for the classification of the tumors into the relevant categories. The experimental process was conducted on the BraTS datasets, and the results showed improved accuracy (98.16%, 97.26%, and 93.40% for the BraTS2015, BraTS2017, and BraTS2018 datasets, respectively). The feature selection process not only improved the classification accuracy but also reduced the computational time. Finally, on the basis of the presented results, it can be concluded that the accuracy of the proposed method was stable.