Combining CNN Features with Voting Classifiers for Optimizing Performance of Brain Tumor Classification

Simple Summary This study presents a hybrid model for brain tumor detection. In contrast to manual feature extraction, features extracted by a convolutional neural network are used to train the model. Experimental results show the efficacy of CNN features over manually extracted features, and the model can detect brain tumors with 99.9% accuracy. Abstract Brain tumors and other nervous system cancers are among the top ten leading fatal diseases. The effective treatment of brain tumors depends on their early detection. This research work uses 13 features with a voting classifier that combines logistic regression with stochastic gradient descent, trained on features extracted by deep convolutional layers, for the efficient classification of tumor patients from normal ones. From the first- and second-order brain tumor features, deep convolutional features are extracted for model training. Using deep convolutional features helps to increase the precision of tumor and non-tumor patient classification. The proposed voting classifier, together with convolutional features, achieves the highest accuracy of 99.9%. Compared to cutting-edge methods, the proposed approach demonstrates improved accuracy.


Introduction
Medical image analysis is a growing field that uses a variety of modern image processing techniques. As a result, a variety of diseases can now be detected in a timely manner. Early detection can help in the treatment of most life-threatening diseases such as tumors, eye disease, Alzheimer's, blood clots, and cancer [1]. Biopsies and images of the infected areas are used in the diagnosis of these life-threatening diseases. Images of the affected areas are typically used to diagnose diseases in the early stages, while biopsies are used to confirm the presence of certain diseases [2]. In such cases, it is crucial that the modeling of infected areas is highly accurate and easily visualized. The brain is a critical organ in the human body and plays a vital role in controlling the body and decision-making; therefore, brain tumors are life-threatening conditions. Most malignancies involve the nervous system, which has important implications regarding diagnosis. The brain parenchyma is commonly involved by metastases, and the majority of brain tumors are brain metastases [3]. An accuracy of 98.91% was achieved using InceptionResNetV2 in [17]. The study [18] suggested a conditional segmentation strategy based on a residual network, as well as an attention approach based on extreme gradient boosting. The results showed that the CNN-CRF-ResNet system achieved an accuracy of 99.56% across all three classes. Samee et al. designed a hybrid transfer learning system, GN-AlexNet, for the classification of brain tumors and achieved an accuracy of 99.51% [19].
An ensemble deep learning-based system was designed by Rasool et al. [20] for the categorization of three different kinds of brain tumors. The authors used the ensemble deep learning model with fine-tuned GoogleNet and achieved an accuracy of 93.1%. As opposed to that, when the authors used GoogleNet as a feature extractor they obtained an accuracy of 98.1%. As genetic mutation is the primary reason for brain cancer, classifying and segmenting brain tumors using genomic information can help in diagnosis [21]. Using AI approaches, it is possible to identify disease-related molecular features from radiological medical images by assessing the genomic state of genetic mutations on numerous genes and cell proteins [22,23]. Authors combined AI with radio genomics for brain tumor detection in [24].
Some studies have utilized the same dataset used in this study and have shown promising results. The study [25] employs an ensemble learning approach based on machine learning to detect brain cancers. An NGBoost classifier was used alongside ETC, RF, GBC, and ADA for comparison. The findings revealed that NGBoost produced a significantly higher accuracy of 98.54%. Aryan Sagar Methil [26] presented a deep learning approach for detecting brain tumors, in which several image processing techniques were applied to obtain better results; the employed CNN model achieves an accuracy of 95%. Shah et al. [27] utilized MR scans to determine the prognosis of brain malignancies. They proposed a refined EfficientNet-B0 for brain tumor prediction and also employed data augmentation techniques to obtain higher-quality images. The proposed EfficientNet-B0 system achieved an accuracy of 98.87%. However, the proposed transfer learning model was a complex neural network requiring millions of parameters to train, which was the key drawback of the study.
This paper aims to develop a simple machine learning-based system that uses a CNN as the feature engineering technique to classify patients with brain tumors and normal patients using MRI scan data. In summary, the proposed system offers the following advantages:

• This study proposes an ensemble model that utilizes convolutional features from a customized CNN model for predicting brain tumors. The proposed ensemble model is based on logistic regression and a stochastic gradient descent classifier with a voting mechanism for producing the final output.
• The impact of the original features is analyzed against the performance of models using convolutional features.
• The performance comparison is performed using various machine learning models including random forest (RF), k-nearest neighbor (k-NN), logistic regression (LR), gradient boosting machine (GBM), decision tree (DT), Gaussian Naive Bayes (GNB), extra tree classifier (ETC), support vector machine (SVM), and stochastic gradient descent (SGD). Moreover, the performance of the proposed model is compared with leading-edge methodologies in terms of accuracy, precision, recall, and F1 score.
The remaining sections are arranged as follows. Section 2 discusses the proposed system's components and functions. Section 3 provides the results, whereas Section 4 contains the discussions and conclusion.

Materials and Methods
The 'brain tumor' dataset used for the detection of the disease, the proposed approach, and the steps taken in the proposed framework are discussed in this section. The machine learning classifiers utilized in this work are also briefly described.

Dataset
Various machine learning models were utilized in this study for the performance comparison. The selection of the right dataset is a vital step; this study makes use of the "Brain tumor" dataset, which is publicly available on Kaggle [28]. The dataset contains 3762 instances, 13 features, and a target class. Of these 13 features, 5 are first-order features and 8 are texture features. The first-order features are the standard deviation, mean, kurtosis, variance, and skewness, and the texture features are entropy, contrast, homogeneity, energy, dissimilarity, correlation, coarseness, and ASM (angular second moment). The target feature contains two classes: tumor and non-tumor. Of the 3762 instances, 2079 belong to the non-tumor class and 1683 to the tumor class.

Machine Learning Models
In this work, nine machine learning algorithms were utilized to identify brain tumors including RF, SVM, k-NN, LR, GBM, DT, GNB, ETC, and SGD. A brief explanation of these machine-learning models is given here.

Random Forest
RF [29,30] is a well-known and widely used tree-based machine learning algorithm. RF generates independent, identically distributed random vectors and grows a separate tree on each of them. Tree building is a step-by-step process in which the root node divides the data into its child nodes, and so on until the leaf nodes are reached. In RF, each tree independently classifies the target variable and then casts a class vote; the final classification depends on the majority voting among the decision trees. Following [31], the error of RF can be expressed through the margin function, where j ranges over the incorrect classes and the average counts the votes across all trees for a given class:

mg(X, Y) = av_k I(h_k(X) = Y) − max_{j≠Y} av_k I(h_k(X) = j), PE* = P_{X,Y}(mg(X, Y) < 0)

where h_k is the k-th tree, I(·) is the indicator function, mg is the margin, and PE* is the generalization error.

Decision Tree
A DT is one of the tree-based methods used for the classification of brain tumors. It handles classification and regression problems efficiently [32,33]. The major issue in a DT is finding the root node at each level, which is done through attribute selection. The "Gini index" and "information gain" are the attribute selection techniques. The Gini value, which measures the impurity of the dataset, may be computed as

Gini = 1 − Σ_i p_i²

where p_i is the proportion of instances belonging to class i. The other method used for attribute selection is information gain, which measures the purity of a split. Information gain for each attribute can be calculated using the following steps. Step 1: determine the target's entropy. For a collection of instances D, the entropy is

E(D) = − Σ_i p_i log₂ p_i

Step 2: subtract the weighted entropy of the splits produced by attribute A from the entropy of the parent:

Gain(D, A) = E(D) − Σ_v (|D_v| / |D|) E(D_v)

For the construction of the trees in all the tree-based classifiers in this work, information gain and the Gini index are used.
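As a sketch, the Gini index, entropy, and information gain described above can be implemented in a few lines of Python; the function names are illustrative, not from the paper.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum_i p_i^2 over class proportions p_i.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum_i p_i * log2(p_i).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, splits):
    # Entropy of the parent minus the weighted entropy of the splits.
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

labels = np.array([0, 0, 1, 1])
print(gini(labels))     # 0.5 for a perfectly mixed binary node
print(entropy(labels))  # 1.0 bit
# A split into two pure child nodes recovers the full entropy as gain.
print(information_gain(labels, [np.array([0, 0]), np.array([1, 1])]))  # 1.0
```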

K-Nearest Neighbour
k-NN is a frequent first choice for medical data mining. k-NN is a straightforward instance-based classifier [34,35]. As a supervised learning model, k-NN compares new data to existing cases to determine how similar they are, then assigns the new data to the group with the highest similarity. Finding the similarity of the data involves measuring the distance between the new and existing data points. Various distance measures are available, such as the Manhattan, Euclidean, and Minkowski distances. Although k-NN can be used for regression problems, it is most widely used to solve classification problems. k-NN has multiple parameters, and these need to be tuned correctly for good results.

Logistic Regression
LR is a supervised, statistics-based machine learning classifier [36-38]. LR maps the input characteristics (X: input) to a discrete set of target values (Y: output). A logistic function, which typically has the shape of an "S", is employed in LR to determine the likelihood of either class 0 or class 1. LR uses the sigmoid function for probability prediction, determined by the following formula:

σ(x) = 1 / (1 + e^(−x))

where σ(x) is the predicted probability, lying between 0 and 1, e is the base of the natural logarithm, and x represents the input. LR is a strong choice for linearly separable data and works well for binary classification problems.
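The sigmoid function described above can be written directly in NumPy; the threshold interpretation in the comments is the standard one for binary LR.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); the output lies strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))   # 0.5: the decision boundary
print(sigmoid(4.0))   # close to 1 -> predicted class 1
print(sigmoid(-4.0))  # close to 0 -> predicted class 0
```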

Support Vector Machine
SVM is a common supervised learning technique used for classification and regression problems [39]. SVM divides the dataset by creating decision boundaries known as hyperplanes and can effectively handle both linear and nonlinear data. Linear SVM handles separable data: the hyperplane separates the dataset into two groups, with data points above the hyperplane classified as class 1 and those below it as class 2. The points closest to the hyperplane are known as support vectors. For multi-class problems, SVM follows the one-vs-all strategy, which stops when the dataset has been separated into its classes. Non-separable data are handled by nonlinear SVM, in which the original coordinate space is transformed into a separable coordinate space via x → φ(x).

Gradient Boosting Machine
GBM is utilized for both classification and regression problems [40,41]. The main idea behind boosting in GBM is to enhance the capacity of the model by identifying the weaknesses of weak learners and replacing them with strong learners to find a near-accurate or perfect solution. GBM carries out this stage by training a large number of models gradually, sequentially, and additively. GBM is very sensitive to noisy data; however, due to its boosting technique, it is less susceptible to overfitting problems.

Extra Tree Classifier
ETC is a tree-based learning model that combines the results of multiple uncorrelated DTs for the final prediction [42]. The training samples are used to generate each DT in the forest that will be utilized for further classification. Numerous uncorrelated DTs are constructed using random samples of features. During tree construction, the Gini index is computed for every feature, and feature selection is performed for data splitting.

Gaussian Naive Bayes
The GNB method is based on the Bayes theorem and assumes that each feature in the model is independent of the others [43,44]. It is used for object classification under the assumption that the features follow a Gaussian (normal) distribution, which is why it is known as the GNB classifier. The posterior probability can be calculated using the following formula:

P(y | x) = P(x | y) P(y) / P(x)

Stochastic Gradient Descent
SGD integrates many binary classifiers and has undergone extensive testing on sizable datasets [45,46]. It is easy to develop and comprehend, and its functioning closely resembles the regression technique. SGD hyperparameters need to be set correctly in order to obtain reliable results, and SGD is sensitive to feature scaling.

Convolutional Neural Network for Feature Engineering
In this study, a CNN was used for feature engineering [47,48]. The CNN consists of four layers: an embedding layer, a 1D convolutional layer, a max-pooling layer, and a flatten layer. In this study, an embedding layer with a vocabulary size of 20,000 was used; this layer took the features from the brain tumor dataset as input and had an output dimension of 300. After this layer, a 1D convolutional layer was used with 500 filters, a kernel size of 2 × 2, and ReLU as the activation function. In order to map the key features from the output of the 1D convolutional layer, a 2 × 2 max-pooling layer was utilized. The output was flattened at the end back into 1D arrays for the ML models. Let (fs_i, tc_i) be the tuple set of the brain tumor dataset, where tc represents the target class column, fs represents the feature set, and i represents the tuple index. An embedding layer was utilized to obtain the desired output from the training set:

EL = embedding_layer(Vs, Os, I)
EOs = EL(fs)

where EL denotes the embedding layer and EOs denotes its output, which is fed to the convolutional layer as input. EL has three parameters: Vs, the size of the vocabulary; I, the length of the input; and Os, the dimension of the output. For brain tumor detection, we set the vocabulary size of the embedding layer at 20,000, which means this layer can take inputs ranging from 0 to 20,000. The input length was set at 13 and the output dimension at 300. All the input data in the CNN were processed by the embedding layer, which created the output for the next stage of processing. The output of the embedding layer is passed to the 1D convolutional layer, whose output is denoted 1D-Convs. For brain tumor detection, we used 500 filters for the CNN, i.e., F = 500, and a kernel size of Ks = 2 × 2.
The ReLU activation function, f(x) = max(0, x), replaces negative values with zero while keeping the other values unchanged.
For significant feature mapping, a max-pooling layer was utilized in the CNN. For brain tumor detection, a 2 × 2 pool was used to map the features. Here, Fmap represents the features obtained from max-pooling, S = 2 is the stride, and Ps = 2 is the pooling window size.
A flatten layer was used to transform the 3D data into 1D, since the machine learning models work best on 1D data. The above-mentioned steps were implemented for the training of the ML models, yielding 25,000 features for training. The architecture of the CNN used, along with the predictive model, is shown in Figure 1.
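The feature-extraction pipeline described above (embedding → 1D convolution with ReLU → max-pooling → flattening) can be sketched at the shape level in plain NumPy. The reduced dimensions below are illustrative assumptions, not the paper's configuration; the point is to show how each stage transforms the tensor shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for the paper's setup (13 inputs, 300-dim
# embeddings, 500 filters); reduced here for illustration.
seq_len, emb_dim, n_filters, kernel, pool = 13, 8, 16, 2, 2

embedding = rng.normal(size=(100, emb_dim))   # toy vocabulary of 100 tokens
tokens = rng.integers(0, 100, size=seq_len)   # one 13-feature sample
x = embedding[tokens]                         # (13, emb_dim)

# 1D convolution with ReLU: slide a kernel of width 2 over the sequence.
w = rng.normal(size=(n_filters, kernel, emb_dim))
conv = np.stack([
    np.maximum(0, np.sum(x[t:t + kernel] * w, axis=(1, 2)))
    for t in range(seq_len - kernel + 1)])    # (12, n_filters)

# Max-pooling with window/stride 2, then flatten to a 1D feature vector.
pooled = conv[:conv.shape[0] // pool * pool].reshape(-1, pool, n_filters).max(axis=1)
features = pooled.ravel()                     # 1D vector fed to the ML models
print(features.shape)
```

In the paper's actual setup, this role is played by a Keras model whose intermediate output, rather than a final classification layer, supplies the training features for the ML models.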

Proposed Voting Classifier
To obtain better results, several studies have preferred ensemble machine learning models, whose performance is better than that of individual models. Therefore, this study used an ensemble model to detect brain tumors. Figure 2 displays the pipeline flowchart for detecting brain tumors. Two machine learning models, LR and SGD, were combined to create the proposed model. The brain tumor dataset from Kaggle was used for the experiments, in two scenarios. In the first, all 13 features of the brain tumor dataset were used for brain tumor prediction. In the second, features were extracted from the dataset using the CNN, and the models were trained on them to distinguish between patients with and without tumors. The data were split 70% for training and 30% for testing. Accuracy, precision, recall, and F1 score were used to evaluate the models. In this work, LR and SGD are combined with a soft voting criterion; the architecture of the voting classifier is given in Figure 3. In soft voting, the outcome with the highest probability is regarded as the final output. Mathematically, the soft voting criterion can be represented as

P(1) = (P1_LR + P1_SGD)/2
P(2) = (P2_LR + P2_SGD)/2

where P(1) and P(2) denote the averaged probability values of the two classes for a test sample, and P_LR and P_SGD denote the probabilities assigned by LR and SGD, respectively. The probability values for each instance from LR and SGD are combined on the basis of soft voting, as shown in Figure 3. Each sample passed through LR and SGD is given a probability score. For example, if the LR model's probability values are 0.4 and 0.7 for the two classes, respectively, and the SGD model's probability values are 0.5 and 0.4, with P(x) denoting the probability of class x ranging from 0 to 1, the final probabilities are P(1) = (0.4 + 0.5)/2 = 0.45 and P(2) = (0.7 + 0.4)/2 = 0.55. The final output will be class 2 because it has the highest probability. By averaging the predicted probabilities from both classifiers, VC(LR+SGD) selects the final class with the maximum average probability. The hyperparameter details of all models used in this research work are listed in Table 1. Table 1. Hyperparameter values of all models used in this research work.
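The soft-voting computation from the worked example can be expressed in a few lines of Python, averaging each class's probability across LR and SGD and picking the class with the highest mean.

```python
# Per-class probabilities from the worked example (class -> probability).
p_lr = {1: 0.4, 2: 0.7}
p_sgd = {1: 0.5, 2: 0.4}

# Soft voting: average each class's probability across the two models,
# then choose the class with the highest average.
avg = {c: (p_lr[c] + p_sgd[c]) / 2 for c in p_lr}
final = max(avg, key=avg.get)
print(avg)    # class 1 -> 0.45, class 2 -> 0.55
print(final)  # 2
```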

Evaluation Metrics
Accuracy, precision, recall, and F1 score are the performance metrics utilized in this study to assess the machine learning models' effectiveness. These measurements are all dependent on the confusion matrix's values.
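As a sketch, the four metrics can be computed directly from the confusion-matrix counts; the function name and the example counts below are illustrative, not taken from the paper's results.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    # Standard binary-classification metrics from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration.
print(metrics_from_confusion(tp=90, fp=10, fn=5, tn=95))
```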

Experiment Setup
Several experiments were conducted for the performance analysis, and the performance of the proposed approach was extensively assessed in comparison to the other learning models. All the experiments were performed on a 7th-generation Intel Core i7 machine running the Windows 10 operating system. The Python language was used for the implementation of the proposed approach and the other learning models, along with the TensorFlow, scikit-learn, and Keras libraries. Experiments were carried out in two scenarios to evaluate the effectiveness of the proposed technique: using the original features from the brain tumor dataset and using CNN features.

Performance of Models Using Original Features
The ML models were applied to the original dataset in the first set of experiments, and the results are shown in Table 2. The results show that SGD and LR achieved the highest accuracy values among all models, 0.881 and 0.869, respectively. RF obtained an accuracy of 0.854, while the LR+SGD ensemble model attained an accuracy score of 0.845. The tree-based model ETC attained an accuracy score of 0.829, while GNB showed the worst performance with a 0.769 accuracy score. Overall, the linear models LR and SGD and their ensemble performed well on the original feature set, and the performance of the ensemble model was noteworthy compared to the other models. Individually, LR and SGD performed well on the original feature set, and their combination further improved the results. Although the proposed voting ensemble model performed well, the obtained accuracy fell short of existing works and lacked the accuracy desired for brain tumor classification. For this purpose, further experiments were conducted using CNN as a feature engineering technique together with the ensemble learning model.

Results Using CNN Feature Engineering
In the second set of experiments, the performance of the proposed ensemble model and other models was assessed using CNN as a feature engineering technique to extract features from the dataset. Table 3 presents the results of the models when CNN features were used for model training. Expanding the feature set was the main goal of employing CNN model features, which was anticipated to increase the learning models' accuracy.
The results show that the proposed voting ensemble LR+SGD leads all models applied in this study with an accuracy score of 0.995, improving the accuracy by 0.15 over the original feature set. In the same manner, the results of the individual models also improved with the convolutional features: SGD obtained an accuracy score of 0.987 and the regression-based model LR achieved 0.989, while the tree-based models ETC and RF obtained accuracy scores of 0.926 and 0.958, respectively. The probability-based model GNB was again the worst performer on the CNN features, with an accuracy score of 0.866, although it too showed some improvement over the original features.

Results of K-Fold Cross-Validation
In order to verify the effectiveness of the proposed model, this research work makes use of k-fold cross-validation. Table 4 provides the results of the 10-fold cross-validation. The cross-validation results reveal that the proposed ensemble model provides an average accuracy score of 0.996, while the average scores for precision, recall, and F1 are 0.998, 0.998, and 0.997, respectively.
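A minimal sketch of 10-fold cross-validation with scikit-learn, on synthetic stand-in data (the dataset and model settings below are illustrative assumptions, not the paper's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 13-feature brain tumor data.
X, y = make_classification(n_samples=500, n_features=13, random_state=42)

# 10-fold CV: train on 9 folds, evaluate on the held-out fold, repeat.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring='accuracy')
print(scores.mean())  # average accuracy across the 10 folds
```

Averaging the per-fold scores, as in Table 4, reduces the dependence of the reported result on any single train/test split.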

Performance Comparison with State-of-the-Art Approaches
The results of the proposed model are compared with existing state-of-the-art studies in Table 5. For this purpose, several recently published works were selected so as to report the most recent results. Ref. [25] uses the NGBoost model for brain tumor detection and obtains 0.985 accuracy. Similarly, the study [26] utilizes a CNN deep learning model for the same task and reports a 0.950 accuracy score on the same dataset used in this study. An EfficientNet-B0 is employed in [27] for brain tumor detection, obtaining a 0.988 accuracy score. The current study took advantage of CNN features to train a voting classifier for brain tumor detection and obtained better results than existing state-of-the-art approaches, with a classification accuracy of 0.999.

Conclusions and Future Work
The goal of this study was to create a framework that can properly distinguish between brain images with and without tumors and minimize the risks associated with this leading cause of mortality. The proposed method focuses on improving accuracy while reducing prediction errors for brain tumor detection. The experimental findings showed that employing convolutional features achieved more accurate results than using the original features. Furthermore, the ensemble classifier comprising LR and SGD outperformed the individual models. Compared with state-of-the-art methods, the proposed method achieved an accuracy score of 0.999, demonstrating its superiority over existing methods and highlighting the effectiveness of the framework. In the future, we intend to employ deep learning ensemble models to conduct tumor-type classification with convolutional features. This study used a single dataset obtained from a single source; in the future, we plan to apply the proposed approach to other datasets to demonstrate its generalizability.

Conflicts of Interest:
The authors declare no conflicts of interest.