Multi-class Disease Classification in Brain MRIs Using a Computer-aided Diagnostic System

Background: An accurate and automatic computer-aided multi-class decision support system is created to classify magnetic resonance imaging (MRI) scans of the human brain as normal, Alzheimer, AIDS, cerebral calcinosis, glioma, or metastatic, and thereby help radiologists diagnose disease in brain MRIs. Methods: The performance of the proposed system is validated using benchmark MRI datasets (OASIS and Harvard) of 310 patients. Master features of the images are extracted using a fast discrete wavelet transform (DWT), and these discriminative features are then further analysed by principal component analysis (PCA). Different subset sizes of principal feature vectors are provided to five different decision models. The classification models include the J48 decision tree, k-nearest neighbour (kNN), random forest (RF), and least-squares support vector machine (LS-SVM) with polynomial and radial basis kernels. Results: The RF-based classifier outperformed all other compared decision models and achieved an average accuracy of 96% with 4% standard deviation, and an area under the receiver operating characteristic (ROC) curve of 99%. LS-SVM (RBF) also showed promising results (i.e., 89% accuracy) when the smallest number of principal features was used. Furthermore, the performance of each classifier on different subset sizes of principal features was in the range 80%–96% for most performance metrics. Conclusion: The presented medical decision support system demonstrates the potential for accurate multi-class classification of brain abnormalities; therefore, it has potential for use as a diagnostic tool by medical practitioners.


Introduction
In this modern era, different advanced imaging modalities (e.g., X-rays, computerized tomography (CT) scans, positron emission tomography (PET), single-photon emission computerized tomography (SPECT), and magnetic resonance imaging (MRI)) are used in neurology and basic neuroscience. In X-rays and CT scans, patients are exposed to ionizing radiation, which may increase the risk of developing cancers, whereas PET and SPECT use radioactive tracers, with minimal exposure to harmful radiation. MRI, however, is a non-invasive, dominant, and flexible modality for investigating the pathological conditions of the brain and other body parts. MRI is the common practice for identifying brain abnormalities. MRI scans provide high-contrast, high-spatial-resolution images, which make it possible to differentiate the characteristics of soft tissues. Magnetic resonance (MR) image texture is used to distinguish between healthy and diseased anatomy. Brain MR images showing any disease (such as Alzheimer, AIDS dementia, cerebral calcinosis, glioma, or metastasis) are characterized by large cells and high contrast, which can be identified by abrupt changes in the images [1,2]. In recent years, machine learning techniques have been widely employed in the medical domain to support decision-making [3–10]. Moreover, medical decision support systems are in high demand to detect these abrupt changes automatically and classify the brain MRI as normal or as belonging to a class of disease [3,5]. The very large amount of MR imaging data makes it difficult to interpret the full pattern of atrophy using the existing visual inspection method. This creates the need for a computer-aided diagnosis (CAD) system that identifies the specific condition in a brain MRI and enhances the diagnostic capabilities of medical personnel. Radiologists can use these automated systems as an instrument for diagnosis and for pre-surgical and post-surgical procedures [4,11–22].
Generally, supervised classification methods are used instead of unsupervised methods for brain MRI classification because of their better accuracy. There are commonly three phases involved in the implementation of a classifier for medical images: (1) feature extraction; (2) feature reduction; and (3) training/testing of classification models. In the first step, discriminative features are extracted from brain MR images. In the second step, these features are processed by a feature reduction technique, such as PCA or linear discriminant analysis (LDA), to reduce their dimensionality. Finally, the reduced principal features are used to train the classifier model and to classify the query images on the basis of these features. Widely used decision models include naïve Bayes (NB), J48, random forest (RF), k-nearest neighbor (kNN), and support vector machine (SVM).
Recently, various feature selection schemes and machine learning decision models for brain MRI classification have been proposed. In [6,14–19], the authors used the 2D-DWT (two-dimensional discrete wavelet transform) and principal component analysis (PCA) for feature extraction and selection, respectively. Zhang et al. [14–17] proposed solutions based on different advanced decision models combined with DWT and PCA and achieved promising results, with some limitations. They used forward neural networks (FNN) with scaled chaotic artificial bee colony (SCABC) [14], back propagation with the conjugate gradient method [15], kernel support vector machines (KSVM) with the Gaussian radial basis function (GRB) [16], and particle swarm optimization (PSO) [17] as decision models to improve the efficiency of the brain MRI classifier. The schemes proposed in [18,19] used feed-forward back propagation artificial neural networks (FP-ANN), kNN, and a feedback pulse-coupled neural network (FBPNN), and achieved an average accuracy of 99% for binary classification of brain MR images. More recently, [20–22] proposed numerous complex feature engineering techniques with SVM- and NB-based decision models to enhance classifier performance. In [20,21], the authors proposed the Ripplet transform and the discrete wavelet packet transform, respectively, instead of DWT for feature extraction, whereas the authors of [22] proposed a wavelet entropy method for feature reduction. Wang et al. [23] proposed a different classification scheme using the dual-tree complex wavelet transform and a twin support vector machine for pathological brain detection, and achieved an average accuracy of 99.57%. However, the small number of cases and the limited number of disease categories in their datasets are the main limitations of these works, and their performance is significantly reduced when large datasets are used. Moreover, the advanced methods used in [20–23] increase the complexity of the classifier, consume relatively more computational time, and do not perform well on large datasets. Conversely, the technique proposed in [6] achieves accurate results for larger datasets when using DWT, PCA, and LS-SVM (RBF) for feature extraction, feature reduction, and classification, respectively. In addition, all these schemes [6,14–23] were proposed for binary classification and are only capable of predicting normal versus abnormal anatomy in a brain MRI. A multi-class brain MRI classifier has been proposed by Zacharaki et al. [24]. The authors extended the scope of the features and included age, tumour shape, and ROI (region of interest) as part of the feature sets. This technique is semiautomatic because the ROIs need to be traced manually. In [24], the authors tested three different disease classes (i.e., meningioma, glioma, and metastasis) and achieved a mean accuracy of about 90%. A comparison between linear discriminant analysis (LDA), kNN, and non-linear SVM-based decision models was shown. The main limitation of this technique is that it needs human intervention for classification. Therefore, a research gap remains in the development of a fully automatic multi-class classifier with significant accuracy.
The main motivation behind this study is to develop an accurate multi-class brain MRI classifier that is capable of diagnosing the disease class in brain MRIs. The proposed multi-class brain MRI classifier has the potential to classify five different brain diseases: Alzheimer, AIDS dementia, cerebral calcinosis, glioma, and metastasis. The proposed system is composed of three sub-models: master feature extraction, principal feature analysis, and decision models. The fast discrete wavelet transform (DWT) is used to extract the master features from the brain MR images. The principal feature analysis was done by PCA, and different subsets were used to calculate the efficiency of the multi-class classifier system. In addition, PCA reduced the dimension of the master features, which also decreases the classification time and complexity of the classifier. For a comprehensive comparison of the decision models' performance on the multi-class classification of brain MRIs, this research compared five different decision models (J48, kNN, RF, and LS-SVM with polynomial (Poly) and radial basis function (RBF) kernels). For comparative analysis with the proposed system, some other published methods from the recent literature [6,18,20,21] were also tested using the same large datasets.

Materials and Methods
The proposed classifier for multi-class classification is composed of master feature extraction, principal feature analysis, and classification model blocks, as shown in Figure 1, which illustrates the methodology of the proposed system. The classifier is constructed and evaluated in two phases: (1) a training phase, and (2) a testing phase. In the training phase, the classifier is trained on randomly selected images from the datasets. Once the classifier is trained, it is capable of classifying query images. In the testing phase, the query image(s) is/are fed to the trained classifier, which classifies the image(s) as normal, Alzheimer, AIDS, cerebral calcinosis, metastatic, or glioma. Furthermore, five-fold cross-validation is used in this work to minimize the generalization error.

Dataset Collection
The benchmark MRI dataset used in this research was collected from the 'Open Access Series of Imaging Studies (OASIS)' and 'Harvard Medical School' MRI databases to validate the proposed classification system. This database consists of human brain MRI images in the axial plane. These datasets were acquired using the following scan parameters: voxel resolution: 1.0 × 1.0 × 1.25 mm³, rectangular FOV: 256/256, TR: 9.7 ms, TE: 4.0 ms, TI: 20.0 ms, and flip angle: 10°. The in-plane resolution of each image is 256 × 256. Brain MRI scans of three hundred and ten patients (men and women) were used to form this database.
The brain MR image dataset is composed of healthy and abnormal images. The abnormal image database covers five different types of brain disease: Alzheimer's disease, AIDS dementia, cerebral calcinosis, glioma, and metastatic dementia. A sample image of each class included in the benchmark dataset is shown in Figure 2.

The dataset was divided into training and testing images: 70% of the dataset is used for training and the remaining 30% is used for testing. Training images were used to construct the classifier, whereas testing images were used to evaluate the performance of the multi-class classifier. In addition, the testing images were unknown to the constructed classifier for the sake of unbiased evaluation.
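The 70%/30% division described above can be sketched as a simple random hold-out split. This is only an illustration under stated assumptions: the function name `split_train_test`, the fixed seed, and the placeholder list of 310 identifiers are hypothetical, and the source does not say whether the split was stratified by class.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the MR images.
image_ids = list(range(310))
train_ids, test_ids = split_train_test(image_ids)
```

Keeping the test identifiers disjoint from the training identifiers is what makes the evaluation unbiased in the sense used above.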

Master Feature Extraction
A MATLAB (R2013a, The Mathworks, Inc., Natick, MA, United States) script was written, using the discrete wavelet transform, to extract the main features of the brain MR images. To improve the efficiency of a classifier, the master features in the MR image need to be identified properly. In the recent literature [6,14–22,24–27], many different algorithms (such as DWT and the Ripplet transform (RT)) have been used to extract the main features of the images. DWT has some advantages over RT: it is less computationally complex, and it suits the characteristics of brain MRIs. The sparse nature of MRIs provides an opportunity to identify the features that contribute most to the MR image by representing it in a suitable domain (such as a wavelet domain) [28]. Thus, DWT provides master features that carry rich knowledge of the input MR image pattern with a less complex implementation. The master features extracted from the MRI database using DWT have the potential to increase the decision-making power of the classifier and to reduce its complexity. A three-level "Haar" DWT was used to extract the master features of the images in this paper.
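As a rough illustration of the three-level Haar decomposition, the following Python sketch computes only the approximation band. It is not the authors' MATLAB script. It assumes the orthonormal Haar filter, under which each approximation coefficient is the sum of a 2 × 2 block divided by 2, and it assumes even image dimensions at every level. The function names are hypothetical.

```python
def haar_approximation(image):
    """One level of the 2-D orthonormal Haar DWT, approximation band only.

    `image` is a list of equal-length rows with even height and width.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def master_features(image, levels=3):
    """Apply `levels` successive approximation steps (halves each side per level)."""
    approx = image
    for _ in range(levels):
        approx = haar_approximation(approx)
    return approx


# A constant 256 x 256 test image shrinks to 32 x 32 after three levels.
image = [[1.0] * 256 for _ in range(256)]
features = master_features(image)
```

Each level halves both image dimensions, so three levels reduce a 256 × 256 image to 32 × 32 coefficients.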

Preparation of the Principal Feature Vector
The main characteristics of any robust and accurate classifier are the selection of discriminative features from the dataset and the reduction of the dataset's dimensions. Large databases increase the feature dimensions, which eventually increases the complexity of the classifier and demands excessive classification time. Therefore, researchers use different feature reduction schemes to mitigate the curse of dimensionality in the classifier system [19,29–31].
In this article, PCA was applied to the discriminative features of the MR image to further reduce the dimension of the master features extracted by DWT. PCA preserves the variance by extracting a linear lower-dimensional representation of the MR image features [32,33]. It therefore extracts the major components of the image (data) and forms the principal feature vector. This leads to an increase in the efficiency of the classifier system.
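To make the variance-preserving projection concrete, here is a minimal PCA sketch in pure Python. It uses power iteration with deflation on the sample covariance matrix, a technique chosen only to keep the example self-contained; it is not the routine used in the paper. The function names `principal_components` and `project` are hypothetical.

```python
def principal_components(samples, n_components, iterations=100):
    """Return (mean, components) for rows of `samples` using power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for k in range(n_components):
        v = [1.0 / (j + k + 1) for j in range(d)]  # arbitrary non-zero start
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break  # no variance left to explain
            v = [x / norm for x in w]
        length = sum(x * x for x in v) ** 0.5
        v = [x / length for x in v]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the variance captured by this direction.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components


def project(sample, mean, components):
    """Coordinates of `sample` along each principal direction."""
    return [sum((s - m) * c for s, m, c in zip(sample, mean, comp))
            for comp in components]


# Toy data lying on the line y = 2x: one component explains all the variance.
samples = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, components = principal_components(samples, n_components=1)
scores = project([3.0, 6.0], mean, components)
```

In the setting described above, each row would be a flattened master feature vector and `n_components` would be one of the subset sizes.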

Feature Subset Sizes
Principal feature vectors are used for decision modelling.However, subsets of the principal feature vectors were introduced to check the performance trend of the classifier.Subset sizes of 5, 10, 15, and 20 principal components were used to compute the results of the proposed multi-class classifier.

Classifier Models
The Weka toolkit (Version 3.8, University of Waikato, Hamilton, New Zealand) was used to develop the decision models. Principal feature vectors were exported from MATLAB in comma-separated values (CSV) format, and the CSV data were then loaded into Weka for further analysis. Five different decision models were constructed for performance measurement: J48, k-nearest neighbour (kNN), random forest (RF), least-squares support vector machine with polynomial kernel (LS-SVM (Poly)), and least-squares support vector machine with radial basis function kernel (LS-SVM (RBF)).

J48 Classifier (J48)
J48 is a decision tree algorithm [34]. J48 uses entropy to measure the homogeneity of a sample. If the entropy is zero, the sample is completely homogeneous; if the sample is equally divided between the two classes, it has an entropy of one. The entropy of a given dataset X having positive and negative class instances is mathematically defined as:

$$\mathrm{Entropy}(X) = -p(P)\log_2 p(P) - p(N)\log_2 p(N) \tag{1}$$

where p(P) and p(N) are the probabilities of the positive and negative class, respectively.
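The two boundary cases of the entropy measure can be checked with a short sketch. The function name `binary_entropy` is hypothetical, and the term 0 · log2(0) is taken as 0 by convention.

```python
import math


def binary_entropy(n_positive, n_negative):
    """Entropy (in bits) of a sample with the given class counts."""
    total = n_positive + n_negative
    entropy = 0.0
    for count in (n_positive, n_negative):
        if count:  # skip empty classes: 0 * log2(0) is treated as 0
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


homogeneous = binary_entropy(10, 0)  # all instances in one class
balanced = binary_entropy(5, 5)      # instances equally divided
```

A decision tree learner prefers splits that move the resulting subsets towards the homogeneous case.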

K-Nearest Neighbor (kNN)
kNN is also known as a lazy-learning, non-parametric algorithm. It is one of the simplest classification algorithms: it stores all training instances and uses a Euclidean distance function to classify new instances, as shown in Equation (2) [35,36]:

$$d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \tag{2}$$

where x and y are two vectors (a training instance vector and a query vector to be classified), and k represents the number of attributes.
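A minimal nearest-neighbour sketch based on the Euclidean distance is given below. The parameter `n_neighbors` (the number of neighbours that vote) is an assumption, since the source does not state the value used in Weka, and the toy vectors and labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_vectors, train_labels, query, n_neighbors=3):
    """Label `query` by majority vote among its nearest training instances."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical two-attribute feature vectors for two classes.
train_vectors = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = ["normal"] * 3 + ["glioma"] * 3
prediction = knn_predict(train_vectors, train_labels, [0.2, 0.1])
```

Because all training instances are stored and compared at query time, the cost of a prediction grows with the size of the training set.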

Random Forest (RF)
RF was proposed by Leo Breiman of UC Berkeley in 1999 [37]. The algorithm works as a large collection of decorrelated decision trees built using a bagging technique. RF creates various sub-training sets from the full training set, and a decision tree classifier is constructed from each sub-training set. At testing time, each input vector of the test set is classified by all of the decision trees in the forest, and the forest then chooses the classification result, either by majority vote or by averaging the predictions using the equation given below [38]:

$$\hat{f}(x) = \frac{1}{B}\sum_{b=1}^{B} f_b(x) \tag{3}$$

where B represents the number of samples/trees, f_b is the predictor built from the b-th sub-training set, and x corresponds to the test point.
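The two aggregation rules mentioned above, majority voting and averaging over B predictors, can be sketched as follows. The three toy "trees" are hypothetical hand-written rules used only to exercise the aggregation; they are not trained decision trees.

```python
from collections import Counter


def majority_vote(predictors, x):
    """Class label chosen by most of the predictors for input x."""
    votes = Counter(predict(x) for predict in predictors)
    return votes.most_common(1)[0][0]


def average_prediction(predictors, x):
    """Mean of the numeric predictions f_b(x) over all B predictors."""
    return sum(predict(x) for predict in predictors) / len(predictors)


# Hypothetical stand-ins for trees grown on different sub-training sets.
def tree_a(x):
    return "glioma" if x[0] > 0.5 else "normal"


def tree_b(x):
    return "glioma" if x[1] > 0.5 else "normal"


def tree_c(x):
    return "glioma" if x[0] + x[1] > 1.2 else "normal"


forest = [tree_a, tree_b, tree_c]
label = majority_vote(forest, [0.9, 0.2])
```

For the point `[0.9, 0.2]`, only `tree_a` votes "glioma", so the forest returns "normal".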

Least Squares-Support Vector Machine (LS-SVM)
The SVM classifier is strongly influenced by advances in statistical learning theory [39–41]. SVM plays a vital role in applications such as object detection [42], face detection [43], handwriting recognition [44], medical image classification [6], and bioinformatics [45]. SVM learns from training examples. An improved version of SVM, i.e., LS-SVM, was used in this article because of its robustness and efficiency. Each training instance consists of n attributes $(x_1, x_2, \ldots, x_n)$ with a corresponding class label. The nonlinear function estimation can be mathematically presented as:

$$y(x) = W^{T}\varphi(x) + b \tag{4}$$

where the high-dimensional feature space is represented by φ(x), the weight vector is defined by W, and the bias term is denoted by b. The LS-SVM solution of such an optimization problem can then be obtained as follows (for a deeper introduction to this method, readers can refer to [46–48]):

$$y(x) = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b \tag{5}$$

where N is the number of training instances and α_k are the support values obtained by solving the LS-SVM linear system. Table 2 provides some of the choices of kernel function $K(x_k, x_l)$.
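The following sketch evaluates a kernel expansion of the form y(x) = Σ α_k K(x, x_k) + b, which is the usual form of an LS-SVM function estimate. The kernel parameterizations (`sigma`, `degree`, `offset`) are common textbook choices and are assumptions, as are the support values and bias in the example; solving the LS-SVM linear system for them is not shown.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def poly_kernel(x, y, degree=2, offset=1.0):
    """Polynomial kernel (x . y + offset)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + offset) ** degree


def lssvm_output(query, train_vectors, alphas, bias, kernel):
    """Kernel expansion: sum_k alpha_k * K(query, x_k) + bias."""
    return sum(alpha * kernel(query, x_k)
               for alpha, x_k in zip(alphas, train_vectors)) + bias


# Hypothetical support values and bias, for illustration only.
train_vectors = [[0.0], [1.0]]
alphas = [1.0, -1.0]
output = lssvm_output([0.0], train_vectors, alphas, 0.5, rbf_kernel)
```

Swapping `rbf_kernel` for `poly_kernel` in the last call changes the decision model without changing the expansion itself.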

Performance Measures
Recall (sensitivity), precision, F-measure, accuracy, and area under the receiver operating characteristic (ROC) curve are widely used metrics to determine the performance of classifiers [49]. For each class i, the possible outcomes of the proposed classifier can be described as true positives ($tp_i$), false positives ($fp_i$), true negatives ($tn_i$), and false negatives ($fn_i$). For multi-class classification, macro-averaged recall, macro-averaged precision, and macro-averaged F-measure are used to validate the performance of the classifier [49].
Recall_M is the average of the per-class recall (i.e., the probability of the test finding the positive cases among all the positive cases of the respective class):

$$\mathrm{Recall}_M = \frac{1}{C}\sum_{i=1}^{C} \frac{tp_i}{tp_i + fn_i}$$

Precision_M is the average of the per-class precision (i.e., the probability that a case is truly positive given that the system labelled it as positive):

$$\mathrm{Precision}_M = \frac{1}{C}\sum_{i=1}^{C} \frac{tp_i}{tp_i + fp_i}$$

F-measure_M (macro-averaged F-measure) is a weighted combination of Recall_M and Precision_M. Mathematically, it is defined as:

$$\text{F-measure}_M = \frac{(\beta^2 + 1)\,\mathrm{Precision}_M\,\mathrm{Recall}_M}{\beta^2\,\mathrm{Precision}_M + \mathrm{Recall}_M}$$

Average accuracy is the fraction of test results predicted correctly, averaged over all the classes:

$$\text{Average Accuracy} = \frac{1}{C}\sum_{i=1}^{C} \frac{tp_i + tn_i}{tp_i + fn_i + fp_i + tn_i}$$

The area under the ROC curve (AUC) is the area under the receiver operating characteristic curve of each class. It is used to analyse how well a classification model predicts a specific class versus all other classes. In the equations above, C represents the total number of classes (i.e., C = 6), and the index M denotes macro-averaging. β = 1 was used in this research.
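The macro-averaged measures can be computed directly from a multi-class confusion matrix. The sketch below assumes the standard one-versus-rest definitions of true and false positives and negatives for each class, and a per-class averaged accuracy; the function name and the three-class example matrix are hypothetical.

```python
def macro_metrics(confusion, beta=1.0):
    """Macro-averaged metrics; confusion[i][j] counts class-i items predicted as j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    recalls, precisions, accuracies = [], [], []
    for i in range(n_classes):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[r][i] for r in range(n_classes)) - tp
        tn = total - tp - fn - fp
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        accuracies.append((tp + tn) / total)
    recall_m = sum(recalls) / n_classes
    precision_m = sum(precisions) / n_classes
    denominator = beta ** 2 * precision_m + recall_m
    f_measure_m = ((beta ** 2 + 1) * precision_m * recall_m / denominator
                   if denominator else 0.0)
    return {
        "recall": recall_m,
        "precision": precision_m,
        "f_measure": f_measure_m,
        "average_accuracy": sum(accuracies) / n_classes,
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
confusion = [[5, 0, 0],
             [1, 3, 1],
             [0, 0, 5]]
metrics = macro_metrics(confusion)
```

With β = 1 the F-measure reduces to the harmonic mean of the macro-averaged precision and recall.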

Experimental Setup
Separate experiments were conducted on the training and testing datasets. PCA with DWT was applied to extract the discriminative principal feature vectors with four different subset sizes (5, 10, 15, and 20). Four principal feature vector sets were extracted from the training datasets and four from the testing datasets, for a total of eight. To evaluate the performance of the proposed multi-class classifier using the performance metrics, a total of 40 (8 × 5) analyses were performed by applying each of these eight feature sets to five different classifier models (J48, kNN, RF, LS-SVM (Poly), and LS-SVM (RBF)).

Feature Reduction
In order to extract discriminative and reduced features, fast DWT with PCA was used in this research. The fast DWT computes only the approximation component of the wavelet features, which contains the major information of the MR image pattern. Computing only the approximation component of the DWT decomposition decreases the size of the MRI images, which eventually reduces the computation time and complexity of the classifier. Initially, the MRI images were 256 × 256 in size. After applying DWT with a three-level Haar wavelet decomposition (approximation component only), the size changes to 32 × 32. PCA was then applied to these reduced master feature sets, which allows a further decrease in the size of the feature sets by extracting the high-variance components. In this article, four different feature subset sizes (5, 10, 15, and 20) were used. For classification purposes, the classifier used only 0.0076%, 0.015%, 0.023%, and 0.031% of the original MR image when preparing principal feature vectors of size 5, 10, 15, and 20, respectively. Therefore, the proposed classification system achieved approximately 99.969% feature reduction while retaining the accuracy of the classifier.
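The size bookkeeping in this paragraph can be verified with a few lines of arithmetic. The helper names below are hypothetical; the retained percentage is taken relative to the 256 × 256 = 65,536 values of the original image.

```python
def dwt_side(side, levels):
    """Side length of the approximation band after `levels` dyadic steps."""
    return side // 2 ** levels


def retained_percent(n_features, side=256):
    """Share of the original image values kept, in percent."""
    return 100.0 * n_features / (side * side)


approx_side = dwt_side(256, 3)  # 256 -> 128 -> 64 -> 32
for n_features in (5, 10, 15, 20):
    kept = retained_percent(n_features)
    print(f"{n_features:2d} features: {kept:.4f}% kept, "
          f"{100.0 - kept:.3f}% reduction")
```

For 20 features this gives a retained share of about 0.031% and hence a reduction of about 99.969%; for 5 features the retained share is about 0.0076%.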

Performance Evaluation
The performance of the proposed multi-class classifier was evaluated in terms of macro-averaged recall, macro-averaged precision, macro-averaged F-measure, overall accuracy, and AUC of each class.Figure 3 shows the comparison of different decision models' performance against the number of principal features used.
Figure 3a illustrates the macro-averaged recall for each classifier model with respect to the feature subset size. A majority of the classifier models achieved Recall_M values greater than 81% for any number of principal components used. Regarding the effect of the feature subset size, the results indicate that the LS-SVM (RBF) classifier model produced a fixed 86% Recall_M, independent of the feature subset size. However, the RF and J48 models increased their Recall_M values as the feature subset size increased, reaching 96% and 87%, respectively, whereas the remaining classifier models (kNN and LS-SVM (Poly)) were not able to increase their performance in terms of Recall_M.
Figure 3b shows the performance of each classifier model with respect to the number of principal feature components in terms of macro-averaged precision. The results show that a feature subset size of 10 or more produced Precision_M greater than 90% for all five classifier models. The RF model outperformed the others and achieved Precision_M values of up to 96% using a feature subset size of 20. The lowest Precision_M was observed for LS-SVM (Poly) for any number of principal features.
The performance evaluation in terms of macro-averaged F-measure, with respect to the number of principal components used by the classifiers, is shown in Figure 3c. The results revealed that F-measure_M generally exceeded 90% for RF and J48 when a feature subset size of 10 or more was used. LS-SVM (RBF), however, achieved 90% F-measure_M for any feature subset size. Furthermore, kNN and LS-SVM (Poly) were not able to improve significantly in terms of F-measure_M even when the number of features was increased.
The overall accuracy of each classifier model was compared (Figure 3d). The average accuracy of each classifier model exceeded 84% when the maximum number of principal components was used. RF improved its average accuracy as the number of features increased and achieved the highest accuracy rate (i.e., 96%, standard deviation = 4%) when the maximum of 20 features was reached. The results again show that the overall average accuracy of LS-SVM (RBF) was stable and not associated with increased feature subset sizes. LS-SVM (RBF) provided the best results when the smallest number of principal features was used for classification. Furthermore, we found that no significant accuracy improvement was achieved, for any feature subset size, in the case of the kNN and LS-SVM (Poly) decision models.
In Figure 4, the area under the ROC curve for each class was estimated. The results reveal that all five classifier models achieved an AUC of 100% with 0% standard deviation for the "Normal" class, as shown in Figure 4a. RF, LS-SVM (RBF), and kNN produced significant results (i.e., 100% AUC with 0% standard deviation) for the "Alzheimer" class, as depicted in Figure 4b. However, J48 and LS-SVM (Poly) have a fluctuating AUC trend for the "Alzheimer" class. The comparison of AUC for the "AIDS" class is shown in Figure 4c. Only the RF decision model achieved an AUC of 100% for the "AIDS" class, when using a feature subset of size 15 or more. The remaining four classifier models exceeded an AUC of 78% for different sizes of principal feature sets. In Figure 4d,e, a majority of decision models attained AUC > 96% with < 4% standard deviation for the classes "cerebral calcinosis" and "metastasis", respectively. RF and LS-SVM (RBF) showed an AUC of 100% with 0% standard deviation for any feature subset size. Figure 4f shows the AUC performance of the "glioma" class for all analyses. With the exception of the RF and kNN models, all AUC measures were below 85% for any number of principal features. However, the AUC values measured for the RF and kNN models were greater than 95% when the feature subset size was 15 or more. The maximum AUC value achieved for the "glioma" class by the RF model was 99% with 1% standard deviation.
The overall comparison of decision models across different principal feature subset sizes showed that RF performed significantly better than the other four decision models (J48, kNN, LS-SVM (Poly), and LS-SVM (RBF)). However, RF did not perform well when only a few features were used; its performance gradually increased as the feature set grew. It is also notable that increasing the number of principal features is not always worthwhile, because larger feature subsets may increase the complexity of the machine learning classifier. The results therefore indicate that RF required at least 15 features to achieve better performance in terms of accuracy, precision_M, recall_M, F-measure_M, and AUC. On the other hand, the overall comparison also reveals that LS-SVM (RBF) achieved strong performance regardless of the feature set size. Its performance remained nearly constant for any number of principal features, so a small feature subset can be used, which reduces the computational time and complexity of the multi-class classifier.

Comparison with Existing State-of-the-Art Classification Schemes
To evaluate the performance of the proposed multi-class classifier, it was compared with different classification schemes, all examined on the same MRI dataset and on the same platform for multi-class brain MRI classification. These methods [6,18,20,21] were originally proposed for binary classification of brain MR images. The average accuracy results (using 20 principal features) were gathered for each scheme, as presented in Table 3. The results reveal that the highest accuracy (i.e., 95.7%) for multi-class brain MRI classification is achieved by the proposed scheme with RF as the decision model. The LS-SVM (RBF) classification scheme also provided promising results, with an accuracy of 89.25%. The scheme proposed by Das et al. [20] for binary classification attained an average accuracy of 86.02% when applied to multi-class brain MRI classification, despite the complex algorithm (i.e., the Ripplet transform) used for feature extraction. Likewise, the complex method used in [21] did not reach an accuracy of 90% and achieved 88.92%. The average accuracy of the kNN-based scheme (i.e., 83.87%) is the lowest among all of the compared state-of-the-art schemes examined for multi-class brain MRI classification. In addition, the method proposed in [24] uses age, tumour shape, and ROI as feature sets for multi-class classification of brain MRI diseases. This scheme requires ROIs to be traced manually, which makes it semi-automatic, whereas our proposed multi-class classifier is automatic and needs no human intervention for decision-making. In [24], the authors achieved mean accuracies of 81.1%, 89.8%, and 91.2% for LDA, kNN, and non-linear SVM based decision models, respectively. Despite the human intervention involved in its feature extraction scheme, which is an additional cost, the best average accuracy reported in [24] (91.2%) is more than 4 percentage points lower than that of our proposed work (95.7%).
The results and comparisons show that the proposed classifier performs well relative to the existing state-of-the-art techniques. Furthermore, a comprehensive study of the performance of different decision models on multi-class brain MRI classification is presented. The comparison of decision models suggests that RF, LS-SVM (RBF), and J48 are more accurate than kNN and LS-SVM (Poly). RF achieved the highest accuracy when 20 features were used. Moreover, LS-SVM (RBF) maintained its performance for any number of features, which makes it advantageous when only a small number of principal features is available. The feature engineering scheme used in this study reduced the number of discriminating features, which in turn reduces the classifier complexity and enhances its accuracy. The proposed classifier therefore has the potential to classify various disease classes accurately using brain MRIs. A limitation of this study is that the experiments involved only brain MR images. However, the proposed approach has the potential to produce accurate results for MR images of other body parts as well.

Figure 1. Methodology of the proposed classifier.

/256, TR: 9.7 (ms), TE: 4.0 (ms), TI: 20.0 (ms), and flip angle: 10°. The in-plane resolution of each image is 256 × 256. Brain MRI scans of 310 patients (men and women) were used to build this database. The brain MR image dataset is composed of healthy and abnormal images. The abnormal images cover five different brain diseases: Alzheimer's disease, AIDS dementia, cerebral calcinosis, glioma, and metastatic dementia. A sample image of each class included in the benchmark dataset is shown in Figure 2.
TP (True Positive): Number of images correctly diagnosed under any specific class; TN (True Negative): Number of images correctly rejected by the classifier; FP (False Positive): Number of images incorrectly identified by the classifier; FN (False Negative): Number of images incorrectly discarded by the classifier.
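The four counts defined above can be read off a multi-class confusion matrix one class at a time and then averaged over the classes. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that the subscript M in precision_M, recall_M, and F-measure_M denotes macro-averaging (an unweighted mean over classes), and that F-measure_M is the harmonic mean of the macro-averaged precision and recall. The function names and the example matrix are hypothetical.

```python
def per_class_counts(confusion, k):
    """Return (TP, TN, FP, FN) for class k.

    confusion[i][j] is the number of images whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                         # class k images assigned elsewhere
    fp = sum(confusion[i][k] for i in range(n)) - tp    # other images assigned to class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def macro_metrics(confusion):
    """Macro-averaged precision, recall, and F-measure (assumed meaning of subscript M)."""
    n = len(confusion)
    precisions, recalls = [], []
    for k in range(n):
        tp, _tn, fp, fn = per_class_counts(confusion, k)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision_m = sum(precisions) / n
    recall_m = sum(recalls) / n
    denom = precision_m + recall_m
    f_measure_m = 2 * precision_m * recall_m / denom if denom else 0.0
    return {"precision_M": precision_m, "recall_M": recall_m, "F_measure_M": f_measure_m}


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
print(per_class_counts(example, 0))
print(macro_metrics(example))
```

Macro-averaging gives every class the same weight regardless of how many images it contains, which is one reasonable choice when the class sizes in the dataset differ.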

Table 1. The distribution of training and testing images.

Table 3. Performance comparison with different classification schemes.