Brain Magnetic Resonance Imaging Classification Using Deep Learning Architectures with Gender and Age

Usage of effective classification techniques on Magnetic Resonance Imaging (MRI) helps in the proper diagnosis of brain tumors. Previous studies have focused on the classification of normal (nontumorous) or abnormal (tumorous) brain MRIs using methods such as Support Vector Machine (SVM) and AlexNet. In this paper, deep learning architectures are used to classify brain MRI images into normal or abnormal. Gender and age are added as higher attributes for more accurate and meaningful classification. A deep learning Convolutional Neural Network (CNN)-based technique and a Deep Neural Network (DNN) are also proposed for effective classification. Other deep learning architectures such as LeNet, AlexNet, ResNet, and traditional approaches such as SVM are also implemented to analyze and compare the results. Age and gender biases are found to be more useful and play a key role in classification, and they can be considered essential factors in brain tumor analysis. It is also worth noting that, in most circumstances, the proposed technique outperforms both existing SVM and AlexNet. The overall accuracy obtained is 88% (LeNet Inspired Model) and 80% (CNN-DNN) compared to SVM (82%) and AlexNet (64%), with best accuracy of 100%, 92%, 92%, and 81%, respectively.


Introduction
The brain is the most complex organ in the human body. It carries out many functions and controls the activities of the other systems of the body. The brain comprises complex structures, including the cerebellum, cerebrum, and brain stem, which are part of the central nervous system [1,2]. Histologically, the brain consists of cells and tissues: brain cells are divided into neurons and neuroglia, and brain tissues into gray matter and white matter [2,3]. When brain cells grow abnormally and are not regulated correctly, a brain tumor may result. Not all tumors are cancerous: cancer is a term used for malignant tumors, not benign tumors. Although benign tumors are less harmful than malignant tumors, they can still cause various problems in the brain [4]. Many tests and medical imaging techniques can be carried out to guide treatment, including Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and X-ray [5], but MRI is the standard way of evaluating a tumor because it produces detailed images of the brain. A variety of brain conditions can be detected using MRI, including tumors, cysts, and other structural abnormalities. It can show gray matter, white matter, and any damage or shunt present in the brain. Cerebrospinal fluid and the surroundings of a tumor can also be assessed by an MRI scan, which has a high sensitivity for detecting the presence of a tumor. Detecting tumors at an early stage is essential, as they can be dangerous and, in unfortunate circumstances, fatal. Automated tools for tumor prediction can therefore be a great help in tumor identification and provide a safe means of assessment.
Tumors can be detected by meticulous manual analysis of MRI images one by one (slice by slice). This task must be performed to identify the region and the type of tumor accurately. Additionally, tumors in the brain may affect other organs (metastasis), which can be even more harmful. Detecting such tumors at an early stage is essential for the practitioner to select treatments efficiently and effectively. Thus, proper analysis of brain MRI images is required to obtain information that may help in the early detection and diagnosis of diseases. Early detection of tumors can also lead to better diagnosis, and automated tools are a reliable and promising way to achieve it. Over the past decades, automated techniques in image processing have evolved from traditional methods towards more advanced techniques such as machine learning and, eventually, deep learning and other proposed methodologies [6].
Bearing in mind the need for manual examination, this paper applies state-of-the-art automated approaches to classify MRI images as normal (nontumorous) or abnormal (tumorous). For this purpose, a proposed deep learning-based CNN methodology, chosen because of the superior performance of CNNs in computer vision, was used and compared with existing techniques. We also divided the brain MRI images by gender (male and female) and by age group for classification into normal or abnormal. We incorporated age and gender as attributes for the first time, in contrast to earlier classification methodologies. This is crucial in determining similarities and differences in the shape and size of the brain across age groups and genders. The aim is to find out whether age and gender can contribute to a better classification result by revealing similar patterns between images of the same category. A flowchart depicting the use of age and gender (depending on data availability) is shown in Figure 1, where the data are taken and preprocessed using filtering and cropping. Based on the available data, the images are divided into seven categories by age group and gender. These are then classified using the proposed CNN models, where the output can be normal or abnormal. The following categories of brain MRI images were considered: (i) Males between the ages of 20 and 70, (ii) Females between the ages of 50 and 70, (iii) Females between the ages of 20 and 70, (iv) Males between the ages of 10 and 80, (v) Females between the ages of 10 and 80, (vi) Males and Females between the ages of 20 and 70, and (vii) Males and Females between the ages of 10 and 80. Each category is then passed to the various approaches for classification as normal or abnormal.

Motivation
Previous research has focused on classifying brain images as either normal or abnormal. In earlier attempts, SVM was used and achieved effective results in this classification. However, no higher attributes were used in its implementation. Although the existing approach reaches an accuracy of 99.9%, it may not be suitable for accurate prediction/classification of tumors, as human brain structure varies with age and gender [7,8]. Information obtained using higher attributes is a more reliable basis for treating any kind of deformity, and such delicate cases must be handled precisely for the proper diagnosis of diseases. Therefore, higher attributes such as age and gender are needed for accurate prediction, which leads to an appropriate diagnosis. In this paper, age and gender are taken as attributes for predicting the presence of tumors, in the hope of obtaining an accurate result using CNN-based methodologies. To keep the network computationally cheap, a deeper CNN is not used here; greater depth may also lead to poor generalization. In contrast to previous spatial exploitation-based CNNs such as AlexNet or VGGNets, a LeNet-inspired model was chosen for its simplicity and its use of a small filter (3 × 3). It is better suited than other networks because it needs less training time and is computationally inexpensive.

Figure 1. An overall flowchart depicting the proposed classification approach using age and gender as attributes.

Our Contributions
The main contributions of the paper are as follows:

1. The Figshare [9], Brainweb [10], and Radiopaedia [11] datasets are readily available online and can be used to classify brain MRI as normal or abnormal. We combined all of these datasets to create a heterogeneous collection of data that addresses the heterogeneity issue. Most studies in brain-related diagnosis use a dataset from a single source. This form of heterogeneity has not been explored before, and it could be a first step towards correctly distinguishing images from different sources.

2. Using higher attributes is more informative, with a higher expectancy of reliable and efficient results. Here, work based on age and gender is considered as an initiative to determine whether these attributes can be helpful in further automated diagnosis. It is inspired by [12,13]. In addition to employing various data to classify patients as normal or abnormal, Radiopaedia datasets are used to classify patients by age and gender.

3. To categorize normal (absence of tumor) and abnormal (presence of tumor) images, two proposed CNN-based methodologies are applied. One is a model inspired by LeNet, and the other is a Deep Neural Network-based method. These proposed models are fast and shallower than other comparable deep learning methods.

4. Two alternative deep learning-based classifiers, LeNet and ResNet, are incorporated in addition to the proposed methodology for classification. In their time, these two models were widely used for classification and had a significant impact. They are used here because they are not as deep as VGG19, MobileNet, Inception, and other state-of-the-art deep learning approaches, which are not ideal for our data: the dataset is not massive, and such models could lead to erroneous results and computational expense. The results are compared with Support Vector Machine and AlexNet, which were previously used to classify normal and abnormal images.

5. Compared to traditional SVM (82% using age and gender attributes and 77% using heterogeneous data without any attributes), the proposed method uses more parameters and gives better results and accuracy (88% using age and gender attributes and 80% using heterogeneous data). Compared to AlexNet, the proposed method has less depth and fewer convolutions, making it simpler with more efficient computation time. AlexNet obtained an accuracy of 64% using age and gender attributes and 65% using heterogeneous data without any attributes.

6. In this paper, data are not equally distributed across the age and gender groups. The data are unbalanced, and cross-validation is used to address this issue. This work is not clinically proven or tested; it is performed to check the capability of a few deep learning methodologies, mainly spatial CNNs. The model might not work or perform well under different clinical settings, as the data are obtained from online sources.

Organization of the Paper
This paper uses deep learning-based approaches to classify MRI images as normal or abnormal, in order to see whether using higher attributes can be beneficial. Section 2 covers works related to brain tumor classification and findings on the anatomy of the brain in different individuals. Section 3 explains the methodologies used as well as the proposed method. Section 4 presents the results and findings, and Section 5 gives the conclusion of the paper.

Related Works
Several existing works classify brain images into normal (nontumorous) and abnormal (tumorous). One such method can be seen in Rajesh et al. [14], where classification was implemented using a Feed Forward Neural Network consisting of three layers, with 50 nodes in the hidden layer and one output node. Taie et al. [15] performed the classification using Support Vector Machine (SVM), and comparative analyses can be seen in [16,17]. Al-Baderneh et al. [18] discussed the classification of brain MRI using Artificial Neural Network and K-Nearest Neighbor (KNN) with texture features, using 181 images of abnormal brains and 94 images of normal brains. Other methodologies include Self Organizing Maps (SOM), discussed in [17,19]. An implementation of feedforward backpropagation for classifying MRI images as normal or abnormal can be found in [20]. These methods are all supervised (classes are known), and features need to be extracted before classification. All of the above use traditional approaches on very little data; they give efficient results but are not very informative and do not include age and gender.
Along with these methods, other state-of-the-art techniques using deep learning-based methodologies are evolving. Many of these works are not used to classify normal or abnormal but were included as the work was performed on brain imaging on different types of classification. In a paper by Pereira et al. [21] glioma detection was achieved using CNN. Kamnitsas et al. [22] used a deep learning method for the classification of ischemic stroke. In [23], a proposed method called Adaptive Network-based Fuzzy Inference System (ANFIS) for classification into five types of tumors was investigated. Another work focused on the classification and segmentation of tumors using pre-trained AlexNet, where features were extracted using the Gray-Level Co-Occurrence Matrix (GLCM) [24]. Other works include classification into different types of tumors using CNN [25][26][27][28][29], SVM [30], Graph cut [31], Recurrent Neural Network (RNN) [32,33], AlexNet transfer learning network of CNN [34], Deep Neural Network (DNN) [35][36][37], VGG-16, Inception V3 and ResNet50 [38], SVM and KNN [39], and CNN ensemble method [40].
In addition, other works include the MICCAI BRATS challenge; the most recent can be found in [41]. A comparative analysis of brain tumors can be seen in [42]. Regarding differences in the human brain, an article by Brown [12] reported studies on the human brain and on differences in brain structure and morphology between individuals of the same age. Based on this, a model was developed using Pediatric Imaging, Neurocognition, and Genetics (PING) data to predict ages between 3 and 20 years. It can also be seen that every individual brain measurement varies, even on a single brain at any specific time. This finding inspired us to investigate brain structure further using an automated technique for identifying tumors according to gender and age. In the next section, we discuss the existing methods used for the classification of MRI into normal and abnormal.

A Brief Description on Existing Techniques Used in Classification of MRI into Normal and Abnormal
The most widely used machine learning algorithms for classification of brain MRI into normal and abnormal are Support Vector Machine (SVM) [15][16][17] and AlexNet [43]. A very brief description of each algorithm is presented in the next subsections.

Support Vector Machine (SVM)
Among existing methods for this task, SVM is one of the most widely used supervised learning algorithms [15][16][17]. Its advantages are memory efficiency and effectiveness in high-dimensional spaces. It can also be used for regression. The SVM methodology was taken from [15]. Each image is first converted into an array, and a binary label (0 or 1) is assigned to every image. Using an SVM with the RBF kernel, an output of 0 or 1 is obtained. The RBF kernel on two samples X and X′ is represented as

$$K(X, X') = \exp\left(-\frac{\lVert X - X' \rVert^{2}}{2\sigma^{2}}\right)$$

Without the 2σ² term the kernel has no parameter; including 2σ² makes it parameterized, and it is then known as the Gaussian Radial Basis Function. It is commonly used because it is localized, and it is a general-purpose kernel used when no prior information about the data is available. The output obtained is 0 for the abnormal class and 1 for the normal class.
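The Gaussian RBF kernel described above can be illustrated with a few lines of Python. This is a minimal sketch using only the standard library, not the implementation from [15]; the function name `rbf_kernel`, the default `sigma`, and the toy pixel values are assumptions made for illustration.

```python
import math


def rbf_kernel(x, x_prime, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - x'||^2 / (2 * sigma^2)) for two equal-length vectors."""
    if len(x) != len(x_prime):
        raise ValueError("vectors must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Two tiny "flattened images" (hypothetical pixel values, for illustration only).
image_a = [0.0, 0.5, 1.0]
image_b = [0.0, 0.5, 0.0]

same = rbf_kernel(image_a, image_a)        # identical inputs: squared distance 0
different = rbf_kernel(image_a, image_b)   # squared distance 1.0, sigma = 1.0
```

The kernel value falls from 1 towards 0 as two samples move apart, which is the "localized" behavior mentioned in the text; a larger `sigma` makes the fall-off slower.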

AlexNet
AlexNet was designed by Alex Krizhevsky and colleagues and won the ImageNet challenge (ILSVRC) in 2012. It is a CNN-based methodology that was originally used to classify ImageNet images into 1000 object categories. The architecture, which can be seen in [43], consists of five convolutional layers and three fully connected layers. A study that uses AlexNet as one of the steps in the classification and segmentation of abnormalities can be seen in [24].
In this paper, we are going to classify the brain MRI images into normal or abnormal based on a specific range of ages, as it is already established by Brown [12] that the structure of the brain varies according to age. This will indeed help in finding a similar pattern of images of different ages. The main differences of our work from other existing works are the use of data from different sources and using age and gender as attributes in classification into normal or abnormal, which is the novelty of our work. Furthermore, compared to other works, our data usage is higher even though it is still considered a small dataset. Some comparisons based on related works are given in Table 1.

Classification of Brain MRI Images Using Deep Learning Architectures
Classification plays a crucial role, as it organizes images into specific groups. It is the initial step in predicting an area or region containing abnormalities when diagnosing any disease. In this section, along with the proposed methodology, three other deep learning architectures (LeNet, AlexNet, and ResNet50) are briefly discussed. The proposed classification technique for brain MRI images uses CNN because of its effective performance in image classification, where it automatically detects essential features. The brain images were classified into normal or abnormal classes, and the whole process is depicted in Figure 2. One method is a CNN-based approach in which the layers are chosen according to observations and the formulation based on Equations (2) and (3). Using this method, classification was performed for different ages and genders to determine their similarities and differences. The imaging technique utilized here is MRI Fluid Attenuated Inversion Recovery (FLAIR) [44]. It is similar to a T2 image with a longer echo time (TE) and repetition time (TR). This sequence is very sensitive to pathology and makes the differentiation between Cerebrospinal Fluid (CSF) and an abnormality much easier [44].

LeNet Inspired Model
The proposed classification is a CNN-based model in which convolutional, pooling, and fully connected layers are used, as shown in Figure 2. It is inspired by the LeNet architecture with minute changes; it is simple and has five layers (convolution and pooling layers). The input image (X) is in color format and has a size of N × N × 3. Original and augmented images are of different sizes. The images are cropped by selecting only the brain region. The first step is preprocessing to remove the noise present in an image, which is carried out using median filtering. Median filtering is chosen because it removes outliers without affecting the information present in an image. After median filtering, the images are resized to 194 × 194 × 1 so that all inputs have the same size, which helps training, while not becoming too small; the dimension of 194 is chosen because it is the smallest image size available. The images are converted to grayscale for better learning of features. These images are then passed to the most important part of a CNN, the convolutional layer. The stride varies between convolutional layers, as can be seen in Figure 2. Mathematically, for inputs X_1, X_2, ..., X_N of size N × N and f × f filters, the output is

$$Y = \sum_{i} W_i X_i$$

where W_i is the window of the filter, and the output size can be obtained using

$$\frac{N + 2p - f}{s} + 1$$

(f is the filter size, p is the padding, and s is the stride; p ≥ 0, s ≥ 1, f > 1).
Layer sizes are written as H × W × D, where H is the height, W is the width, and D is the depth of an image. As ReLU has no parameters, nothing is learned in this layer. A stride of 2 × 2 is used, which moves two pixel positions vertically and horizontally. At each stride, the maximum of four numbers is taken and replaced by a single value. For example, for a 94 × 94 × 16 input, an output of 46 × 46 × 16 is obtained, whereas a stride of 1 would reduce the size only slightly. A filter size of 3 × 3 was chosen for learning local features, rather than a bigger filter size such as 11 × 11. Depths of 12 and 16, respectively, were chosen arbitrarily for greater depth, as our image has a depth of 1. As our dataset is not very large, a total of two convolutional layers is used. After every layer, the image shrinks and edge information may be lost; padding can reduce this loss. In our work, no padding is applied, as reduction is still needed until the last convolutional layer. Max pooling with a stride of 2 × 2 is applied to reduce the size. The last convolutional layer is followed by a fully connected layer with a total of 23 × 23 × 32 = 16,928 neurons, which are then passed to another fully connected layer of size 800. Optimization was performed not with plain Gradient Descent (GD) but with the Adam optimizer (adaptive moment estimation). It is similar to GD but has the advantage of maintaining a learning rate for each weight in the network. Dropout, a regularizer, is used in the fully connected layers of our method with a rate of 0.5. The loss function used is the binary cross-entropy loss (log loss) [45], which can be calculated using

$$H_p(q) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log\big(p(y_i)\big) + (1 - y_i)\log\big(1 - p(y_i)\big)\right]$$

where y is the label (1 for class 1 and 0 for class 2), p(y) is the probability of being in class 1, and p(y_i) is the predicted probability for each of the N samples given any distribution q(y). The probability of each point is 1/N.
For each y = 1, the loss adds log(p(y)), the log probability of being in class 1, and for each y = 0 it adds log(1 − p(y)), the log probability of being in class 2. This gives a better loss than the alternatives for this two-class problem. Lastly, with the Adam optimizer, Softmax is used for classification, where a value < 0.5 is classified as [1 0] (abnormal) and otherwise as [0 1] (normal).
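The binary cross-entropy (log loss) described above can be sketched in plain Python. This is an illustrative implementation of the standard definition, not the Keras routine used in the experiments; the function name, the clipping constant `eps`, and the example probabilities are assumptions.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy over N samples.

    labels: 0/1 ground-truth labels.
    probabilities: predicted probability of class 1 for each sample.
    """
    if not labels or len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        # y = 1 contributes log(p); y = 0 contributes log(1 - p).
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)  # each sample is weighted by 1/N


labels = [1, 0, 1, 0]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
uncertain = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
```

Predictions that are both correct and confident give a loss close to 0, whereas predicting 0.5 for every sample gives log 2 ≈ 0.693.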

CNN Combined with DNN (CNN-DNN)
This method was chosen for its simple approach; it is not widely used but is applicable in many fields of computer vision. The CNN-DNN architecture is shown in Figure 3. After resizing to 194 × 194, the input image is passed to a convolutional layer with a filter size of 3 × 3 and a stride of 2 × 2. The output is passed to a ReLU layer with a dropout rate of 50% and then to a fully connected layer with 962,312 nodes. This is followed by dense layers of 400 and 100 nodes and a classification layer that classifies into 0 or 1 using a Softmax classifier.
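The effect of filter size, padding, and stride on the spatial size of a layer's output can be checked with a short helper. This is a sketch of the usual output-size rule; the function name is hypothetical, and taking the floor when the division is not exact is an assumption that matches how common frameworks treat unpadded layers. It is not meant to reproduce the exact layer sizes of Figures 2 and 3.

```python
def output_size(n, f, p=0, s=1):
    """Spatial output size floor((n + 2p - f) / s) + 1 of a convolution or pooling layer.

    n: input height/width, f: filter (window) size, p: padding, s: stride.
    """
    if f < 1 or s < 1 or p < 0:
        raise ValueError("require f >= 1, s >= 1 and p >= 0")
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * p - f) // s + 1


# A 194 x 194 input with a 3 x 3 filter and no padding.
conv_stride_1 = output_size(194, f=3, p=0, s=1)   # stride 1 barely shrinks the image
conv_stride_2 = output_size(194, f=3, p=0, s=2)   # stride 2 roughly halves it

# 2 x 2 max pooling with stride 2 applied to the stride-1 result.
pooled = output_size(conv_stride_1, f=2, p=0, s=2)
```

With these settings the stride-1 convolution yields 192, while both the stride-2 convolution and the pooled stride-1 output yield 96, which shows why a stride of 2 is used when a reduction in size is wanted.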
Other than the proposed architectures, we have also implemented a few known deep learning architectures for effective comparison, which are provided next.

LeNet
LeNet is one of the most widely used and popular network architectures in deep learning. It is commonly used for classifying objects in different domains of computer vision and for handwritten digits using the MNIST dataset, owing to its simplicity and small number of layers. The architecture is used with the same parameters, apart from minor changes to the batch size, loss function, and number of epochs. The architecture of LeNet can be seen in [46].

ResNet50 (Transfer Learning)
ResNet won first place in the ILSVRC 2015 classification task using ImageNet data. The architecture can be seen in [47]. For this work, ResNet50, a depth-based CNN, is used as a model for transfer learning. Transfer learning is flexible in that the pre-trained model is used directly for classifying images. The architecture stays the same, with a flatten layer and two additional dense layers added. Using the dataset considered for our work, the model is trained and modified for a two-class problem where the output is class 0 (abnormal) or 1 (normal).
The parameters are adjusted to our dataset, and the same number of epochs, 100, is used in all cases, as the output converges at this point. The differences in parameters between our method and the others can be seen in Table 2.
A comparison can be made based on computational complexity. The computational complexity (CC) of a convolutional network is measured in terms of the total number of learnable parameters [48]. It can be expressed as: where X and Y are the height and width of the input image, respectively; w and h are the width and height of the convolution kernel, respectively; and c is the number of channels.
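To make the idea of counting learnable parameters concrete, the sketch below uses the standard parameter counts for a convolutional layer and a fully connected layer. It is not the expression from [48]; the function names are hypothetical, a bias term per output channel or neuron is assumed, and the layer shapes are taken from the sizes mentioned in the text purely for illustration.

```python
def conv_params(w, h, c_in, c_out, bias=True):
    """Learnable parameters of a convolutional layer with w x h kernels."""
    return (w * h * c_in + (1 if bias else 0)) * c_out


def dense_params(n_in, n_out, bias=True):
    """Learnable parameters of a fully connected layer."""
    return (n_in + (1 if bias else 0)) * n_out


# 3 x 3 versus 11 x 11 filters on a single-channel input with 16 output channels.
small_filter = conv_params(3, 3, c_in=1, c_out=16)
large_filter = conv_params(11, 11, c_in=1, c_out=16)

# Fully connected layer from 16,928 inputs to 800 neurons.
first_dense = dense_params(16928, 800)
```

Under these assumptions the 3 × 3 layer has 160 parameters against 1952 for the 11 × 11 layer, and the fully connected layer alone has more than 13 million, so the dense layers dominate the parameter count of a shallow network of this kind.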

Experimental Results
The implementation is carried out in the Python programming language using Google Colab, a free, web-based notebook environment. The libraries used are Keras and TensorFlow. SVM, the LeNet Inspired Model (LIM), CNN-DNN, LeNet, AlexNet, and ResNet50 are implemented to classify the images as normal or abnormal. The implementation is carried out in two parts: first, generalized classification into normal or abnormal without using age and gender, and second, classification into normal or abnormal using age range and gender. Two evaluation approaches are used: first, k-fold cross-validation with k = 5 and k = 8 (arbitrarily chosen), and second, a generalization approach, where the data used in the training phase are not used in the testing phase.
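The k-fold protocol described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the helper `k_fold_indices`, the seed, and the sample count of 20 are hypothetical, and the sketch shuffles without stratifying by class, a detail the text does not specify.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over shuffled sample indices."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Five-fold split of 20 hypothetical images: each fold is held out exactly once.
splits = list(k_fold_indices(n_samples=20, k=5))
```

Every image appears in the test set of exactly one fold and in the training set of the remaining folds, so no image is used for training and testing within the same fold.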

Performance Metrics
Researchers consider many performance metrics in classification, of which Accuracy is the most popular. To check the validity of our results, the metrics used are Accuracy, Precision, Sensitivity, Specificity, Negative Predictive Value, False Positive Rate, False Discovery Rate, False Negative Rate, F1 Score, Matthews Correlation Coefficient, and Loss Function [49]. The different performance metrics with their descriptions are provided in Table 3.

Table 3. Performance metrics used.

No. | Performance Metric | Description
1 | Accuracy | Accuracy is a measurement that gives the correctness of classification, and loss is a measure indicating how well a model behaves after every iteration.
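The metrics named above can all be derived from the four confusion-matrix counts. The sketch below uses their standard textbook definitions, which may differ in wording from the descriptions in Table 3; the function name and the example counts are hypothetical, and which class is treated as positive is left to the caller.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "fpr": ratio(fp, fp + tn),
        "fdr": ratio(fp, fp + tp),
        "fnr": ratio(fn, fn + tp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for 100 test images.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these counts the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90; the false positive and false negative rates are the complements of specificity and sensitivity, respectively.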

Normal or Abnormal Classification
T1-weighted and FLAIR data were used in this work, collected from Figshare, Brainweb, and Radiopaedia. A total of 1130 images were taken from Figshare, which contains abnormal data. Each T1-weighted image in Brainweb contains 181 slices of normal and abnormal data. Cropping was used to increase the number of slices, resulting in 362 slices per image. In addition, 768 T1 images and FLAIR data were taken from Radiopaedia. For this case, no data augmentation has been used. For k-fold cross-validation, there are 2530 images, with 806 and 1534 normal and abnormal images, respectively. A total of 506 images are used for testing in the generalization approach. The output obtained using k-fold cross-validation and the generalization method for LeNet, AlexNet, ResNet, SVM, LIM, and CNN-DNN is given in Table 4. From the output shown in Table 4 and Figure 4, it is observed that, for five-fold cross-validation, Accuracy, Specificity, Sensitivity, Precision, FPR, FDR, FNR, F1 score, and MCC are better in the case of LIM, and NPV in the case of SVM. For the eight-fold comparison, LeNet has better Accuracy, Specificity, Sensitivity, and Precision, whereas LIM has better NPV, FPR, FDR, FNR, F1 score, and MCC. In the generalization approach, Accuracy, Specificity, Sensitivity, Precision, and FDR are better in LIM; NPV and FNR are better in SVM; and FPR, F1 score, and MCC are better in LeNet. For SVM, the Accuracy attained is relatively low in some circumstances due to data heterogeneity. In most cases, with both cross-validation and the generalization approach, LIM and LeNet produce better results than the SVM methodology. It is also worth noting that less dense networks provide higher True Positive values than a denser network such as ResNet.

Range Based Classification
For both normal (nontumorous) and abnormal (tumorous) images, the data were collected from Radiopaedia [11]. The images obtained were not all from the same patient, ensuring that distinct tumors were present. The images were divided into several age groups to perform experiments based on male or female gender or both. The ranges are not sequentially ordered and are repeated when data for a specific age are not available or when there are not any data at all. In order to identify these images and conduct the experiment, it was assumed that the data gathered came from the same MRI scan.
Based on their ages and gender, the images were divided into distinct ranges. This aids in drawing robust logical conclusions about brain size similarities across different ranges: Male (20-70), Female (50-70), Female (20-70), Male (10-80), Female (10-80), Male + Female (20-70), and Male + Female (10-80). Because of the small number of images, all were cropped for data augmentation. There are 1205 images, 786 of which are abnormal and 411 of which are normal. The generalization approach uses 328 images from the aggregate data for testing purposes. Dividing the data into ranges makes it much more manageable, and it confirms that age and gender can be used as attributes to detect similarities and to classify into the normal or abnormal class.
From Figure 5 and output obtained in Tables 5-10

Statistical Significance Test
The t-test and Analysis of Variance (ANOVA) are two often-used statistical tests [50]. Statistical tests show the significance of the model. Here, we performed the ANOVA test using the Python statistical library scipy.stats. From Table 11, for classification into normal or abnormal, both models are significant, as the p-value is less than the significance level (0.05). There is a statistical improvement using LIM and CNN-DNN over SVM, AlexNet, and ResNet, but no improvement over LeNet. In the case of classification using gender and age, there seems to be a false discovery rate producing conflicting results. LIM shows a significant difference over other models in the majority of cases (both values in green and bold), with no improvement over LeNet. The test indicates that the proposed LIM can be considered equal to LeNet and outperforms SVM. There is a difference between the groups, considering that deeper networks such as AlexNet and ResNet feature in both classifications with different variance and are statistically significant.
It can be observed that the result using both males and females is more distinguishable, and males or females of all ranges as separate inputs show statistical significance difference, wherein we can say that age is a more dominating factor than gender. However, it is not enough to conclude if any individual variable is significant from our output. The p-value using the ANOVA test for samples between two models is high in age and gender classification into normal or abnormal because samples have a value of 0 and 1 with fewer testing samples, unlike classification without using age and gender having heterogeneous data with more testing samples.
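The quantity behind a one-way ANOVA can be sketched without external libraries. The code below computes only the F statistic for two or more groups; in practice `scipy.stats.f_oneway` returns this statistic together with the p-value. The function name and the two 0/1 output vectors are hypothetical and are not taken from Table 11.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical 0/1 outputs of two models on the same eight test images.
model_a = [1, 1, 1, 0, 1, 1, 1, 1]
model_b = [0, 1, 0, 0, 1, 0, 1, 0]
f_statistic = one_way_anova_f(model_a, model_b)
```

With only a handful of 0/1 samples per group the within-group variance is large relative to the group difference, which is consistent with the observation above that p-values are high when few testing samples are available.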

Benefits and Drawbacks of Our Methods
The benefits of the proposed methodologies are their simplicity and fast implementation. Though they are not as deep as other available networks, they are still comparable to LeNet and other basic CNNs. They are spatial exploitation-based CNNs, with fewer layers, less training time, and less computational expense. The main aim of these methods is to test the applicability of CNN to classification into normal or abnormal classes in its simplest form. Dropout is used to reduce overfitting, as in AlexNet, together with ReLU and Softmax activation functions. This model has no advanced structures such as residual networks, pathways, or deep and dense networks. It is as simple as LeNet and AlexNet, with a computational complexity between the two.
Although this method proves to be equivalent to other machine learning approaches, this method might not perform well when the data used are different under different settings and different datasets. This work uses unbalanced data, which can also be different from using balanced data. It is a quest in determining the capability of using deep learning models that are not deeper or wider. This model is not dense enough, which is another drawback. Additionally, this work is technical, not clinical, and not under the supervision of an expert but based on the datasets provided on the websites. The data used were from freely available online data.
A brief discussion and interpretation of comparison between the five methodologies is given in the next section.

Summary
The following findings can be drawn from the experimental results:
1. Using age and gender as attributes with a range of ages is more informative, as it involves higher attributes and, as a result, is less biased. This helps in effective and efficient analysis of the brain and its abnormalities.
2. In most instances, classification into normal or abnormal without using age and gender as attributes yields less accurate results. This shows that age and gender attributes are relevant and valuable in the classification of brains into the normal or abnormal class.
3. The pattern obtained for Female (20-70) and Male + Female (10-80) yielded better results than the other age ranges in almost all methodologies, which signifies that age and gender attributes are essential and can help in better classification of a tumor. Furthermore, the same applies in the case of Male + Female, where age acts as a significant factor in providing an efficient and reliable classification and where, taking gender as a factor, the result is accurate in most cases.
4. This can be interpreted as the output being better differentiated when male and female are taken as separate inputs. It can be observed that subjects of the same age range and the same gender are likely to have similar patterns, as the output is better in most cases. This is because brain volume varies by 50% even within a group of the same age and varies differently for different genders [7,8]. Gender as a factor has shown a more promising result.
5. From the performance metrics and ANOVA tests, gender can be considered a relevant factor, as the pattern and output are better when taking Male or Female as a separate input. Also, when combining the genders over all ages, the pattern does not change much, which can imply that gender is a dominating factor over age. The pattern obtained for Male (10-80) and Female (10-80) does not provide a better result than combining the two genders in any methodology (except in a few cases using the statistical test), which shows that similarities between males and females could be differentiated better using gender as an attribute. Using both age and gender attributes thus acts as an essential factor in providing better accuracy in diagnosis as a whole.
6. In most cases, the output is better when CNN-based methodologies are applied instead of the SVM method. In several cases, LIM is in first or second place. On the other hand, CNN-DNN is comparable to SVM in the output of the generalization and k-fold cross-validation approaches. This shows that deep learning methodologies have the potential to achieve reliable results through further experiments in the future. The deep learning model has more layers and provides finer details about the images at a deeper level, which can act as a tool for a better prognosis.
7. Although gender is more dominating than age as per our data and results, the ANOVA test is not enough to say whether any individual variable is statistically significant. On the other hand, the model (LIM) is statistically significant. Using higher variables as a relevant factor is reasonable based on the performance metrics and the ANOVA test.
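The k-fold cross-validation mentioned in the findings above can be sketched as follows. This is a generic index splitter with hypothetical names, an assumed k of 5, and an arbitrary sample count; it does not reproduce the exact folds or sample sizes of our experiments.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Assumed values for illustration: 23 samples split into 5 folds.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(n_samples=23, k=5), 1):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```

For unbalanced data such as ours, a stratified variant that preserves the normal/abnormal ratio in each fold would be a natural refinement of this sketch.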

Conclusions and Future Work
Finding a treatment for various types of brain tumors has become one of the most important areas of medical imaging. Considering Accuracy, Specificity, Sensitivity, Precision, Recall, F1 Score, NPV, FPR, FDR, FNR, and MCC, LIM performs better in this paper for the first case. In most cases, employing a k-fold cross-validation and generalization strategy, LIM and CNN-DNN produce better results than SVM and AlexNet when dealing with heterogeneous data. LIM follows a pattern similar to that of the original LeNet, but it is unable to overcome it. In the second case, it was found that brain classification works better for brains of different ages and genders than for brains of the same gender using LIM, CNN-DNN, and the other four methodologies. This is due to the similar patterns within the same gender. In other words, it can be concluded that the pattern and characteristic features of the same gender are likely to be similar. Additionally, from the statistical tests and performance metrics, gender can be considered a factor in the future analysis of the brain, with age as a factor as well. The accuracy is not high due to the presence of noise and heterogeneity in the data, where the methods could not differentiate between normal and abnormal images properly. The overall accuracy using age and gender as attributes for SVM, AlexNet, ResNet, LeNet, LIM, and CNN-DNN is 82%, 64%, 44%, 87%, 88%, and 80%, respectively, with best accuracies of 92%, 81%, 52%, 97%, 100%, and 92%, respectively. Deeper networks, such as AlexNet and ResNet, were unable to produce the desired results because they are designed to handle large amounts of data, which were limited in our case, and because of the different setting. In addition, the data used in our case are unbalanced, which usually gives lower accuracy than balanced data. With gender as a factor, the results were more promising, making it a reasonably good factor to take into consideration in the automated diagnosis of the brain.
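The performance metrics listed above all follow from the four confusion-matrix counts. The sketch below uses the standard definitions and treats abnormal as the positive class, which is an assumption made for illustration; the function name and the counts in the example are hypothetical and are not taken from our results.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "npv": ratio(tn, tn + fn),
        "fpr": ratio(fp, fp + tn),
        "fdr": ratio(fp, fp + tp),
        "fnr": ratio(fn, fn + tp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts: 8 true positives, 7 true negatives,
# 1 false positive, and 2 false negatives.
for name, value in binary_metrics(tp=8, tn=7, fp=1, fn=2).items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four counts, it remains informative on unbalanced data, where accuracy alone can look favorable for a classifier that mostly predicts the majority class.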
Overall, both age and gender are significant factors for obtaining effective and efficient results. Classifying normal or abnormal brain MRI data will be more informative and accurate with age as an attribute.
Deep learning-based methodologies such as CNN outperform traditional methods, including SVM, which has had the highest classification accuracy to date. More tests on brain size may be performed using large amounts of data, taking gender and a suitable age range as attributes, as this can be used to reach a higher level of accuracy than a generalized classification. Classification and segmentation-based works are engaging; however, a more efficient method is needed for these purposes. Researchers are still looking for a way to reduce human effort and make the processes of detecting brain tumors and other abnormalities more efficient. Deep learning has the potential to tackle this problem and provide higher accuracy, dependability, and efficiency.