Classification of Brain MRI Tumor Images Based on Deep Learning PGGAN Augmentation

The wide prevalence of brain tumors in all age groups necessitates having the ability to make an early and accurate identification of the tumor type and thus select the most appropriate treatment plans. The application of convolutional neural networks (CNNs) has helped radiologists to more accurately classify the type of brain tumor from magnetic resonance images (MRIs). The learning of CNN suffers from overfitting if a suboptimal number of MRIs are introduced to the system. Recognized as the current best solution to this problem, the augmentation method allows for the optimization of the learning stage and thus maximizes the overall efficiency. The main objective of this study is to examine the efficacy of a new approach to the classification of brain tumor MRIs through the use of a VGG19 features extractor coupled with one of three types of classifiers. A progressive growing generative adversarial network (PGGAN) augmentation model is used to produce ‘realistic’ MRIs of brain tumors and help overcome the shortage of images needed for deep learning. Results indicated the ability of our framework to classify gliomas, meningiomas, and pituitary tumors more accurately than in previous studies with an accuracy of 98.54%. Other performance metrics were also examined.


Introduction
Cancer is considered one of the most widespread causes of mortality around the world. Cancers are complex diseases that may afflict every part of the human body. Poor classification of tumors may be attributable owing to the wide variation in severity of the disease, duration of illness, location of tumors, and degrees of sensitivity or resistance to the various chemotherapeutic drugs [1]. There has been a significant rise in the number of patients with brain tumors over the past decade, with cancers of the brain becoming the 10th most common type of tumor in children and adults alike. Accurate classification of brain tumors would clearly result in the most effective medical intervention and can greatly increase survival among patients.
Computer-aided diagnosis (CAD) techniques have been helping neuro-oncologists in multiple respects. CAD techniques have aided in the early detection and classification of brain tumors [2]. Physicians aided by CAD are able to make more accurate classifications compared with those relying on visual comparisons alone [3]. MRIs contain valuable information regarding the type, size, shape, and position of brain tumors without subjecting the patient to harmful ionizing radiation [4]. MRIs provide higher contrast of soft tissues compared with computerized tomography (CT) scans. Thus, coupled with a CAD system, MRIs can quickly help identify the location and size of tumors [5,6].
Brain tumors, according to World Health organization (WHO), can be classified into four grades (grade I to grade IV): grades I and II are classified as low-grade tumors, while grades III and IV are termed as high-grade tumors. Metastatic brain tumor treatment requires crucial medication parameters of non-chemotherapeutics, chemotherapeutics, radiation, and surgical techniques [28]. Brain tumor classification has been one of the most promising research areas of deep learning for medical purposes and is the primary basis of deciding on the primary form of treatment and the treatment process as well as predicting the possible success rate of treatment, and mapping out follow-up of the disease [29]. Some of the approaches used to perform feature selection and classification are explained in Table 1. Table 1. Summarization of the previous models.
This work is focused on brain tumor classification with the following contributions: Three different deep learning models with the high ability to classify multiple brain tumors (glioma, meningioma, and pituitary tumors) using MRIs are discussed. They are VGG19 + CNN, VGG19 + Gated Recurrent Unit (GRU), and VGG19 + Bi-Directional Gated Recurrent unit (Bi-GRU). Two methods of data augmentation, namely, classic data augmentation and PGGAN data augmentation, are utilized to increase the data set size. A complete comparison of the models and their applications is presented. The study also discusses the shared data set introduced by Cheng et al. [30].

Materials and Methods
Gliomas, meningiomas, and pituitary tumors are three types of fatal brain tumors detectable in MRIs. If undetected in their early stage, prognoses of these types of tumors can be dire [4]. Along with meningiomas, gliomas are recognized as one of the most common primary brain tumors [40]. Pituitary adenomas are the most frequent intracranial tumors. They are associated with high rates of morbidity and mortality [41]. Each tumor type is classified by a distinct classification scheme and is of different degrees, sizes, and shapes. Based on these differences, tumor types are identified [42]. MRIs provide a superior method of classification of brain tumors. Accuracy in classifying the tumor type is essential to allow physicians to decide on the best type of treatment and the treatment plan, accurately predict prognosis, and design follow-up plans.
This section provides a concise and precise description of the experimental results, their interpretation, as well as the experimental conclusions that can be drawn.
A deep learning framework was developed for the purposes of this study. As shown in Figure 1, the framework consisted of three main stages: a data set pre-processing stage; a deep learning model for feature extraction, and a classification stage. Three different models were used to classify brain tumors: VGG19 + CNN, VGG19 + GRU, and VGG19 + Bi-GRU. By accurately classifying brain tumors, this framework will reduce a radiologist's workload and allow for more timely decision making. a deep learning model for feature extraction, and a classification stage. Three di models were used to classify brain tumors: VGG19 + CNN, VGG19 + GRU, and VG Bi-GRU. By accurately classifying brain tumors, this framework will reduce a radiol workload and allow for more timely decision making.

Data set for the Study
We used the public data set introduced by Cheng et al. [30] in this work. Th set was created in order to synthesize images for the training of deep learning frame and the evaluation of how accurately they identify brain tumors on MRIs. As seen ure 2, the data set consisted of 3064 T1-CE MR images from 233 patients who suffer of three different brain tumor types: gliomas (1426 images), meningiomas (708 im and pituitary tumors (930 images). The data set's MR images covered three planes sagittal, and coronal) with an image size of 512 × 512 pixels. A sample of images three tumor types from three planes is shown in Figure 3.

Data Set for the Study
We used the public data set introduced by Cheng et al. [30] in this work. This data set was created in order to synthesize images for the training of deep learning frameworks and the evaluation of how accurately they identify brain tumors on MRIs. As seen in Figure 2, the data set consisted of 3064 T1-CE MR images from 233 patients who suffered one of three different brain tumor types: gliomas (1426 images), meningiomas (708 images), and pituitary tumors (930 images). The data set's MR images covered three planes (axial, sagittal, and coronal) with an image size of 512 × 512 pixels. A sample of images of the three tumor types from three planes is shown in Figure 3. a deep learning model for feature extraction, and a classification stage. Three different models were used to classify brain tumors: VGG19 + CNN, VGG19 + GRU, and VGG19 + Bi-GRU. By accurately classifying brain tumors, this framework will reduce a radiologist's workload and allow for more timely decision making.

Data set for the Study
We used the public data set introduced by Cheng et al. [30] in this work. This data set was created in order to synthesize images for the training of deep learning frameworks and the evaluation of how accurately they identify brain tumors on MRIs. As seen in Figure 2, the data set consisted of 3064 T1-CE MR images from 233 patients who suffered one of three different brain tumor types: gliomas (1426 images), meningiomas (708 images), and pituitary tumors (930 images). The data set's MR images covered three planes (axial, sagittal, and coronal) with an image size of 512 × 512 pixels. A sample of images of the three tumor types from three planes is shown in Figure 3.

Pre-Processing and Image Augmentation
Even the smallest CNNs have thousands of parameters upon which their layers to be trained. When using a small number of images, CNN models face the probl overfitting. To counter this problem, we employed two methods of data augmen (DA): classic augmentation and PGGAN-based augmentation.
The best results are typically obtained when no preprocessing is used with th images. Therefore, image intensities were scaled to the range [−1, 1]. The data se randomly divided into three groups with their target label: a training group (70%), idation group (15%), and a testing group (15%).

Classic Data Augmentation
In this phase, the data set size was increased using an augmenter. This was ach by changing the brain's position enough to avoid a model memorizing the location brain. Classic DA applied to MRIs typically includes the following operations:

Pre-Processing and Image Augmentation
Even the smallest CNNs have thousands of parameters upon which their layers need to be trained. When using a small number of images, CNN models face the problem of overfitting. To counter this problem, we employed two methods of data augmentation (DA): classic augmentation and PGGAN-based augmentation.
The best results are typically obtained when no preprocessing is used with the MR images. Therefore, image intensities were scaled to the range [−1, 1]. The data set was randomly divided into three groups with their target label: a training group (70%), a validation group (15%), and a testing group (15%).

Classic Data Augmentation
In this phase, the data set size was increased using an augmenter. This was achieved by changing the brain's position enough to avoid a model memorizing the location of the brain. Classic DA applied to MRIs typically includes the following operations:

Pre-Processing and Image Augmentation
Even the smallest CNNs have thousands of parameters upon which their layers need to be trained. When using a small number of images, CNN models face the problem of overfitting. To counter this problem, we employed two methods of data augmentation (DA): classic augmentation and PGGAN-based augmentation.
The best results are typically obtained when no preprocessing is used with the MR images. Therefore, image intensities were scaled to the range [−1, 1]. The data set was randomly divided into three groups with their target label: a training group (70%), a validation group (15%), and a testing group (15%).

Classic Data Augmentation
In this phase, the data set size was increased using an augmenter. This was achieved by changing the brain's position enough to avoid a model memorizing the location of the brain. Classic DA applied to MRIs typically includes the following operations: • Rotation: Rotation of image without cropping because a cropped image may not contain the whole tumor. Images were rotated at 90, 180, and 270 degrees;

PGGAN-Based Data Augmentation
This research investigated the possibility of using the GAN technique as another method of augmentation in order to correct the imbalance in medical data sets. PGGAN

PGGAN-Based Data Augmentation
This research investigated the possibility of using the GAN technique as another method of augmentation in order to correct the imbalance in medical data sets. PGGAN [43] is a GAN model that progressively grows off a generator and discriminator. It chooses training stability, synthesizing images up to 1024 × 1024 pixels in resolution.
We used PGGAN to overcome the lack of sufficient images in the data set, synthesizing brain tumor MRIs in three planes: axial, sagittal, and coronal for three tumor types that had the same structure. Images were synthesized from low 4 × 4 pixel images using a 512 latent vector. Images reached a resolution of 256 × 256 pixels, increasing in resolution at each step by the power of 2 ( Figure 5). We trained the model and generated each dimension in each class image separately using the main images from the data set [30].
Diagnostics 2021, 11, x FOR PEER REVIEW [43] is a GAN model that progressively grows off a generator and discriminator. training stability, synthesizing images up to 1024 × 1024 pixels in resolution.
We used PGGAN to overcome the lack of sufficient images in the data set, ing brain tumor MRIs in three planes: axial, sagittal, and coronal for three tu that had the same structure. Images were synthesized from low 4 × 4 pixel ima a 512 latent vector. Images reached a resolution of 256 × 256 pixels, increasing in at each step by the power of 2 ( Figure 5). We trained the model and generate mension in each class image separately using the main images from the data se PGGAN implementation details: The original PGGAN introduced in [43] Using this PGGAN, we trained the model to synthetize MRIs of brain tumors high-resolution MRI is facilitated from a latent vector of 512. The architecture o used to synthesize the brain tumor consisted of two models: the generator an criminator. Variable batch size was changed during training for each resolution 4→128, 8→128, 16→128, 32→ 64, 64→32, 128→16, and 256→4, to prevent the sy exceeding available memory. An Adam optimizer [44] with a learning rate = 0.0 and β2 = 0.99, and epsilon = 1 × 10 −8 was chosen as it gives the best accuracy. W WGAN-GP as a loss function [45]. As in the study by [21], pixelwise feature v malization was applied to the generator model after each convolution layer, w ception of the final output layer. The activation layer was a Leaky ReLU (LReL leakiness of 0.2. It was used as both generator and discriminator.

Proposed Deep Learning Models
Different tumor types may afflict the human brain, including, as aforement omas, meningiomas, and pituitary tumors. MRIs represent the best radiologic Using this PGGAN, we trained the model to synthetize MRIs of brain tumors. Ideally, a high-resolution MRI is facilitated from a latent vector of 512. The architecture of PGGAN used to synthesize the brain tumor consisted of two models: the generator and the discriminator. Variable batch size was changed during training for each resolution as follows: 4→128, 8→128, 16→128, 32→ 64, 64→32, 128→16, and 256→4, to prevent the system from exceeding available memory. An Adam optimizer [44] with a learning rate = 0.001, β1 = 0, and β2 = 0.99, and epsilon = 1 × 10 −8 was chosen as it gives the best accuracy. We used a WGAN-GP as a loss function [45]. As in the study by [21], pixelwise feature vector normalization was applied to the generator model after each convolution layer, with the exception of the final output layer. The activation layer was a Leaky ReLU (LReLU) with a leakiness of 0.2. It was used as both generator and discriminator.

Proposed Deep Learning Models
Different tumor types may afflict the human brain, including, as aforementioned, gliomas, meningiomas, and pituitary tumors. MRIs represent the best radiological method of identifying and classifying these brain tumors. The application of deep machine learning has advanced the classification process of MRIs of brain tumors. The objective of this study was to develop an intelligent system that could better classify the MRIs of gliomas, meningiomas, and pituitary tumors into these three classes of tumors compared with currently existing models. To achieve this aim, first, a VGG19 was used as feature extraction.
VGG19 [46] is a widely used CNN architecture model composed of 19 layers with 3 × 3 convolution filters and a stride of 1 designed to achieve high accuracy in large-scale image applications. As a large-scale image features extractor with high accuracy, a VGG19 was essential to our framework. Several deep learning classifiers were integrated with the VGG19 extractor in order to reach maximum accuracy of brain tumor classification.
To decide on the best combination of VGG19 and classifier (VGG19 + CNN or a combination of VGG19 and an RNN model, such as GRU and Bi-GRU), we tested three different architectures: VGG19 + CNN, VGG19 + GRU, and VGG19 + Bi-GRU. We compared their performances and chose the best of them. Finally, we tested the same three models coupled with one of two types of augmentation models: a classical augmenter and a PGGAN-based augmenter (augmenters are designed to produce more realistic MRIs).
Pre-processing: Input image size stood at 224 × 224 pixels. We resized all MRIs in the data set to the same size. All PGGAN-generated images were 256 × 256 pixels in size. We thus resized those images to 224 × 224 pixels.
Data augmentation (DA) setup: To test the performance of our models, we used the following two DA setups with a sufficient number of images for the three classes of tumor:

VGG19 and CNN Deep Learning Model
A deep learning model that consisted of a VGG19 + CNN model followed by CNN was designed to classify brain tumors on MRIs. It employed feature extraction and classification. In this model, VGG19 + CNN was used as a feature extraction model. The whole number of parameters of VGG19 + CNN stood at 139,581,379, which were all trainable parameters. The structure of the model is shown in Table 2; the input layer for the model was 224 × 224 × 1 brain tumor MRIs. The feature extraction consisted of a VGG19 followed by two blocks, each block having a convolution layer with 4096 neurons followed by a dropout layer with the rate of 0.5, a ReLU activation function, and finally, a SoftMax layer that functioned to classify the output into one of the three tumor types. Table 2. Architecture details of the VGG19+CNN model. The maxpooling layer is applied after each convolution layer.

VGG19 + CNN Model
Output Shapes

VGG19 and GRU Deep Learning Model
GRU is an RNN architecture with a number of advantageous features, such as its simplicity and need for less training time, in addition to its ability to store information irrelevant to the predictions made for extended periods of time. Our second deep model, which consisted of a VGG19 model followed by GRU, was designed to classify the MRIs of brain tumors. It employed feature extraction and classification. To our second model, we added GRU as a classifier after VGG19 having been used as feature extraction. Beyond the layers of the VGG19, the output passed to a reshape layer followed by a GRU layer with 512 units, then two blocks consisting of a convolution layer of 1024 neurons followed by a dropout layer, and finally a SoftMax layer, as shown in Table 3 (27,895,747 total parameters). Table 3. Architecture details of the VGG19 + GRU model.

VGG19 and GRU Bidirectional Deep Learning Model
Our third model consisted of a VGG19 followed by a Bi-GRU, which was used as a feature extraction model and classifier. The model consisted of a VGG19 followed by a reshaping layer, a Bi-GRU layer, a dropout layer, a dense layer, a dropout layer, and a dense layer with a SoftMax activation function. The structure of the model is shown in Table 4 (34,714,563 total parameters). Table 4. Architecture details of the VGG19 + Bi-GRU model.

VGG19 + Bi-GRU Model
Output Shapes Figure 6 compares the number of parameters for the three models, and the VGG19 + GRU model was shown to give a lower number of parameters than the other two models. This indicated that it had the lowest complexity. The VGG19 + CNN had a high number of parameters. Diagnostics 2021, 11, x FOR PEER REVIEW 9 of 19 Figure 6. Number of trainable parameter-wise distributions for each of the three models.

Results
All classification models in the framework created for this study were run using TensorFlow and Keras frameworks and trained using Google Colab with the following specification: 2 TB storage, 12 GB RAM, and at a minimum graphical processing of unit (GPU) P100. Before completing the final model evaluation, we conducted a number of tests to select the best hyperparameters. We ran our model for 100 epochs with a batch size of 32 and categorical cross-entropy loss for multi-classification without ImageNet pre-training. To select the best optimizer from a choice of Adam [44], Adamax [44], RMSprop [47], or Nadam [48] for the three models, we tested the accuracy of the models using these four optimizers. We used a constant learning rate of 0.00001.

Performance Metrics
A number of recent studies used confusion matrices to analyze models and assess the level of performance of the classification process, categorizing relationships between data and distributions. By using different confusion matrices, classification models may be assessed extensively [49]. In our study, four primary keys-a true positive (Tp), a true negative (Tn), a false positive (Fp), and a false negative value (Fn)-were used to test the classifier. Then, based on the four outcomes, the performance of the model was computed in terms of accuracy (ACC), sensitivity, specificity (SPC), precision (PPV), negative predictive values (NPVs), the F1-score, and Matthew's correlation coefficient (MCC).
Accuracy, as given in Equation (1), is defined as the number of correctly predicted samples to the total predicted sample.
Sensitivity (Recall), as given in Equation (2), is the number of samples labeled as positive out of the total number of positive samples.
Specificity of the true negative rate, as given in Equation (3), is the number of samples predicted as negative out of the total number of negative samples.

Results
All classification models in the framework created for this study were run using TensorFlow and Keras frameworks and trained using Google Colab with the following specification: 2 TB storage, 12 GB RAM, and at a minimum graphical processing of unit (GPU) P100. Before completing the final model evaluation, we conducted a number of tests to select the best hyperparameters. We ran our model for 100 epochs with a batch size of 32 and categorical cross-entropy loss for multi-classification without ImageNet pre-training. To select the best optimizer from a choice of Adam [44], Adamax [44], RMSprop [47], or Nadam [48] for the three models, we tested the accuracy of the models using these four optimizers. We used a constant learning rate of 0.00001.

Performance Metrics
A number of recent studies used confusion matrices to analyze models and assess the level of performance of the classification process, categorizing relationships between data and distributions. By using different confusion matrices, classification models may be assessed extensively [49]. In our study, four primary keys-a true positive (Tp), a true negative (Tn), a false positive (Fp), and a false negative value (Fn)-were used to test the classifier. Then, based on the four outcomes, the performance of the model was computed in terms of accuracy (ACC), sensitivity, specificity (SPC), precision (PPV), negative predictive values (NPVs), the F1-score, and Matthew's correlation coefficient (MCC).
Accuracy, as given in Equation (1), is defined as the number of correctly predicted samples to the total predicted sample.
Sensitivity (Recall), as given in Equation (2), is the number of samples labeled as positive out of the total number of positive samples.
Specificity of the true negative rate, as given in Equation (3), is the number of samples predicted as negative out of the total number of negative samples.
Precision, as in Equation (4), represents the number of samples truly positive that were predicted as such out of the total number of samples predicted as positive.
Conversely, the negative predictive value (NPV) is the number of truly negative samples that were predicted as such, out of the total number of samples predicted as negative, given in Equation (5).
The harmonic means of precision and recall, known as the F1-score, is shown in Equation (6).
Finally, Matthew's correlation coefficient range allows one to gauge how well the classification model performs.
In a number of previous studies, researchers chose two methods to design a classification model for brain tumors. The first method only involved axial images [31,33,38] while the second method involved images in three planes (axial, sagittal, and coronal) [4,23,30,32,[34][35][36][37]39]. In our study, we used the second method to generalize the model.

Scenario I: Deep Learning Models with Classic Augmentation
To set the generalized design for the classification of brain tumors in all three planes, we needed to increase the total number of images in the data set. This was achieved by using a classical augmenter for all three planes. For the VGG19 + CNN model, as seen in Table 5, the highest accuracy of 96.59% was achieved using the Nadam optimizer. The other performance metrics achieved by the VGG19 + CNN model for the three classes of brain tumors are shown in Table 6. The highest performance metric involved pituitary tumors. The model achieved the highest sensitivity in identifying gliomas, for which NPV values were also the highest. The bold values indicate the best accuracy value that was achieved. Figure 7a shows the confusion matrix of the VGG19 + CNN model, in which our framework accurately identified gliomas in 100% of cases, meningiomas in 90.2% cases, and pituitary tumors in 96.92% of cases. We used a combination of CNN and RNN models to classify brain tumors in MRIs. We combined the VGG19 + GRU model; this combination enabled our model to process the characteristics of the information contained in the image and effectively learn the structural features of brain tumors.  Figure 7a shows the confusion matrix of the VGG19 + CNN model, in which our framework accurately identified gliomas in 100% of cases, meningiomas in 90.2% cases, and pituitary tumors in 96.92% of cases. We used a combination of CNN and RNN models to classify brain tumors in MRIs. We combined the VGG19 + GRU model; this combination enabled our model to process the characteristics of the information contained in the image and effectively learn the structural features of brain tumors.   Figure 7a shows the confusion matrix of the VGG19 + CNN model, in which our framework accurately identified gliomas in 100% of cases, meningiomas in 90.2% cases, and pituitary tumors in 96.92% of cases. We used a combination of CNN and RNN models to classify brain tumors in MRIs. We combined the VGG19 + GRU model; this combination enabled our model to process the characteristics of the information contained in the image and effectively learn the structural features of brain tumors. The second model (VGG19 + GRU) combined the VGG19 as a feature extractor with GRU as a classifier. The highest accuracy of 94.89% was achieved with the Adam optimizer, as seen in Table 5, with the highest accuracy metric remaining that of pituitary tumors. Figure 7b shows the VGG19 + GRU model confusion matrix, indicating that the model successfully and accurately predicated gliomas, meningiomas, and pituitary tumors at rates of 92.74, 94.12, and 98.46%, respectively. For the third model (VGG19 + Bi-GRU), we used Bidirectional-GRU combined with VGG19 as a classifier of brain tumors. Table 5 shows that the VGG19 + Bi-GRU combination yielded promising results, reaching an accuracy level of 95.62% when coupled with the RMSprop optimizer. The best metrics for accuracy, precision, F1-score, specificity, and MCC were achieved in relation to the identification of pituitary tumors, while sensitivity and NPV were highest in relation to the detection of gliomas. The VGG19 + Bi-GRU model's confusion matrix indicated successful prediction rates of 97.21, 90.20, and 97.69% of gliomas, meningiomas, and pituitary tumors, respectively. Figure 7c shows the VGG19 + Bi-GRU model confusion matrix. The losses of the three models during the entire training for the test and validation process versus epochs for the three best models is shown in Figure 8. The second model (VGG19 + GRU) combined the VGG19 as a feature extractor with GRU as a classifier. The highest accuracy of 94.89% was achieved with the Adam optimizer, as seen in Table 5, with the highest accuracy metric remaining that of pituitary tumors. Figure 7b shows the VGG19 + GRU model confusion matrix, indicating that the model successfully and accurately predicated gliomas, meningiomas, and pituitary tumors at rates of 92.74, 94.12, and 98.46%, respectively. For the third model (VGG19 + Bi-GRU), we used Bidirectional-GRU combined with VGG19 as a classifier of brain tumors. Table 5 shows that the VGG19 + Bi-GRU combination yielded promising results, reaching an accuracy level of 95.62% when coupled with the RMSprop optimizer. The best metrics for accuracy, precision, F1-score, specificity, and MCC were achieved in relation to the identification of pituitary tumors, while sensitivity and NPV were highest in relation to the detection of gliomas. The VGG19 + Bi-GRU model's confusion matrix indicated successful prediction rates of 97.21, 90.20, and 97.69% of gliomas, meningiomas, and pituitary tumors, respectively. Figure 7c shows the VGG19 + Bi-GRU model confusion matrix. The losses of the three models during the entire training for the test and validation process versus epochs for the three best models is shown in Figure 8. Even though our models achieved a high level of performance at 96.59%, they initially possessed a number of shortcomings that would have affected their application clinically. For example, training the models for the classification of a brain tumor was conducted using a limited number of images with the same characteristics and 'increased' in number using a classic augmenter. Therefore, we used PGGAN as an augmenter in order to improve the performance metrics of our models and make our models more realistic and generalized. Re-runs of the three models with synthesized MRIs were conducted and results compared to those of previous studies.

Scenario II: Deep Learning Models with PGGAN-Based DA
Increasing the accuracy of a model's ability to classify brain tumors will naturally make our framework more reliable and clinically applicable. Therefore, the second phase of the experiment, as aforementioned, was to make synthetic MR images of brain tumors by employing the PGGAN model in order to make classification more realistic. We used nine PGGAN models: three PGGAN models for each tumor type, one model for each plane.
Clinical validation of synthesized MRIs by an experienced radiologist: To ensure that the output of PGGAN mimicked realistic tumor MRIs and that correct features were generated, an experienced radiologist reviewed the images. The radiologist was asked to confirm that the three target tumor types were realistically represented through the generated images and to help identify poorly generated or 'wrong' images, which were then discarded. Figures 9 and 10 illustrate samples of generated MR images ('realistic' and 'wrong' generated images, respectively). New 'realistic' MRIs of brain tumors generated using the PGGAN model were all added to the original training data set, then the three models were retrained and the performance of each model was checked. Even though our models achieved a high level of performance at 96.59%, they initially possessed a number of shortcomings that would have affected their application clinically. For example, training the models for the classification of a brain tumor was conducted using a limited number of images with the same characteristics and 'increased' in number using a classic augmenter. Therefore, we used PGGAN as an augmenter in order to improve the performance metrics of our models and make our models more realistic and generalized. Re-runs of the three models with synthesized MRIs were conducted and results compared to those of previous studies.

Scenario II: Deep Learning Models with PGGAN-Based DA
Increasing the accuracy of a model's ability to classify brain tumors will naturally make our framework more reliable and clinically applicable. Therefore, the second phase of the experiment, as aforementioned, was to make synthetic MR images of brain tumors by employing the PGGAN model in order to make classification more realistic. We used nine PGGAN models: three PGGAN models for each tumor type, one model for each plane.
Clinical validation of synthesized MRIs by an experienced radiologist: To ensure that the output of PGGAN mimicked realistic tumor MRIs and that correct features were generated, an experienced radiologist reviewed the images. The radiologist was asked to confirm that the three target tumor types were realistically represented through the generated images and to help identify poorly generated or 'wrong' images, which were then discarded. Figures 9 and 10 illustrate samples of generated MR images ('realistic' and 'wrong' generated images, respectively). New 'realistic' MRIs of brain tumors generated using the PGGAN model were all added to the original training data set, then the three models were retrained and the performance of each model was checked. Even though our models achieved a high level of performance at 96.59%, they initially possessed a number of shortcomings that would have affected their application clinically. For example, training the models for the classification of a brain tumor was conducted using a limited number of images with the same characteristics and 'increased' in number using a classic augmenter. Therefore, we used PGGAN as an augmenter in order to improve the performance metrics of our models and make our models more realistic and generalized. Re-runs of the three models with synthesized MRIs were conducted and results compared to those of previous studies.

Scenario II: Deep Learning Models with PGGAN-Based DA
Increasing the accuracy of a model's ability to classify brain tumors will naturally make our framework more reliable and clinically applicable. Therefore, the second phase of the experiment, as aforementioned, was to make synthetic MR images of brain tumors by employing the PGGAN model in order to make classification more realistic. We used nine PGGAN models: three PGGAN models for each tumor type, one model for each plane.
Clinical validation of synthesized MRIs by an experienced radiologist: To ensure that the output of PGGAN mimicked realistic tumor MRIs and that correct features were generated, an experienced radiologist reviewed the images. The radiologist was asked to confirm that the three target tumor types were realistically represented through the generated images and to help identify poorly generated or 'wrong' images, which were then discarded. Figures 9 and 10 illustrate samples of generated MR images ('realistic' and 'wrong' generated images, respectively). New 'realistic' MRIs of brain tumors generated using the PGGAN model were all added to the original training data set, then the three models were retrained and the performance of each model was checked.  With the VGG19 + CNN model, the Adam optimizer achieved the highest accuracy which stood at 98.54%, which was 4.38% higher than that of classic DA. Additionally overall performance metrics also rose in value, especially with respect to the pituitary tu mors illustrated in Tables 7 and 8. Figure 11a shows the confusion matrix of the VGG19 CNN model. It is worth noting that, even with the best model, six images were misclass fied. Four images of gliomas were erroneously classified as meningiomas and two men ingiomas were wrongly classified as gliomas.  The bold values indicate the best accuracy value that was achieved. With the VGG19 + CNN model, the Adam optimizer achieved the highest accuracy, which stood at 98.54%, which was 4.38% higher than that of classic DA. Additionally, overall performance metrics also rose in value, especially with respect to the pituitary tumors illustrated in Tables 7 and 8. Figure 11a shows the confusion matrix of the VGG19 + CNN model. It is worth noting that, even with the best model, six images were misclassified. Four images of gliomas were erroneously classified as meningiomas and two meningiomas were wrongly classified as gliomas. The bold values indicate the best accuracy value that was achieved. With the VGG19 + CNN model, the Adam optimizer achieved the highest accuracy, which stood at 98.54%, which was 4.38% higher than that of classic DA. Additionally, overall performance metrics also rose in value, especially with respect to the pituitary tumors illustrated in Tables 7 and 8. Figure 11a shows the confusion matrix of the VGG19 + CNN model. It is worth noting that, even with the best model, six images were misclassified. Four images of gliomas were erroneously classified as meningiomas and two meningiomas were wrongly classified as gliomas.  The bold values indicate the best accuracy value that was achieved. For the second model, the Nadam optimizer algorithm achieved the highest accuracy, which stood at 96.59%, as shown in Table 7. Performance metrics are detailed in Table 8. Figure 11b shows the related confusion matrix with a 97.77% accuracy rate for gliomas, 94.12% for meningiomas, and 96.92% for pituitary tumors.
For the third model, the overall performance metrics improved, as illustrated in Tables 7 and 8. The model's accuracy rose to 96.84%, with a 1.22% accuracy improvement over classical DA. Figure 11c shows the confusion matrix for the VGG19 + Bi-GRU model.
Lastly, we compared the three proposed models in terms of accuracy and loss to determine the best of the three. Figure 12 shows the training and validation loss versus epochs for the three best-performing models. Figure 13 shows the accuracy rates of the three models, indicating that the VGG19 model had the highest performance accuracy and the lowest loss value of the three proposed models.  For the second model, the Nadam optimizer algorithm achieved the highest accuracy, which stood at 96.59%, as shown in Table 7. Performance metrics are detailed in Table 8. Figure 11b shows the related confusion matrix with a 97.77% accuracy rate for gliomas, 94.12% for meningiomas, and 96.92% for pituitary tumors.
For the third model, the overall performance metrics improved, as illustrated in Tables 7 and 8. The model's accuracy rose to 96.84%, with a 1.22% accuracy improvement over classical DA. Figure 11c shows the confusion matrix for the VGG19 + Bi-GRU model.
Lastly, we compared the three proposed models in terms of accuracy and loss to determine the best of the three. Figure 12 shows the training and validation loss versus epochs for the three best-performing models. Figure 13 shows the accuracy rates of the three models, indicating that the VGG19 model had the highest performance accuracy and the lowest loss value of the three proposed models.    For the second model, the Nadam optimizer algorithm achieved the highest accuracy, which stood at 96.59%, as shown in Table 7. Performance metrics are detailed in Table 8. Figure 11b shows the related confusion matrix with a 97.77% accuracy rate for gliomas, 94.12% for meningiomas, and 96.92% for pituitary tumors.
For the third model, the overall performance metrics improved, as illustrated in Tables 7 and 8. The model's accuracy rose to 96.84%, with a 1.22% accuracy improvement over classical DA. Figure 11c shows the confusion matrix for the VGG19 + Bi-GRU model.
Lastly, we compared the three proposed models in terms of accuracy and loss to determine the best of the three. Figure 12 shows the training and validation loss versus epochs for the three best-performing models. Figure 13 shows the accuracy rates of the three models, indicating that the VGG19 model had the highest performance accuracy and the lowest loss value of the three proposed models.

Discussion
This section compares the performances of our framework with its three proposed models (VGG19 + CNN, VGG19 + GRU, and VGG19 + Bi-GRU) with eight recent stateof-the-art models [4,23,[30][31][32]34,38,39]. Based on the experimental results reported in these papers, we found that our proposed combined structure of VGG19 + CNN with the PGGAN data augmentation model was superior to all recent works that tackled brain tumor MRI classification.
We first compared the results of studies of four state-of-the-art models that used classical augmentation [4,23,31,38] with the results of our best comparable model: VGG19 + CNN) combined with classic augmentation. The authors of [31] developed a CNN model for the classification of MRIs of a number of brain tumors. Only axial 989 MRIs of brain tumors were used. Three types of operations, rotating, shifting, and mirroring, were used for the purposes of data augmentation in order to increase the data set. The authors applied three classifier models: a convolutional neural network (CNN), a fully connected network, and a random forest. The CNN yielded the best classifier when applied to 256 × 256 images with an accuracy of 91.43%. A genetic algorithm (GA) was used to choose the best CNN architecture to classify the different grades of gliomas [38]. The CNN was comprised of 12 layers, increasing accuracy to 94.2%. A new classifier based on the GAN model was introduced by [4]. A deep convolutional model was pre-trained as a discriminator within the DCGAN model. The discriminator was designed to distinguish between actual and generated images and define image features, and it was then fine-tuned as the classifier of brain tumors. This model achieved a 95.6% accuracy. A CNN-based deep learning model was developed by [23] in order to classify MRIs of tumors. Images of 128 × 128 pixels were fed into a 16-layer model. The investigated DL model achieved an accuracy of 96.13%.
As shown in Table 9, our VGG19 + CNN model with classic augmentation multiplied the data set images by nearly 6-fold and achieved a 96.59% accuracy, higher than the four models with which it was compared. This indicated the superiority of our framework even when using classical augmentation. Moreover, using the additional images generated by the PGGAN model helped to increase the accuracy of the VGG19 + CNN model by a further 1.96%. Table 9. Comparison of our framework with other works based on classic augmentation.

Ref. Augmentation Operation
Data Set Size after Augmentation Data Set Division ACC% [31] Rotation random angle between (0 and 360 angle). Shifted randomly by −4 to 4 pixels left or right and up or down. Mirror across its y-axis. As shown in Table 10, we compared the accuracy of our VGG19 + CNN model with all seven recent models that were designed to classify MRIs of brain tumors into three types (gliomas, meningiomas, and pituitary tumors). The accuracy rate of our combined VGG19 + CNN and PGGAN data augmentation framework surpassed that of the seven other models, achieving an accuracy of 98.54%. To further assess the performance of our framework in comparison with the seven other models, we compared all remaining metrics (precision, sensitivity, and specificity). We calculated the performance metrics (precision, sensitivity, and specificity) calculated with our VGG19 + CNN model and PGGAN data augmentation framework. Metrics pertaining to the identification of meningiomas stood at 96.15% for precision, 98.04% for sensitivity, and 98.71% for specificity. Glioma-related metrics stood at 98.87, 97.77, and 99.14% for precision, sensitivity, and specificity, respectively. Pituitary tumor-related metrics stood at a perfect 100%.
We lastly compared our VGG19 + CNN and PGGAN model with previous models that used the same Cheng et al. [30] data set. Results are shown in Table 10, including a comparison of accuracy and most of the performance metrics that were assessed in the proposed framework based on the VGG19 + CNN model and PGGAN augmentation. We achieved high performance with the VGG19 + CNN model and PGGAN augmentation without the use of manual segmentation. Our studies reported the use of manual segmentation of tumor regions to improve the performance of their models [30,34]. Ref. [32] reported inferior results for their KE-CNN model, compared with our framework. Importantly, a model in which the CNN was designed using a genetic algorithm [38] still yielded inferior results in comparison to those of our framework. We also compared our framework to a model based on DCGAN [4], where the PGGAN model could reportedly generate high-resolution im-ages that helped to increase accuracy [23]. A classical augmentation was used in scenario I to improve performance, In scenario II, PGGAN was used for augmentation and the results showed that PGGAN augmentation plus the VGG19 + CNN model outperformed the other proposed models in our work. Ref. [39] used GoogLeNet + KNN to classify brain tumors. Prior to our study, they reported the highest accuracy in the literature. Our proposed VGG19 + CNN and PGGAN augmentation framework outperformed the [39] model. Finally, using a PGGAN model to generate high-resolution images with 'realistic' features helped us overcome the problem of overfitting with better images than those produced by classical augmentation, resulting in higher levels of accuracy.

Conclusions
The recent research work in tumor classification based on MR images has witnessed some challenges, such as the number of images in the data set, and the low accuracy of the designed models. This work proposed a complete framework based on a deep learning model as feature extractors with different classifier models designed to classify MRIs of gliomas, meningiomas, and pituitary tumors using different types of augmentations. The feature extractor VGG19 was used to extract features of brain tumor MRIs. Three types of classifiers were then tested (CNN, and the recursive neural networks GRU and Bi-GRU). In our work, the number of images in the data set was increased by using different methods for image augmentation: PGGAN and classic augmentation methods such as rotation, mirror, and flipping. The proposed models achieved more accuracy than the recent introduced models.
The CNN classifier yielded the best accuracy performance (98.54%). The VGG19 + CNN model and PGGAN augmentation framework outperformed the other models in all previous work, with accuracy values of 98.54, 98.54, and 100% for gliomas, meningiomas, and pituitary tumors, respectively.
Our study tested the classification of only three general brain tumor types: meningiomas, gliomas, and pituitary tumors. This constitutes a limitation of the study since other types of brain tumors exist. Moreover, our synthetic images were 256 × 256, while the primary data set images were all 512 × 512. Because of limitations of computational resources, the input for the model was resized to 224 × 224. In future studies, we aim to work on acquiring primary images with appropriate size in order to make the images as realistic as possible to a radiologist.