1. Introduction
With the advancement of technology, imaging techniques have significantly improved the efficiency of clinical diagnosis and treatment. Magnetic resonance imaging (MRI), known for its high resolution and radiation-free nature, has become an essential tool for brain tumor diagnosis. However, the diverse morphology and structure of brain tumors, coupled with an average survival period of only about 15 months, necessitate effective early diagnosis and treatment methods. Traditional imaging recognition and segmentation methods rely on the manual annotation of tumor locations by physicians, which is time-consuming, subjective, and limits clinical efficiency.
Brain tumors are masses formed by the abnormal proliferation of brain cells, potentially compressing brain tissue and causing functional impairment and symptoms. Based on cellular characteristics, brain tumors are classified into benign and malignant types. Benign tumors have well-defined boundaries and do not metastasize but can cause damage due to their growth. Malignant tumors, on the other hand, are highly invasive and pose greater threats. Brain tumors are classified by origin into primary and secondary types: primary tumors develop in the brain, while secondary tumors result from metastasis of cancer cells from other body parts (e.g., lung or breast cancer). Gliomas are the most common brain tumors, accounting for 43.9% [1,2], including highly malignant types such as glioblastoma in adults and medulloblastoma in children [1].
Recently, the rapid development of deep learning has demonstrated significant potential in image processing. Convolutional Neural Networks (CNNs), the visual geometry group network 19 (VGGNet19), residual network 101 version 2 (ResNet101V2), and efficient network version 2 B2 (EfficientNetV2B2), known for their superior feature extraction capabilities, have become pivotal in brain tumor classification research. We utilized multiple publicly available MRI datasets provided by Kaggle for model training and testing. The training dataset was the Brain Tumor MRI Dataset [1], containing 5712 images, supplemented with non-tumor images from the Brain MRI ND-5 Dataset [2] to enhance diversity. The testing dataset comprised 1311 images from the Brain Tumor MRI Dataset [1], and the validation dataset integrated data from Tumor Classification Images [2] and Brain Tumor Classification (MRI) by SARTAJ [3].
We classified and identified brain tumors using four deep learning models: a CNN, VGGNet19, ResNet101V2, and EfficientNetV2B2. The classification includes four categories: no tumor, glioma, meningioma, and pituitary. This approach improves the accuracy of brain tumor classification, reduces clinical decision-making time, and thereby supports better treatment outcomes. The CNN and VGGNet19 achieved accuracies of 99.47% and 99.81%, respectively, with high stability. EfficientNetV2B2 demonstrated an accuracy of 99.22%, although its recall was slightly lower for certain categories. ResNet101V2 reached only 93.53% accuracy in the small-sample scenario due to overfitting of its deep network structure. The CNN and VGGNet19 performed best and are appropriate for clinical applications in brain tumor classification, while ResNet101V2 and EfficientNetV2B2 require parameter optimization and improved learning strategies to enhance performance and generalization.
2. Related Works
2.1. Early Brain Tumor Imaging Studies and Technological Advances
With the increasing number of brain tumor cases and the rapid accumulation of MRI data, the manual analysis of brain tumors has challenges in terms of accuracy and efficiency. The high variability in tumor shape, size, contrast, and intensity further complicates diagnosis, underscoring the importance of computer-aided diagnosis (CAD) systems [4]. CAD systems assist doctors in automated image analysis, facilitating classification and treatment planning. For instance, Sachdeva et al. developed a CAD system that integrated image segmentation and feature extraction to classify six types of brain tumors, achieving 85% accuracy [5]. Similarly, Dandıl et al. differentiated between benign and malignant tumors, attaining 91.49% accuracy and 94.74% specificity using support vector machines (SVM) [6]. Additionally, Gumaei et al. proposed a hybrid feature extraction method combined with a regularized extreme learning machine (RELM), demonstrating higher stability and classification accuracy [7].
2.2. Deep Learning
Deep learning techniques, with their ability to automatically learn features, have gradually replaced traditional methods for image analysis. For example, Paul et al. designed a brain tumor classification system using fully connected neural networks (FCNNs) and a CNN, achieving 91.43% accuracy [8]. Díaz-Pernas et al. developed a multiscale CNN model that further improved classification performance, achieving 97.3% accuracy in classifying gliomas, meningiomas, and pituitary tumors [9]. Ayadi et al. utilized data augmentation techniques and a deep CNN, achieving classification accuracies of 95.23%, 95.43%, and 98.43% for meningiomas, gliomas, and pituitary tumors, respectively [10].
2.3. CNN
CNNs, a cornerstone of deep learning, excel in feature extraction for image processing tasks. Inspired by the structure of an animal visual cortex, a CNN uses receptive fields to capture local features, with convolutional, pooling, and fully connected layers for image classification. CNNs have been widely applied in diagnosing gliomas and brain tumors and have spawned numerous derivative architectures, such as VGGNet, ResNet, and InceptionV3 [10]. These architectures balance accuracy, model depth, and computational efficiency, making them widely used in image classification and object detection tasks.
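As a minimal illustration of the receptive-field idea described above, the following plain-Python sketch computes a valid 2-D convolution, in which each output value depends only on a local patch of the input. The `conv2d` helper is hypothetical, not taken from the study's code:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution: each output cell sees only a local
    receptive field of the input, the size of the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Weighted sum over the kh x kw patch anchored at (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A 3x3 image of ones convolved with a 2x2 kernel of ones:
# every 2x2 receptive field sums to 4, giving a 2x2 output.
print(conv2d([[1, 1, 1], [1, 1, 1], [1, 1, 1]], [[1, 1], [1, 1]]))
```

In a real CNN, many such kernels are learned from data, and pooling layers then downsample the resulting feature maps.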
3. Methodology and Results
3.1. Data Preparation and Preprocessing
In data preparation and preprocessing, we consolidated and integrated brain tumor image data from multiple sources, including four categories: normal, glioma, meningioma, and pituitary tumors. Images that failed to meet quality standards (e.g., low resolution or excessive noise) were excluded to ensure data reliability and usability. Subsequently, all images were resized to a fixed dimension (e.g., 240 × 240 pixels) and converted to grayscale to reduce computational load and maintain input dimensional consistency. Additionally, pixel values were normalized to prevent gradient oscillations during training, ensuring stable model convergence. To further enhance image diversity, various data augmentation techniques were applied, including spatial transformations such as rotation, translation, and flipping. The augmentation intensity was adjusted according to the characteristics of the data to maintain the medical interpretability of the images. Each original image generated multiple augmented versions, categorized and stored separately to ensure balanced class distribution in the training and testing datasets. Automated batch processing was employed to avoid duplicate augmentations and naming conflicts. In data splitting, an 8:2 ratio (or similar proportion) was used to divide the data into training and testing sets, with additional validation sets created if necessary. Random splitting with fixed seeds ensured consistent data distribution across experiments.
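The pixel normalization and fixed-seed 8:2 splitting steps described above can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the function names are invented here, and a real pipeline would use an image library for resizing and grayscale conversion:

```python
import random

def normalize(pixels):
    """Scale 8-bit pixel values into [0, 1] to avoid gradient
    oscillations during training (assumed normalization scheme)."""
    return [p / 255.0 for p in pixels]

def split_dataset(items, train_ratio=0.8, seed=42):
    """Shuffle with a fixed seed and split into training/testing sets
    at an 8:2 ratio, so the split is reproducible across experiments."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(100)))
print(len(train), len(test))  # 80 20
```

Because the seed is fixed, repeated runs produce the same partition, which is what keeps data distribution consistent across experiments.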
3.2. Performance Metrics
To monitor the model’s performance during training, accuracy, precision, and recall were calculated at each training iteration (epoch). The training data and metrics were saved in a “training_history.json” file for subsequent evaluation and comparison. After training the models, the results of the four deep learning models were analyzed using the accuracy, precision, recall, and F1-score as evaluation metrics. These metrics were calculated based on the following equations, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:
- 1.
Accuracy: Accuracy represents the proportion of correct predictions among all predictions: Accuracy = (TP + TN)/(TP + TN + FP + FN).
- 2.
Precision: Precision represents the percentage of true positive predictions among all positive predictions: Precision = TP/(TP + FP).
- 3.
Recall: Recall indicates the proportion of actual positive samples correctly identified by the model: Recall = TP/(TP + FN).
- 4.
F1-score: The F1-score is the harmonic mean of precision and recall, providing a balanced measure of a model’s performance, particularly when dealing with imbalanced datasets: F1 = 2 × (Precision × Recall)/(Precision + Recall).
Using the equations, the metrics were calculated, and the results were visualized to compare the performance of the four models comprehensively. The result provides an understanding of each model’s strengths and weaknesses in the context of this research task.
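The per-class computation of these metrics can be sketched in plain Python. The `per_class_metrics` helper and the structure written out as JSON are illustrative assumptions, not the study's actual code, which is not published here:

```python
import json

def per_class_metrics(y_true, y_pred, labels):
    """Compute accuracy plus per-class precision, recall, and F1
    from true and predicted labels (one-vs-rest counting)."""
    metrics = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return accuracy, metrics

acc, per_class = per_class_metrics(
    ["glioma", "glioma", "pituitary", "pituitary"],
    ["glioma", "pituitary", "pituitary", "pituitary"],
    ["glioma", "pituitary"])
# Serialize in the spirit of the "training_history.json" log (assumed layout).
print(json.dumps({"accuracy": acc, "per_class": per_class}))
```

In practice a library such as scikit-learn offers equivalent functions, but the hand-rolled version above makes the TP/FP/FN counting behind each equation explicit.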
3.3. Results
The experimental results for the four models are shown in Figure 1, Figure 2, Figure 3 and Figure 4. The analysis results of the training curves and final evaluation metrics presented the following observations:
Figure 1. CNN results in training.
CNN: The training loss decreased gradually and smoothly, while the validation loss remained relatively stable. The total average of the test set accuracy, precision, and recall was 0.9928, indicating rapid convergence and high consistency between training and testing, with excellent generalization capabilities.
EfficientNetV2B2: Although the training loss reached extremely low levels, the validation loss showed significant fluctuations. Test set accuracy, precision, and recall averaged at 0.9901, reflecting sensitivity to specific batches or images, as well as dependence on data batches and hyperparameters.
VGGNet19: Demonstrated the most stable performance overall. Training loss decreased to a low level, and validation loss showed minimal fluctuations. The total average of the test set accuracy, precision, recall, and F1-score was 0.9979, highlighting its outstanding performance in this classification task.
ResNet101V2: While training loss continued to decrease, the validation loss and metrics exhibited noticeable fluctuations. The test set accuracy was 0.9388, with relatively lower precision and recall, suggesting overfitting or insufficient feature learning in this small dataset scenario.
Figure 2. EfficientNetV2B2 results in training.
Figure 3. VGGNet19 results in training.
Figure 4. ResNet101V2 results in training.
3.4. Confusion Matrix
CNN, EfficientNetV2B2, and VGGNet19: These models demonstrated high accuracy across most classes. Notably, VGGNet19 had no significant misclassifications (e.g., almost zero confusion between glioma and no tumor). The primary errors were concentrated between the glioma and meningioma classes, possibly due to partial similarities in image features. EfficientNetV2B2 showed minor misclassifications in the pituitary class but within acceptable limits.
ResNet101V2: This model exhibited pronounced confusion, especially between the glioma and meningioma classes. There were notable misclassifications in other categories (e.g., pituitary and no tumor), indicating room for improvement in feature extraction and handling of this dataset.
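A confusion matrix of the kind analyzed here can be tallied with a short helper, where rows correspond to true classes and columns to predicted classes. This is a hypothetical sketch, not the study's plotting code:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true class, predicted class) pairs into a square matrix;
    off-diagonal cells are misclassifications."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

labels = ["no tumor", "glioma", "meningioma", "pituitary"]
m = confusion_matrix(
    ["glioma", "glioma", "meningioma", "no tumor"],
    ["glioma", "meningioma", "meningioma", "no tumor"],
    labels)
# One glioma sample was confused with meningioma: m[1][2] == 1.
print(m)
```

A heat map such as those in Figures 5–8 is then just this matrix rendered with a color scale (e.g., via matplotlib's `imshow`).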
The experimental results are summarized in Figure 1, revealing the following observations:
VGGNet19 achieved the highest scores across all metrics (accuracy, precision, recall, and F1-score) with an average of 0.9979. Its performance was stable and highly accurate.
The CNN showed an accuracy of 0.9928, and the recall and F1-score demonstrated excellent stability and accuracy.
EfficientNetV2B2 achieved an accuracy of 0.9901, with other metrics at similar levels. It balanced well between performance and model complexity.
ResNet101V2 scored significantly lower than the other models across all metrics, with an accuracy of 0.9388, indicating the need for further parameter optimization and regularization strategies.
In summary, VGGNet19 and the CNN presented the best performance in terms of stability and accuracy. EfficientNetV2B2 maintained a robust performance, while ResNet101V2 showed substantial room for improvement under the constraints of the small dataset environment.
Figure 5. Confusion matrix heat map of CNN.
Figure 6. Confusion matrix heat map of EfficientNetV2B2.
Figure 7. Confusion matrix heat map of VGGNet19.
Figure 8. Confusion matrix heat map of ResNet101V2.
4. Conclusions
By applying deep learning techniques to the classification and recognition of brain tumor images, the challenges posed by the complexity of tumor morphology and the limited average survival period of patients were addressed. The results can be used to meet the clinical need for efficient and accurate automated diagnostic assistance. Traditional manual annotation and diagnostic processes are time-consuming and constrained by physician expertise, making it difficult to handle large-scale image processing and achieve precise identification. Four deep learning models, a CNN, VGGNet19, ResNet101V2, and EfficientNetV2B2, accurately classified brain tumors into four categories (no tumor, glioma, meningioma, and pituitary). The workflow, encompassing data preprocessing, augmentation, model training, testing, and confusion matrix analysis, demonstrated the feasibility and high performance of automated image classification and recognition. VGGNet19 showed the best performance across multiple evaluation metrics (accuracy, precision, recall, and F1-score), followed by the CNN, with its smaller parameter size and rapid convergence. EfficientNetV2B2 maintained stable performance despite its simplified structure, while ResNet101V2 showed challenges such as overfitting and insufficient feature learning in a small-sample environment, underscoring the need for more targeted regularization and hyperparameter tuning strategies. Deep learning methods significantly reduce clinical interpretation time while providing high recognition rates, offering clinicians timely and reliable decision support for surgical planning and treatment strategy development. This study provides a basis for efficient, automated workflow design and model comparison for four-class tumor classification, aligned with real-world clinical needs. It also provides a replicable experimental framework, expanding the possibilities for clinical applications and laying a solid foundation for future research.
Author Contributions
Conceptualization, H.-Y.C. and C.-H.L.; methodology, Z.-Y.W. and H.-F.L.; software, H.-F.L.; validation, S.-W.F. and C.-H.L.; formal analysis, H.-Y.C.; investigation, H.-Y.C.; resources, H.-Y.C., C.-H.L. and S.-W.F.; data curation, Z.-Y.W.; writing: original draft preparation, H.-Y.C.; writing: review and editing, C.-H.L.; visualization, Z.-Y.W.; supervision, C.-H.L.; project administration, C.-H.L.; funding acquisition, C.-H.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Science and Technology Council, Taipei, Taiwan, under Grant Number NSTC 113-2635-E-150-001-.
Institutional Review Board Statement
Ethical review and approval were waived for this study due to the use of publicly available datasets with no personally identifiable information.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Nickparvar, M. Brain Tumor MRI Dataset. Available online: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset (accessed on 1 September 2024).
- Safwan, M.N.; Rahman, S.; Mahadi, M.H.; Jabir, T.M.; Mobin, I. Brain MRI ND-5 Dataset. IEEE Dataport. Available online: https://ieee-dataport.org/documents/brain-mri-nd-5-dataset (accessed on 1 September 2024).
- Chakrabarty, N. Brain MRI Images for Brain Tumor Detection. Available online: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection?select=no (accessed on 1 September 2024).
- Kang, J.; Ullah, Z.; Gwak, J. MRI-Based Brain Tumor Classification Using Ensemble of Deep Features and Machine Learning Classifiers. Sensors 2021, 21, 2222.
- Sachdeva, J.; Kumar, V.; Gupta, I.; Khandelwal, N.; Ahuja, C.K. Segmentation, feature extraction, and multiclass brain tumor classification. J. Digit. Imaging 2013, 26, 1141–1150.
- Dandıl, E.; Çakıroğlu, M.; Ekşi, Z. Computer-Aided Diagnosis of Malign and Benign Brain Tumors on MR Images. In Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2015; pp. 157–166.
- Gumaei, A.; Hassan, M.M.; Hassan, M.R.; Alelaiwi, A.; Fortino, G. A Hybrid Feature Extraction Method With Regularized Extreme Learning Machine for Brain Tumor Classification. IEEE Access 2019, 7, 36266–36273.
- Paul, J.S.; Plassard, A.J.; Landman, B.A.; Fabbri, D. Deep learning for brain tumor classification. In Proceedings of the Medical Imaging 2017: Biomedical Applications in Molecular, Structural, and Functional Imaging, Orlando, FL, USA, 13–16 February 2017; Volume 10137, p. 1013710.
- Díaz-Pernas, F.J.; Martínez-Zarzuela, M.; Antón-Rodríguez, M.; González-Ortega, D. A Deep Learning Approach for Brain Tumor Classification and Segmentation Using a Multiscale Convolutional Neural Network. Healthcare 2021, 9, 153.
- Ayadi, W.; Elhamzi, W.; Charfi, I.; Atri, M. Deep CNN for Brain Tumor Classification. Neural Process. Lett. 2021, 53, 671–700.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).