Article

An Interpretable Deep Learning Approach for Brain Tumor Classification Using a Bangladeshi Brain MRI Dataset

1 Department of Computer Science and Engineering, East West University, Dhaka 1212, Bangladesh
2 Program of Data Science and Analytics, Department of Mathematical and Physical Sciences, East West University, Dhaka 1212, Bangladesh
3 Department of Electrical and Computer Engineering, School of Engineering, San Francisco Bay University, Fremont, CA 94539, USA
4 Human and Digital Interface Department, JW Kim College of Future Studies, Woosong University, Daejeon 34606, Republic of Korea
5 Department of Computer Science and Engineering, Woosong University, Daejeon 34606, Republic of Korea
* Authors to whom correspondence should be addressed.
BioMedInformatics 2026, 6(2), 19; https://doi.org/10.3390/biomedinformatics6020019
Submission received: 11 February 2026 / Revised: 26 March 2026 / Accepted: 1 April 2026 / Published: 7 April 2026

Abstract

Magnetic resonance imaging (MRI) is a critical clinical tool that requires precise and reliable interpretation for effective brain tumor diagnosis and timely treatment planning. Deep learning methods have greatly advanced automated tumor classification in recent years, but many current methods still suffer from a lack of interpretability, little testing on region-focused data, and limited assessment of model robustness. These limitations reduce clinical trust and hinder the adoption of automated diagnostic systems. To address these challenges, this study proposes an interpretable deep learning model for classifying brain tumors using the PMRAM dataset, a Bangladeshi brain MRI collection containing four categories: glioma, meningioma, pituitary tumor, and normal brain. The proposed pipeline combines image preprocessing and feature enhancement methods and then trains a series of squeeze-and-excitation (SE)-enhanced convolutional neural networks, including VGG19, DenseNet201, MobileNetV3-Large, InceptionV3, and EfficientNetB3. The SE-enhanced EfficientNetB3 performed best, with 98.70% accuracy, 98.77% precision, 98.70% recall, and a 98.70% F1-score. Cross-validation also demonstrated stable performance, with a mean accuracy of 96.89%. The model exhibited efficient inference with low GPU memory consumption, producing predictions in about 2–4 s per MRI image. Grad-CAM++ and saliency maps were used to improve the transparency of the results, showing that the network concentrated on the clinically significant parts of the tumor that drove its predictions. Robustness analysis and cross-dataset testing provide additional evidence of the model's generalization capability. An online application was also implemented to allow real-time prediction and visual explanation of brain tumors.
Overall, the proposed framework offers a precise, interpretable, and promising solution to automated brain tumor classification using MRI images.

1. Introduction

Brain tumors are among the most severe neurological disorders and can profoundly impair the cognitive and physiological processes of the human brain. Early and precise diagnosis plays a critical role in effective treatment planning, prognosis evaluation, and patient survival [1,2]. Magnetic resonance imaging (MRI) has become a popular non-invasive imaging modality for identifying and examining brain tumors due to its high-resolution structural brain tissue data. Manual analysis of MRI scans by radiologists is, however, time-consuming and subject to variability, because tumor structures are complex and modern healthcare systems produce vast amounts of medical imaging data [3,4]. Accordingly, dependable and automated computer-aided diagnostic systems have gained importance in assisting clinical decision-making. Recent developments in deep learning, specifically convolutional neural networks (CNNs), have greatly enhanced the accuracy of automated brain tumor classification using MRI images [5]. These models learn hierarchical feature representations directly from raw medical images and thus achieve high classification rates across various tumor types. Despite these advances, several issues remain. The majority of current works rely on publicly available datasets and aim to achieve high accuracy, paying little attention to the interpretability of the model, its behavior under different imaging conditions, and inter-dataset generalization. Moreover, deep learning models are often criticized as black-box systems whose rationale clinicians cannot inspect. Such a lack of transparency may constrain clinical trust and reduce the adoption of automated diagnostic systems in clinical settings. A further limitation of the existing literature is the absence of evaluation on region-specific datasets.
The majority of past research used data obtained from a few sources that might not fully reflect the differences in imaging characteristics across healthcare settings. Furthermore, most studies did not explore robustness to perturbations or perform cross-dataset validation to test whether learned features remain useful on new data distributions. These shortcomings underscore the need for interpretable, robust, and generalizable deep learning models that work effectively under a range of imaging conditions. Motivated by these issues, this paper presents an interpretable deep learning model for brain tumors based on the PMRAM dataset, a Bangladeshi brain MRI dataset with four image classes: glioma, meningioma, pituitary tumor, and normal brain. The proposed framework combines image preprocessing and feature enhancement methods with a variety of squeeze-and-excitation (SE)-enhanced convolutional neural network designs. The SE attention mechanism enables the models to focus adaptively on informative feature channels and enhances the network's capacity to capture tumor-related patterns in MRI images. Furthermore, explainable AI methods are used to explain the model's predictions visually, enhancing transparency and interpretability.
The main contributions of this work can be summarized as follows:
  • An interpretable deep learning framework for brain tumor classification is developed using the PMRAM Bangladeshi MRI dataset, which is relatively underexplored in the existing research.
  • Several baseline convolutional neural network architectures, including VGG19, DenseNet201, MobileNetV3-Large, InceptionV3, and EfficientNetB3, are enhanced using squeeze-and-excitation attention modules to improve channel-wise feature representation.
  • A comprehensive evaluation framework is implemented, including robustness analysis under image perturbations, cross-dataset testing using the Sartaj dataset, and statistical significance analysis to examine model reliability and generalization capability.
  • Explainable AI techniques such as Grad-CAM++ and saliency maps are integrated, and a real-time web-based application is deployed to enable interactive MRI-based brain tumor prediction with visual explanations.
In general, the proposed framework is an accurate, interpretable, and practically deployable solution for automated brain tumor classification from MRI images. By integrating attention-enhanced deep learning models with explainable AI and comprehensive evaluation strategies, this study is a step toward credible computer-aided diagnostic systems that can help clinicians in brain tumor detection and decision-making. The remainder of this paper is organized as follows: Section 2 reviews the related literature on brain tumor classification using deep learning techniques. Section 3 describes the proposed methodology, including dataset preparation, preprocessing, model architecture, and evaluation metrics. Section 4 presents the experimental results and discussion, while Section 5 concludes the paper and outlines potential future research directions.

2. Related Works

The rapid advancement of machine learning and deep learning techniques has substantially transformed medical data analysis, particularly disease classification tasks using both tabular clinical records and medical imaging modalities [6,7,8,9]. Recent developments in the field of deep learning have significantly enhanced the effectiveness and precision of diagnosing brain tumors using MRI scans. This section reviews recent studies on brain tumor diagnosis using MRI images to analyze model architectures, evaluation strategies, and robustness and interpretability limitations.
Recently, Khaliki et al. [10] conducted an intensive study constructing CNN and transfer learning models, achieving 98% accuracy with VGG-16. Nonetheless, their work lacked data augmentation, cross-validation (CV), and explainable AI (XAI), which are critical for model robustness and clinical trust.
Aamir et al. [11] proposed a hyperparameter-tuned CNN that achieved at least 97% validation accuracy across three datasets; however, it again lacked XAI and robust CV. Agarwal et al. [12] achieved 98.89% accuracy with a hybrid of InceptionV3 deep learning and an Auto Contrast technique, but their model is limited to a single dataset and requires high computing capability. Rasheed et al. [13] trained a CNN with Gaussian-blur sharpening and CLAHE to achieve 97.84% accuracy but did not address XAI, deployment, or expected real-world performance.
Martínez-Del-Río-Ortega et al. [14] reported 97.5% accuracy using CNNs, but their model lacked explainability and deployment frameworks. Krishnan et al. [15] addressed rotational variance with a Rotation-Invariant Vision Transformer (RViT), achieving 98.6% accuracy; however, its high computational cost and absence of CV capabilities limit its practical use. In contrast, Hosny et al. [16] integrated Grad-CAM into an ensemble of DenseNet-121 and Inception-V3, achieving 99.02% accuracy. While they emphasized interpretability, the study lacked robust cross-validation and deployable pipelines.
Turning to explainability, Musthafa et al. [17] applied Grad-CAM to ResNet-50, which yielded 98.52% accuracy and enhanced interpretability. Nonetheless, they did not use a variety of datasets, implying the need for more extensive validation. Mathivanan et al. [18] combined CLAHE and data augmentation with Secure-Net, resulting in the Brain Tumor Detection Network (BTDN), which achieved accuracies of 99.68%, 98.81%, and 95.33% on three datasets. The BTDN outperformed six models and thus shows good clinical potential. Nhlapho et al. [19] emphasized the need for interpretability by applying Grad-CAM and Grad-CAM++ to EfficientNetB0 and DenseNet121, achieving 98% accuracy and highlighting clinical practice potential.
Ensemble methods were improved further: Asif et al. [20] demonstrated a model combining InceptionV3 and Xception with 98.50% training and 98.30% validation accuracy. Their approach addresses significant shortcomings of current methods, although it also highlights problems relating to dataset variety and high computational expense. A transition to lightweight and interpretable models followed: Nahiduzzaman et al. [21] used a hybrid explainable model based on a PDSCNN and ridge regression extreme learning machines (RRELM) to achieve a testing accuracy of 99.30% and SHAP-based interpretability with just 0.53M parameters. Saeedi et al. [22] compared convolutional auto-encoders and 2D CNNs for brain tumor detection, attaining 96.47% and 95.63% accuracy, respectively, and showed the advantage of deep learning approaches over conventional ML methods for early diagnosis. Finally, Tonni et al. [23] proposed a hybrid transfer learning framework that integrates VGG16 and ResNet152V2, backed by Grad-CAM and SHAP as explainability techniques. Their model achieved an accuracy of 99.47%, a promising result for deployment in resource-constrained areas, especially low-income countries.

Research Gaps Identified

Although the literature on deep learning-based brain tumor classification has grown tremendously, several key gaps remain:
  • Explainable AI (XAI) techniques are not sufficiently incorporated in most existing studies, which obstructs model transparency and diminishes clinical trustworthiness.
  • Many models report high accuracy but have not been rigorously cross-validated, leaving their generalization and reliability on unseen data in doubt.
  • Data augmentation techniques are not always used effectively, limiting the additional information available to strengthen the model and address dataset imbalance.
  • Some highly accurate methods rely on heavy architectures that can overfit and are computationally costly, restricting real-world application.
  • Few studies investigate combining lightweight CNNs with attention mechanisms or other hybrid models to balance performance and efficiency.
  • Practical issues such as deployment and integration into clinical workflows and interfaces (e.g., web applications) are also understudied.
In response to these issues, this study develops an interpretable deep learning framework based on squeeze-and-excitation-enhanced convolutional neural networks for brain tumor classification. The proposed framework combines robust preprocessing, attention-enhanced feature learning, explainable AI, cross-validation, robustness analysis, and deployment through a real-time web application.

3. Methodology

In this section, the architecture and working of the proposed brain tumor classification framework are described. Figure 1 represents the architecture of the whole system.

3.1. Data Collection

The dataset used in this study was the PMRAM: Bangladeshi Brain Cancer—MRI Dataset, available at Mendeley Data [24]. It includes 1600 raw single-channel grayscale brain MRI images evenly distributed across four classes: Glioma, Meningioma, Pituitary, and Normal. The images were collected from patients at different hospitals in Bangladesh, and all images are anonymized. This balanced distribution supported unbiased training and testing of the proposed classification model.

3.2. Image Preprocessing and Feature Enhancement

All raw MRI images were resized to a uniform resolution of 224 × 224 pixels so that the model received consistently sized inputs. To address intensity variations across scans, each image was converted to LAB color space, and Contrast Limited Adaptive Histogram Equalization (CLAHE) was applied only to the luminance (L) channel with clipLimit = 3.0 and tileGridSize = (4, 4). This step enhanced local contrast without over-amplifying noise. The enhanced L channel was then recombined with the original A and B channels, and the image was converted back to BGR. Each image was then converted into a PyTorch tensor (values in [0, 1]) and transferred to the GPU for faster processing. Gaussian smoothing (kernel size 5, σ = 1.0) was applied via depth-wise convolution to reduce high-frequency noise. Unsharp masking was used to recover fine detail: the original image was scaled by 1.5, the blurred version by 0.5, their difference was taken, and the result was clamped to [0, 1]. Finally, contrast stretching linearly mapped pixel values to the full [0, 255] range based on the per-image minimum and maximum (with ϵ = 10⁻⁸ to avoid division by zero). The outcome of these steps was clean, contrast-enhanced, and noise-reduced RGB images, providing a strong basis for feature extraction and accurate tumor classification. All preprocessing was performed strictly per image, with no global dataset-level statistics, ensuring a leakage-free pipeline.
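The smoothing, sharpening, and stretching steps above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the LAB/CLAHE stage is omitted (it needs OpenCV), and the function names `gaussian_kernel` and `preprocess` are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian kernel (size 5, sigma 1.0 as in the paper)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def preprocess(img):
    """img: HxW grayscale array in [0, 1]; returns an enhanced image in [0, 255]."""
    k = gaussian_kernel(5, 1.0)
    pad = 2
    padded = np.pad(img, pad, mode="reflect")
    blurred = np.zeros_like(img)
    h, w = img.shape
    # Gaussian smoothing by direct convolution (slow but dependency-free)
    for i in range(h):
        for j in range(w):
            blurred[i, j] = np.sum(padded[i:i + 5, j:j + 5] * k)
    # Unsharp masking: 1.5 * original - 0.5 * blurred, clamped to [0, 1]
    sharp = np.clip(1.5 * img - 0.5 * blurred, 0.0, 1.0)
    # Per-image contrast stretching to [0, 255]; eps avoids division by zero
    eps = 1e-8
    return (sharp - sharp.min()) / (sharp.max() - sharp.min() + eps) * 255.0
```

In the actual pipeline these operations run on GPU tensors via depth-wise convolution; the per-image nature of the stretching (no dataset-level statistics) is what keeps the pipeline leakage-free.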

3.3. Dataset Splitting and Data Augmentation

After preprocessing, the dataset was split into training, validation, and test sets using an 80%–10%–10% ratio. This was done class by class to maintain a balanced distribution across all four categories. The split was performed before any augmentation to ensure that the validation and test sets remained completely unseen during training and contained only original (non-augmented) images. Table 1 shows the number of images per class in each subset after splitting.
Augmentation was applied only to the training set to increase its diversity, help prevent overfitting, and improve the model’s ability to generalize to new data. For every original training image, two additional versions were created by randomly selecting and applying two out of the following three transformations:
  • Horizontal flip: The image was flipped left-to-right (along the vertical axis).
  • Random rotation: The image was rotated by a random angle between −25° and +25° around its center, using reflection padding at the borders.
  • Random zoom: The image was scaled by a random factor between 0.8 and 1.2, with appropriate cropping or padding to maintain the original 224 × 224 size.
This approach effectively tripled the size of the training set while keeping the validation and test sets unchanged. The final number of images per class in the augmented training set is given in Table 2.
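The two-of-three augmentation policy described above can be sketched with SciPy's `ndimage`. This is a hedged illustration: the helper names (`augment_twice`, `random_zoom`, etc.) are hypothetical, and the paper does not specify its augmentation library.

```python
import random
import numpy as np
from scipy import ndimage

def hflip(img):
    """Flip left-to-right (along the vertical axis)."""
    return img[:, ::-1].copy()

def random_rotation(img):
    """Rotate by a random angle in [-25, +25] degrees with reflection padding."""
    angle = random.uniform(-25.0, 25.0)
    return ndimage.rotate(img, angle, reshape=False, mode="reflect")

def random_zoom(img):
    """Scale by a random factor in [0.8, 1.2], then center-crop or pad back."""
    size = img.shape[0]
    z = ndimage.zoom(img, random.uniform(0.8, 1.2))
    if z.shape[0] >= size:                       # zoomed in: center-crop
        s = (z.shape[0] - size) // 2
        return z[s:s + size, s:s + size]
    out = np.zeros((size, size), dtype=img.dtype)  # zoomed out: center-pad
    s = (size - z.shape[0]) // 2
    out[s:s + z.shape[0], s:s + z.shape[1]] = z
    return out

def augment_twice(img):
    """Create two augmented copies, each applying 2 of the 3 transforms."""
    transforms = [hflip, random_rotation, random_zoom]
    copies = []
    for _ in range(2):
        t1, t2 = random.sample(transforms, 2)
        copies.append(t2(t1(img)))
    return copies
```

Applying `augment_twice` to every training image yields the tripled training set described above while leaving validation and test images untouched.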

3.4. Baseline Models with Squeeze-and-Excitation Block (SE)

In this work, several baseline convolutional neural network models were augmented with the squeeze-and-excitation (SE) module to better capture channel-wise feature dependencies. In particular, the SE block was integrated into five popular deep learning models, namely, VGG19, DenseNet201, EfficientNetB3, MobileNetV3-Large, and InceptionV3, to enhance channel-wise feature representation. The squeeze-and-excitation mechanism performs channel-wise feature recalibration by explicitly modeling the interdependence of feature channels. It first aggregates global spatial information through a squeeze operation with global average pooling, after which an excitation step learns adaptive channel weights using fully connected layers. The learned weights are applied to the feature maps to amplify informative features and de-emphasize irrelevant ones. Incorporating SE blocks into the underlying architectures allows the models to attend more closely to discriminative patterns in MRI images, which is of great importance for correctly differentiating brain tumor types. Implementation details and architectural changes for each SE-enhanced baseline model are provided in the following subsections.

3.5. Squeeze-and-Excitation (SE) Attention Mechanism

The squeeze-and-excitation (SE) method improves the representational power of convolutional neural networks by modeling dependencies between feature map channels [25]. It works in two basic steps: the squeeze operation, which summarizes spatial data with global average pooling to produce a channel descriptor, and the excitation operation, which learns channel-wise weights through a small fully connected bottleneck network. These learned weights are then applied to recalibrate the feature maps, amplifying useful channels and de-emphasizing less useful ones. Such adaptive reweighting enables the network to concentrate on the more distinguishing patterns in the input data. Figure 2 illustrates the architecture and working principle of the SE block.
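The squeeze and excitation steps can be written as a compact PyTorch module. This is a standard sketch of the SE block; the class name `SEBlock` and the reduction ratio of 16 are assumptions, as the paper does not give implementation details.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global pool, FC bottleneck, sigmoid channel gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Squeeze: global average pool to one descriptor per channel
        w = self.fc(x.mean(dim=(2, 3)))
        # Excitation: recalibrate feature maps with the learned weights
        return x * w.view(b, c, 1, 1)
```

The module is drop-in: it preserves the feature map shape, so it can be inserted after any convolutional stage of the backbones listed above.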

3.5.1. SE-VGG19

The SE-VGG19 model incorporates the squeeze-and-excitation attention mechanism into the VGG19 framework to boost channel-wise feature representation. The network consists of 18 convolutional layers that enable hierarchical feature extraction and 5 max-pooling layers that gradually decrease spatial resolution. The architecture also incorporates 1 fully connected (linear) layer and 1 dropout layer for final classification and regularization. Spatial information is aggregated by an adaptive average pooling layer prior to the classification stage. The addition of one SE block enables channel-wise feature recalibration, allowing the model to focus more on important tumor-related patterns and less on irrelevant activations.

3.5.2. SE-DenseNet201

SE-DenseNet201 integrates the densely connected architecture of DenseNet201 with the channel attention offered by the squeeze-and-excitation mechanism. The network has 202 convolutional layers that are densely connected to enhance feature reuse and gradient propagation. The architecture also consists of 3 average pooling layers, 1 max pooling layer, and 1 adaptive average pooling layer to gradually reduce the feature dimensions. Final classification and regularization are performed by a fully connected (linear) layer and a dropout layer. The integrated SE block performs channel-wise feature recalibration, allowing the model to focus on channels carrying significant tumor-related features.

3.5.3. SE-MobileNetV3-Large

The SE-MobileNetV3-Large model is built on the MobileNetV3-Large architecture with an extra squeeze-and-excitation block to enhance channel attention. The network has 64 convolutional layers that use efficient depth-wise separable convolutions to lower computational complexity. In addition, 10 pooling layers, specifically adaptive average pooling, summarize spatial feature information at various network stages. A fully connected (linear) layer and a dropout layer handle classification and regularization. The inserted SE block provides adaptive channel-wise weighting of feature maps, strengthening the network's ability to emphasize discriminative imaging features.

3.5.4. SE-InceptionV3

In the SE-InceptionV3 model, the squeeze-and-excitation attention mechanism is added to the InceptionV3 architecture to improve channel-wise feature learning. The model has 98 convolutional layers arranged in inception modules that capture multi-scale spatial features. The architecture also has 2 max-pooling layers and 3 adaptive average pooling layers for gradual spatial dimension reduction. Classification and regularization are performed using three fully connected (linear) layers and two dropout layers. Adding a single SE block recalibrates the feature channels, enabling the network to better highlight informative tumor-related patterns.

3.5.5. Proposed SE-EfficientNetB3

In this work, the EfficientNetB3 architecture is improved by adding an extra attention module (squeeze-and-excitation, SE) to strengthen the channel-wise representation of features. EfficientNetB3 is a convolutional neural network composed of Mobile Inverted Bottleneck Convolution (MBConv) blocks designed to extract hierarchical features from input images efficiently. Although the original EfficientNetB3 incorporates SE blocks within its MBConv layers, we added an additional SE module after the feature extraction stage for further channel recalibration. The SE mechanism performs global feature aggregation through a squeeze operation, followed by an excitation step that learns the relative importance of individual channels. This process helps the network highlight more informative feature channels and suppress less relevant ones, encouraging the model to concentrate on tumor-specific patterns in brain MRI images. The architecture is fed an input image of size 300 × 300 × 3, which is passed through a series of convolutional layers and MBConv blocks with varying kernel sizes to extract deep semantic features. The resulting feature maps pass through the additional SE module to refine channel importance and are then sent to the classification stage. Finally, spatial information is aggregated by adaptive average pooling, followed by a dropout layer and a fully connected layer that generates predictions for the four image classes, namely, Glioma, Meningioma, Pituitary, and Normal. A total of 163 key layers are included in the proposed SE-EfficientNetB3 architecture: 132 Conv2d layers, 1 linear layer, 1 dropout layer, 28 AdaptiveAvgPool2d layers, and 1 SEBlock. Figure 3 shows the architecture of the proposed SE-EfficientNetB3 model used for brain tumor classification.

3.5.6. Training Settings

Table 3 summarizes the training configuration used for all SE-enhanced deep learning models evaluated in this study. To ensure a fair comparison among architectures, identical training settings were applied to all evaluated models.

3.6. Explainable AI (Grad-CAM++ and Saliency Maps)

Having chosen the most effective model, we applied Grad-CAM++ and saliency maps to understand what the network focused on in the MRI images when making its predictions. The resulting heatmaps highlight the most influential regions for each class and reveal whether the model targets the actual tumor area or is distracted by normal brain tissue or background. Such visualization helps build trust in the results and gives doctors and researchers a better understanding of what the model has learned from the images.
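Of the two explanation methods named above, gradient-based saliency is simple enough to sketch in a few lines of PyTorch (Grad-CAM++ usually relies on an external library, so it is not reproduced here; `saliency_map` is a hypothetical helper, not from the paper):

```python
import torch

def saliency_map(model, image):
    """Gradient saliency: |d score / d pixel| for the predicted class.

    image: tensor of shape (1, C, H, W); returns an (H, W) map in [0, 1).
    """
    model.eval()
    x = image.clone().requires_grad_(True)
    scores = model(x)                          # shape (1, num_classes)
    scores[0, scores[0].argmax()].backward()   # gradient of the top class score
    # Max over channels gives one saliency value per pixel
    sal = x.grad.detach().abs().max(dim=1).values[0]
    return sal / (sal.max() + 1e-8)            # normalize for display
```

Overlaying this map on the MRI slice shows whether high-gradient pixels coincide with the tumor region, which is the check the paragraph above describes.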

3.7. Demo Web Application

The demo web application was developed using Gradio, a Python library that is a convenient tool for building machine learning interfaces. It was deployed on the Hugging Face platform, which allows simple access and scaling. The web app enables users to upload MRI images and obtain a real-time classification outcome accompanied by a visual explanation generated by Grad-CAM. This visual explanation is presented through a user-friendly interface that medical professionals and researchers can easily view.

3.8. Cross-Dataset Testing

In order to further test the generalization ability of the proposed model, a cross-dataset experiment was performed. The SE-enhanced EfficientNetB3 was trained on the PMRAM dataset and evaluated directly on the Sartaj dataset without any retraining or fine-tuning. The aims of this experiment were to determine how well the learned features transfer to an independent dataset and whether the model remains a reliable classifier when presented with data it has never seen before.

3.9. Evaluation Metrics

To thoroughly assess the performance of the evaluated model for brain tumor classification, several key evaluation metrics were employed:
Accuracy: Accuracy represents the proportion of correctly classified instances over the total number of predictions. It is defined as
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: Precision measures the proportion of true-positive predictions among all positive predictions made by the model:
Precision = TP / (TP + FP)
Recall (sensitivity): Recall measures the model’s ability to identify all relevant cases (true positives) correctly:
Recall = TP / (TP + FN)
F1-score: F1-score is the harmonic mean of precision and recall, providing a balance between them:
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
Cohen’s Kappa (κ): This metric evaluates the agreement between predicted and actual labels, accounting for chance agreement:
κ = (p_o − p_e) / (1 − p_e)
where p_o is the observed agreement and p_e is the expected agreement by chance.
Confusion matrix: A confusion matrix summarizes prediction results, showing the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for each class. It is used to derive the above metrics and understand per-class performance.
Area Under the ROC Curve (AUC-ROC): AUC measures a model’s ability to distinguish between classes. A higher AUC indicates better classification performance.
Wilcoxon Signed-Rank Test: A non-parametric statistical test used to compare two related samples, such as the prediction accuracies of two models applied to the same dataset, to assess whether their population mean ranks differ. It is beneficial when the data does not follow a normal distribution. The test ranks the absolute differences between paired observations, then analyzes the signs of these ranks to compute the test statistic:
W = min(W⁺, W⁻)
where W⁺ and W⁻ are the sums of the positive and negative signed ranks, respectively.
These evaluation metrics collectively provide a comprehensive view of the proposed model’s effectiveness, highlighting not only its accuracy but also its reliability, robustness, and statistical significance in brain tumor classification.
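The metrics above map directly onto standard scikit-learn and SciPy calls; a small worked example on dummy labels follows (the per-fold accuracies fed to the Wilcoxon test are illustrative, not results from the paper):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, confusion_matrix)
from scipy.stats import wilcoxon

# Dummy four-class labels standing in for real test-set predictions
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 3])

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="weighted")
rec = recall_score(y_true, y_pred, average="weighted")
f1 = f1_score(y_true, y_pred, average="weighted")
kappa = cohen_kappa_score(y_true, y_pred)          # chance-corrected agreement
cm = confusion_matrix(y_true, y_pred)              # per-class TP/FP/TN/FN counts

# Wilcoxon signed-rank test on paired per-fold accuracies of two models
model_a = [0.96, 0.97, 0.98, 0.96, 0.97]
model_b = [0.94, 0.95, 0.96, 0.95, 0.94]
stat, p = wilcoxon(model_a, model_b)               # stat = min(W+, W-)
```

The weighted averages match the reporting style used in Section 4, where class support weights each per-class score.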

4. Results and Discussion

4.1. Comparative Evaluation of Model Performance and Detailed Classification Metrics

As shown in Table 4, all SE-enhanced models achieved strong classification performance, confirming the effectiveness of channel attention mechanisms in brain tumor MRI analysis. EfficientNetB3 achieved the highest overall performance, with an accuracy of 98.70%, weighted precision of 98.77%, weighted recall of 98.70%, and weighted F1-score of 98.70%. Such performance suggests that EfficientNetB3 is the most effective model at learning discriminative and robust feature representations from MRI images. DenseNet201 was also very successful, with an accuracy of 98.05% and a weighted F1-score of 98.05%, indicating its strong ability to propagate and reuse features. MobileNetV3-Large also performed well, achieving 97.40% accuracy and a 97.40% weighted F1-score, showing that computationally efficient models can remain highly effective. VGG19 achieved an accuracy of 96.75%, demonstrating that conventional deep CNN architectures can still compete in this task. InceptionV3 performed the worst among the tested models, with an accuracy of 96.10% and a weighted F1-score of 96.08%, although its performance remained satisfactory. The narrow differences between the models indicate that SE-based channel recalibration consistently enhanced feature representation across architectures. However, the clear advantage of EfficientNetB3 suggests that its systematic scaling of network depth, width, and resolution facilitated more efficient extraction of tumor-relevant patterns. These results show that, although all models classified brain tumors with high reliability, EfficientNetB3 offered the most accurate and balanced performance on the studied dataset.

4.2. Analysis of Accuracy and Loss Curve

Figure 4 presents the training and validation loss and accuracy curves for the evaluated deep learning models across training epochs. Such curves help us understand how each architecture converges, whether its learning is stable, and what kind of generalization it achieves. For VGG19, the training and validation losses decline steadily within the first few epochs, whereas the accuracy curves rise rapidly and stabilize at a high level. The slight difference between the training and validation measures implies consistent convergence and minor overfitting. DenseNet201 with SE attention converges faster than the other models, reaching near-optimal validation accuracy within a few epochs. The close alignment of its training and validation curves shows that features propagate well and generalization performance is strong. The training dynamics of InceptionV3 are also stable, with a gradual reduction in loss and a gradual increase in classification accuracy. Similarly, MobileNetV3-Large attains high accuracy with comparatively smooth convergence, suggesting that lightweight architectures can retain strong representational power when enhanced with attention mechanisms. EfficientNetB3 shows larger fluctuations in the validation loss curve at some epochs, which can be explained by its deeper architecture and sensitivity to optimization dynamics. However, its accuracy curves remain consistently high, confirming the model's capability to learn discriminative features from MRI images. In general, the curves show that all models converge successfully and reach high classification performance. The small gaps between training and validation metrics across most architectures indicate that the adopted preprocessing, augmentation, and regularization techniques enhance generalization and maintain stable learning behavior.

4.3. Statistical Performance Analysis and Reliability Assessment

A statistical comparison of the tested deep learning models is given in Table 5, based on a set of reliability and calibration measures: the Matthews Correlation Coefficient (MCC), Brier score, positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, and confidence intervals. EfficientNetB3 was the best-performing model overall, with the highest MCC (0.9829) and the lowest Brier score (0.0045), indicating the most reliable classification and the best-calibrated probability estimates. EfficientNetB3 also recorded the highest PPV, NPV, sensitivity, and specificity, demonstrating a strong ability to identify both tumor and non-tumor cases accurately. DenseNet201 also performed well, with an MCC of 0.9741 and a Brier score of 0.0078, indicating consistent and well-calibrated predictions. MobileNetV3-Large likewise achieved strong results, with an MCC of 0.9655 and competitive sensitivity and specificity, confirming the suitability of lightweight models for medical image analysis. VGG19 obtained an MCC of 0.9572 and a Brier score of 0.0110, making it a reliable classifier despite its comparatively simple structure. InceptionV3 produced marginally lower statistics, with an MCC of 0.9482 and a higher Brier score of 0.0155, yet its overall predictive performance remained satisfactory. The confidence intervals further confirm the robustness of the tested models, with EfficientNetB3 exhibiting the narrowest confidence range. Overall, these findings indicate that EfficientNetB3 is the most reliable and best-calibrated model for brain tumor classification, while all models achieved strong statistical performance and generalization ability.
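These reliability measures can be reproduced with standard tooling. The following minimal sketch computes the MCC, a multiclass Brier score, and macro-averaged PPV, NPV, sensitivity, and specificity in a one-vs-rest fashion; the labels and probabilities are illustrative toy values, not the paper's actual predictions.

```python
# Sketch of the reliability metrics reported in Table 5. The labels below are
# illustrative placeholders, not the study's real predictions.
import numpy as np
from sklearn.metrics import matthews_corrcoef

def reliability_metrics(y_true, y_pred, y_prob, n_classes):
    """Return MCC, multiclass Brier score, and macro PPV/NPV/sensitivity/specificity."""
    mcc = matthews_corrcoef(y_true, y_pred)
    one_hot = np.eye(n_classes)[y_true]
    brier = np.mean(np.sum((y_prob - one_hot) ** 2, axis=1))  # multiclass Brier score
    ppv, npv, sens, spec = [], [], [], []
    for c in range(n_classes):  # one-vs-rest counts per class
        tp = np.sum((y_pred == c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        ppv.append(tp / (tp + fp) if tp + fp else 0.0)
        npv.append(tn / (tn + fn) if tn + fn else 0.0)
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    return mcc, brier, np.mean(ppv), np.mean(npv), np.mean(sens), np.mean(spec)

# Toy example: four classes (glioma, meningioma, normal, pituitary),
# one meningioma sample misclassified as glioma.
y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3])
y_pred = np.array([0, 1, 2, 3, 0, 0, 2, 3])
y_prob = np.eye(4)[y_pred] * 0.9 + 0.025  # each row sums to 1
mcc, brier, ppv, npv, sens, spec = reliability_metrics(y_true, y_pred, y_prob, 4)
```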
To further validate the statistical reliability of the results, additional significance tests were performed using the Chi-square and Friedman non-parametric tests, as summarized in Table 6. The Chi-square test evaluates the dependency between predicted labels and the true class distributions, while the Friedman test examines whether statistically significant performance differences exist across classes. As shown in Table 6, all evaluated models produced very large Chi-square statistics with extremely small p-values (p < 0.001), confirming a strong association between predicted and actual class labels. Among the evaluated architectures, EfficientNetB3 achieved the highest Chi-square statistic (446.4130) with the smallest p-value (1.68 × 10⁻⁹⁰), indicating the strongest agreement between predicted labels and the ground-truth classes. This result suggests that EfficientNetB3 provides the most statistically reliable classification outcomes among the compared models. DenseNet201 and MobileNetV3-Large also produced high Chi-square values of 438.4264 and 430.8195, respectively, confirming their strong predictive capability, while VGG19 and InceptionV3 exhibited slightly lower but still highly significant results. The Friedman test further indicates statistically significant differences in class-wise prediction performance across the evaluated models. VGG19 and DenseNet201 produced relatively higher Friedman statistics, reflecting greater variation across tumor classes, whereas EfficientNetB3 maintained statistically significant results while demonstrating strong predictive stability across classes. This behavior suggests that EfficientNetB3 not only achieved high predictive accuracy but also maintained consistent performance across different tumor categories.
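Both tests are available in SciPy. The sketch below applies a Chi-square test of independence to an illustrative predicted-vs-true contingency table and a Friedman test to hypothetical per-class F1-scores of three models; all numbers are made up for illustration and are not the paper's data.

```python
# Minimal sketch of the two significance tests summarized in Table 6,
# on illustrative (not real) data.
import numpy as np
from scipy.stats import chi2_contingency, friedmanchisquare

# Chi-square: association between true and predicted labels via a
# contingency table (rows = true class, columns = predicted class).
table = np.array([[38,  2,  0,  0],   # glioma
                  [ 1, 37,  0,  2],   # meningioma
                  [ 0,  0, 40,  0],   # normal
                  [ 0,  1,  0, 39]])  # pituitary
chi2, p_chi2, dof, _ = chi2_contingency(table)

# Friedman: per-class F1-scores of three hypothetical models,
# paired by tumor class (one argument per model).
f1_model_a = [0.97, 0.96, 1.00, 0.98]
f1_model_b = [0.95, 0.94, 1.00, 0.97]
f1_model_c = [0.93, 0.92, 0.99, 0.95]
stat_fr, p_fr = friedmanchisquare(f1_model_a, f1_model_b, f1_model_c)
```

A strongly diagonal contingency table yields a large Chi-square statistic with a vanishingly small p-value, mirroring the behavior reported in the table.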
Overall, the extremely small p-values obtained from both statistical tests confirm that the classification outcomes of all evaluated models were statistically significant and unlikely to occur by chance. Notably, the superior Chi-square statistic obtained by EfficientNetB3 further highlights its robustness and reliability, supporting its effectiveness as the best-performing model for brain tumor MRI classification in this study.
We also conducted McNemar’s test to determine whether the differences between the predictions of EfficientNetB3 and those of the other SE-enhanced models were statistically significant (Table 7). None of the comparisons was significant at the 0.05 level. Thus, although EfficientNetB3 achieved the best overall performance, its improvements over VGG19 + SE, DenseNet201 + SE, InceptionV3 + SE, and MobileNetV3-Large + SE were not statistically significant. This outcome is attributable to the small performance gap between the models: with few discordant predictions between any pair of models, McNemar’s test lacks the evidence needed to declare a significant difference. The competing models therefore behave similarly in their predictions, with EfficientNetB3 remaining the most stable and best-performing model on the observed evaluation metrics.
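McNemar's test depends only on the discordant counts of a paired comparison. The sketch below implements the continuity-corrected chi-square form of the test; the discordant counts are hypothetical and chosen only to illustrate why small counts yield a non-significant result.

```python
# Hedged sketch of McNemar's test on paired model predictions (as in Table 7).
# b and c are the discordant counts: samples one model classified correctly
# and the other misclassified. The counts here are illustrative.
from scipy.stats import chi2

def mcnemar_test(b, c):
    """McNemar's chi-square test with continuity correction."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = chi2.sf(stat, df=1)  # upper tail of chi-square with 1 dof
    return stat, p

# e.g. EfficientNetB3 vs. a competing model: 5 vs. 3 discordant predictions
stat, p = mcnemar_test(5, 3)
```

With only eight discordant samples the p-value is far above 0.05, matching the qualitative outcome described above.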

4.4. Analysis of Computational Cost

The computational efficiency of the tested models in terms of inference time, training time, GPU memory, and required RAM is shown in Figure 5. Overall, the inference latency and GPU memory requirements of all models are relatively small, indicating that the architectures can be deployed in realistic clinical settings. Among the evaluated models, MobileNetV3-Large is the most computationally efficient, with the lowest inference time, the shortest training time, and the least GPU memory usage, making it especially suitable for resource-constrained systems. Conversely, EfficientNetB3 takes longer to train because of its deeper and more elaborate architecture, which, as shown in the preceding analyses, delivered the highest predictive performance. DenseNet201 and VGG19 have moderate computational requirements and stable performance, while InceptionV3 consumes more GPU memory and RAM, implying heavier resource demands during execution. Overall, the findings highlight the natural trade-off between computational efficiency and predictive performance: lightweight architectures offer speed and low resource use, whereas more resource-intensive models deliver greater classification ability.

4.5. Analysis of Confusion Matrix

Figure 6 presents the confusion matrices obtained on the test dataset for the five deep learning architectures: EfficientNetB3, MobileNetV3-Large, InceptionV3, DenseNet201, and VGG19. The matrices summarize classification performance on the four brain image classes: Glioma, Meningioma, Normal, and Pituitary. Diagonal elements denote correctly classified samples, and off-diagonal elements denote misclassifications. In all matrices, the majority of predictions lie on the diagonal, confirming that every model achieved high classification performance. Among the tested models, EfficientNetB3 produced the most stable results, with the lowest misclassification rate across all tumor types, placing nearly every sample in its correct class with only very few errors. The Normal class was classified perfectly by all models, implying that normal brain MRI images are visually very distinct from the tumor classes.
Likewise, the Pituitary tumor class was classified with very high accuracy, with only a few misclassifications in some architectures. In the Glioma category, EfficientNetB3 classified all samples correctly except for two cases that it mislabeled as Meningioma. Similar trends appeared in the other models, although InceptionV3 showed slightly more confusion in this category, with predictions spread across other groups. This suggests that glioma shares visual features with other tumor types, especially meningioma. Minor misclassification was also evident in several models for the Meningioma class: MobileNetV3-Large, DenseNet201, and VGG19 labeled some samples as Glioma, indicating a moderate overlap in visual features between these two tumor types.
Overall, the confusion matrix analysis supports the conclusion that the investigated deep learning models can successfully differentiate between brain tumor types. EfficientNetB3 was the most stable and reliable, with the lowest misclassification rate, while the remaining architectures also demonstrated strong classification capability, with slight confusion primarily between the Glioma and Meningioma classes.
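Confusion matrices of this kind can be built directly with scikit-learn. The sketch below uses illustrative labels (not the study's predictions) in which two Glioma samples are confused as Meningioma, mirroring the error pattern described above, and derives per-class accuracy from the matrix rows.

```python
# Minimal sketch of producing a confusion matrix like those in Figure 6,
# with illustrative labels rather than the paper's predictions.
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["Glioma", "Meningioma", "Normal", "Pituitary"]
y_true = np.array([0] * 5 + [1] * 5 + [2] * 5 + [3] * 5)
y_pred = y_true.copy()
y_pred[0:2] = 1  # two Glioma samples misclassified as Meningioma

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
per_class_acc = cm.diagonal() / cm.sum(axis=1)  # row-normalized diagonal
```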

4.6. Identification of the Best-Performing Model

According to the extensive experimental analysis, the SE-enhanced EfficientNetB3 model is the most successful of the evaluated deep learning networks for brain tumor MRI classification. The comparative findings indicate that all of the SE-enhanced architectures achieved strong predictive performance, but EfficientNetB3 consistently outperformed the other models across evaluation metrics, including accuracy, precision, recall, F1-score, and several statistical reliability measures. EfficientNetB3 recorded the highest classification accuracy of 98.70% together with the highest weighted precision, recall, and F1-score. This performance advantage indicates that the architecture is particularly effective at learning discriminative feature representations from brain MRI images: EfficientNet's compound scaling strategy balances network depth, width, and input resolution, allowing the model to capture fine structural details and higher-level semantic features simultaneously. When combined with SE attention modules, the network can further emphasize informative feature channels related to tumor structures while suppressing irrelevant information. The statistical analysis also supports the superiority of the SE-enhanced EfficientNetB3 model: it achieved the largest Matthews Correlation Coefficient (MCC) and the smallest Brier score of all tested architectures, implying high classification reliability and well-calibrated probability predictions. Moreover, EfficientNetB3 yielded the best positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity, demonstrating its ability to correctly distinguish between tumor and non-tumor classes.
The narrow confidence interval associated with this model further indicates that its predictions are consistent across the evaluated samples. Statistical significance tests also support the reliability of the obtained predictions: the Chi-square test produced very large statistics with very small p-values for all tested models, validating a strong relationship between the predicted labels and the true class distributions, and EfficientNetB3 obtained the largest Chi-square value and the lowest p-value, indicating the closest agreement between predicted labels and ground-truth classes. Likewise, the Friedman non-parametric test indicated statistically significant differences across tumor classes, with EfficientNetB3 showing consistent prediction performance across the tested categories. The confusion matrix analysis provides further evidence of the model's superior classification capability. In all evaluated architectures, most predictions are concentrated along the diagonal of the matrices, that is, correct classifications; nevertheless, EfficientNetB3 produced the fewest misclassifications. Some confusion was observed between Glioma and Meningioma, which can be explained by the visual resemblance of these tumor types in MRI images. Across all tumor types, however, EfficientNetB3 demonstrated the most consistent classification behavior, together with low inference time and memory consumption. Its slightly higher computational cost is offset by its superior predictive performance and statistical stability, a trade-off that illustrates the benefit of more sophisticated architectures for extracting complex tumor-related patterns from medical imaging data. Overall, the experimental findings make clear that the SE-enhanced EfficientNetB3 model offers the highest accuracy, reliability, and balance of performance among all considered architectures for brain tumor MRI classification.
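The channel-recalibration mechanism of an SE block can be sketched without a deep learning framework. The numpy toy below follows the standard squeeze-and-excitation recipe (global average pooling, a two-layer bottleneck, sigmoid gating, channel-wise rescaling); the feature map size, random weights, and reduction ratio r = 4 are illustrative assumptions, not the paper's trained parameters.

```python
# Conceptual numpy sketch of a squeeze-and-excitation (SE) block:
# squeeze (global average pool), excite (FC -> ReLU -> FC -> sigmoid),
# then channel-wise rescaling. All shapes and weights are illustrative.
import numpy as np

def se_block(x, w1, w2):
    """x: (H, W, C) feature map; w1: (C, C//r); w2: (C//r, C)."""
    z = x.mean(axis=(0, 1))               # squeeze: per-channel global average -> (C,)
    s = np.maximum(z @ w1, 0.0)           # excitation: bottleneck FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))   # FC + sigmoid -> channel weights in (0, 1)
    return x * s                          # recalibrate each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))       # toy feature map with 16 channels
w1 = rng.standard_normal((16, 4)) * 0.1   # reduction ratio r = 4 (assumption)
w2 = rng.standard_normal((4, 16)) * 0.1
y = se_block(x, w1, w2)
```

Because the sigmoid gate lies in (0, 1), every channel is attenuated rather than amplified, which is how the block de-emphasizes uninformative channels.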
These results suggest that the proposed model can serve as an effective automated decision support system to assist clinicians in brain tumor diagnosis.

4.6.1. Discrimination and Confidence Analysis of EfficientNetB3

Figure 7 presents the Receiver Operating Characteristic (ROC) curves, precision–recall (PR) curves, and confidence distribution histogram for the SE-enhanced EfficientNetB3 model. These plots give further insight into the model's discrimination ability and prediction confidence. The ROC curves show excellent class separation across all tumor groups: the AUC values are very high, with Glioma at 0.9973, Meningioma at 0.9984, and the Normal and Pituitary classes at 1.0000. The curves hug the upper-left corner of the plot, indicating a high true-positive rate with very few false positives. Similarly, the precision–recall curves reveal high PR-AUC values and solid classification performance in all classes: Glioma and Meningioma attain PR-AUC scores of 0.9936 and 0.9945, respectively, with Normal and Pituitary again at 1.0000. This shows that the model maintains high precision without sacrificing recall. The confidence distribution histogram indicates that most predictions have values close to 1.0, meaning the model is highly confident for the majority of samples. Together, these findings reinforce the strong discriminative power and stability of the SE-enhanced EfficientNetB3 model.
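Per-class ROC-AUC and PR-AUC values of this kind are computed one-vs-rest from the predicted class probabilities. The sketch below does this with scikit-learn on a tiny illustrative probability matrix (perfectly separable by construction); the scores are not the paper's values.

```python
# Sketch of the one-vs-rest ROC-AUC and PR-AUC computation behind Figure 7,
# on illustrative toy probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_prob = np.array([[0.90, 0.05, 0.03, 0.02],
                   [0.60, 0.30, 0.05, 0.05],
                   [0.20, 0.70, 0.05, 0.05],
                   [0.10, 0.80, 0.05, 0.05],
                   [0.02, 0.03, 0.90, 0.05],
                   [0.05, 0.05, 0.85, 0.05],
                   [0.02, 0.03, 0.05, 0.90],
                   [0.05, 0.05, 0.05, 0.85]])

one_hot = np.eye(4)[y_true]  # binarize the labels, one column per class
roc_auc = [roc_auc_score(one_hot[:, c], y_prob[:, c]) for c in range(4)]
pr_auc = [average_precision_score(one_hot[:, c], y_prob[:, c]) for c in range(4)]
```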

4.6.2. Generalization Analysis Using 5-Fold Cross-Validation

The generalization ability of the proposed SE-enhanced EfficientNetB3 model was further tested with a 5-fold cross-validation experiment. The data were split into five equal subsets; at each iteration, four folds were used for training and one for testing. This process was repeated five times so that each fold served once as the test set.
Table 8 shows that the SE-enhanced EfficientNetB3 model achieved a high mean accuracy of 96.89% with a low standard deviation, demonstrating consistency across the different data splits. The large values of Cohen’s Kappa and the Matthews Correlation Coefficient further confirm strong agreement between the predicted and true labels. Although small differences can be observed across folds, the high average accuracy shows that the model generalizes well and classifies data in different subsets with stable results.
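The protocol above can be sketched with scikit-learn's stratified splitter, which preserves the class balance in every fold. A logistic regression on synthetic separable data stands in for the SE-enhanced EfficientNetB3, which is far too heavy for a snippet; the data and stand-in classifier are assumptions for illustration only.

```python
# Sketch of the 5-fold cross-validation protocol with a lightweight
# stand-in classifier on synthetic, deliberately separable data.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))
y = rng.integers(0, 4, size=200)   # four tumor classes
X[np.arange(200), y] += 3.0        # make the classes separable

accs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accs.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

mean_acc, std_acc = np.mean(accs), np.std(accs)  # fold-wise mean +/- std
```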

4.6.3. Robustness Analysis of SE-Enhanced EfficientNetB3 Under Image Perturbations

Table 9 presents the evaluation of the robustness of the EfficientNetB3 model under different perturbation strategies applied to the test data. The findings suggest that the model maintains high classification accuracy even when the input images are distorted under various transformation conditions. Across the evaluated conditions, the intensity shift yielded the highest test accuracy of 98.70%, implying that EfficientNetB3 tolerates moderate changes in image brightness and contrast. CutMix patch mixing resulted in a slightly lower accuracy of 96.75%, indicating that blending regions from different images introduces additional classification difficulty, yet the model continued to perform well. Similarly, pixel erasing recorded a test accuracy of 96.10%, demonstrating that EfficientNetB3 can still classify tumor types accurately when parts of the image are removed. Overall, these findings support the strong robustness and generalization ability of the SE-enhanced EfficientNetB3 model under various image perturbations.
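The three perturbation families in Table 9 are straightforward to implement. The numpy sketches below show one plausible version of each (intensity shift, CutMix-style patch mixing, and pixel erasing); the shift magnitude, patch positions, and patch size are illustrative assumptions, not the paper's settings.

```python
# Numpy sketches of the three perturbations evaluated in Table 9.
# Parameter values are illustrative.
import numpy as np

def intensity_shift(img, delta=0.1):
    """Shift brightness and clip to the valid [0, 1] range."""
    return np.clip(img + delta, 0.0, 1.0)

def cutmix(img_a, img_b, top, left, size):
    """Paste a square patch from img_b into a copy of img_a."""
    out = img_a.copy()
    out[top:top + size, left:left + size] = img_b[top:top + size, left:left + size]
    return out

def pixel_erase(img, top, left, size):
    """Zero out a square region of the image."""
    out = img.copy()
    out[top:top + size, left:left + size] = 0.0
    return out

rng = np.random.default_rng(1)
a = rng.random((64, 64))  # toy grayscale "MRI slice"
b = rng.random((64, 64))
shifted = intensity_shift(a)
mixed = cutmix(a, b, 10, 10, 16)
erased = pixel_erase(a, 10, 10, 16)
```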

4.6.4. Seed-Wise Performance Stability and Reproducibility Analysis of SE-Enhanced EfficientNetB3

Among all evaluated models, EfficientNetB3 attained the best overall classification performance in terms of accuracy and statistical reliability. To investigate the robustness and reproducibility of this best-performing architecture, additional experiments were performed with different random initialization seeds. As shown in Table 10, the SE-enhanced EfficientNetB3 model performed very consistently across five independent runs: test accuracy was tightly bounded between 98.05% and 98.70%, and the other performance measures (precision, recall, F1-score, and MCC) were similarly stable. The low standard deviations (e.g., ±0.26 for accuracy and ±0.0034 for MCC) indicate low sensitivity to random initialization. This consistency gives confidence that the SE-enhanced EfficientNetB3 model delivers stable predictions and does not depend strongly on stochastic variation in the training process, strengthening its suitability for accurate brain tumor classification in practice.
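The seed-wise protocol reduces to repeating the full experiment under several seeds and aggregating. The sketch below shows only the aggregation scaffolding: `run_experiment` is a hypothetical stand-in that returns a noisy accuracy instead of actually seeding and training a CNN, and the values are illustrative, not Table 10's numbers.

```python
# Sketch of the seed-wise stability protocol: repeat the run under several
# seeds and report mean +/- standard deviation. run_experiment is a toy
# stand-in for "set all framework seeds, train, evaluate".
import numpy as np

def run_experiment(seed):
    rng = np.random.default_rng(seed)
    # Placeholder result: an accuracy near 98.5% with small run-to-run noise.
    return 0.985 + rng.normal(0.0, 0.003)

seeds = [0, 1, 2, 3, 4]
accs = np.array([run_experiment(s) for s in seeds])
mean_acc, std_acc = accs.mean(), accs.std(ddof=1)  # report mean +/- sample std
```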

4.6.5. Ablation Study of EfficientNetB3 Without SE

To evaluate the contribution of the squeeze-and-excitation (SE) attention mechanism, an ablation experiment was conducted using the baseline EfficientNetB3 architecture without the SE modules. The model was trained and evaluated under the same experimental conditions to analyze the impact of removing channel attention on classification performance.
As shown in Table 11, the EfficientNetB3 model without SE modules achieved an overall accuracy of 96.10%. Although the model still demonstrated strong performance, the results were lower than those obtained with the SE-enhanced EfficientNetB3 architecture. This comparison highlights the effectiveness of SE attention in improving feature representation and classification performance for brain tumor MRI images.

4.7. Interpretability Analysis of SE-Enhanced EfficientNetB3 Using Grad-CAM++ and Saliency Maps

Figure 8 illustrates the interpretability analysis of the SE-enhanced EfficientNetB3 model using Grad-CAM++ and saliency map visualization techniques. These explainable artificial intelligence methods identify the areas of the MRI images that contributed most to the model's classification decisions. The Grad-CAM++ visualizations show that the model concentrates on the tumor areas in the brain MRI images: the heatmap regions highlighted in red and yellow clearly coincide with the tumor locations in the meningioma, glioma, and pituitary tumor cases, indicating that the model learns genuine tumor-related features rather than irrelevant background information. The saliency maps likewise confirm that the model attends to significant pixel areas within the brain region, especially those around the tumor structures; their bright highlighted areas mark the locations that most strongly influence the prediction. To further confirm the reliability of these explanations, the highlighted regions were visually inspected and compared with the expected tumor locations. Domain experts also reviewed the visual explanations and found that the model consistently focused on clinically relevant brain regions associated with the presence of tumors. This consistency between the model's explanations and expert observations indicates that the SE-enhanced EfficientNetB3 model provides interpretable and reliable results for brain tumor classification.
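The core arithmetic behind such heatmaps can be shown in a few lines. The sketch below implements the plain Grad-CAM combination (gradient-averaged channel weights, weighted sum, ReLU, normalization); Grad-CAM++ refines the channel weights with higher-order gradient terms, which is omitted here. The feature maps and gradients are random stand-ins for a convolutional layer's activations and the gradient of the predicted class score.

```python
# Numpy sketch of the core Grad-CAM computation (Grad-CAM++ additionally
# refines the channel weights). features and grads are illustrative stand-ins
# for real activations and class-score gradients.
import numpy as np

def grad_cam(features, grads):
    """features, grads: (H, W, C). Returns an (H, W) heatmap in [0, 1]."""
    weights = grads.mean(axis=(0, 1))                         # per-channel importance
    cam = np.maximum((features * weights).sum(axis=-1), 0.0)  # weighted sum + ReLU
    return cam / cam.max() if cam.max() > 0 else cam          # normalize for display

rng = np.random.default_rng(2)
features = rng.random((7, 7, 32))        # toy last-conv-layer activations
grads = rng.standard_normal((7, 7, 32))  # toy gradients of the class score
heatmap = grad_cam(features, grads)
```

In practice the low-resolution heatmap is upsampled to the input size and overlaid on the MRI slice, producing the red and yellow regions seen in Figure 8.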

4.8. Performance Comparison with State-of-the-Art Transformer Architectures

Table 12 presents the classification accuracy of several state-of-the-art Transformer-based architectures evaluated on the same dataset. The Transformer models showed comparatively weaker performance than the tested convolutional architectures. Among the Transformer-based methods, PoolFormer-S36 (MetaFormer) was the most accurate at 97.40%, followed by DeiT-Base at 96.10%; in contrast, Swin-Tiny and ConvNeXt-Tiny achieved lower accuracies of 86.36% and 89.61%, respectively. Although Transformer architectures have high representational capacity, they did not match the leading convolutional models in this study. This tendency is likely explained by the relatively small dataset size, as convolutional networks tend to be more sample-efficient at extracting local spatial features from medical images. The experimental findings therefore indicate that CNN-based architectures, and, in particular, the SE-enhanced EfficientNetB3 model, deliver more reliable and accurate classification for brain tumor MRI analysis than the evaluated Transformer-based models.

4.9. Cross-Dataset Evaluation with Sartaj Dataset

A cross-dataset evaluation of the proposed model was performed on the Sartaj brain tumor dataset to further explore its generalization ability. In this experiment, the SE-enhanced EfficientNetB3 model was trained on the PMRAM dataset and applied directly to the Sartaj dataset, without retraining or fine-tuning. This setting is deliberately challenging, since the training and testing samples come from different datasets and may therefore differ in imaging features, acquisition conditions, and underlying data distributions. The normalized confusion matrix in Figure 9, obtained from this cross-dataset assessment, shows that the model retained a high level of classification accuracy for most tumor types. The Glioma, Normal, and Pituitary classes achieved very high correct classification rates, suggesting that the model learns representative tumor-related features that remain consistent across datasets. However, relatively more misclassification was observed in the Meningioma class, with some samples predicted as Glioma or Pituitary. This behavior can be explained by the structural similarity of some tumor patterns in MRI images and by differences in imaging properties between the PMRAM and Sartaj datasets: disparities in acquisition protocols, scanner settings, and annotation practices can introduce distribution shifts that particularly affect the Meningioma category. It should be emphasized that the model was trained only on the PMRAM dataset and tested on a separate dataset without any adaptation; some performance deterioration in individual classes is expected under these circumstances, yet the model still displayed strong classification behavior for the remaining tumor types.
Overall, the findings suggest that the SE-enhanced EfficientNetB3 model remains stable and reliable under cross-dataset evaluation, while reflecting the intrinsic difficulties posed by dataset variability.

4.10. Online Visualization and Prediction Tool

Figure 10 demonstrates the web application deployed for interactive brain tumor classification. The user uploads MRI images, and the system automatically predicts the tumor class and produces Grad-CAM heatmaps outlining the critical areas that inform the model's decision. The figure illustrates accurate predictions for both a normal brain scan and a glioma case, demonstrating the model's accuracy across classes. The average time required to process and classify a new image is 2–4 s, which is fast enough for convenient use. The paired outputs of classification and visual explanation build trust and user confidence and ease clinical validation. The interface is intuitive and supports straightforward operation and exporting of results, which can benefit clinicians and researchers by facilitating rapid diagnosis and joint discussion. This deployment represents the transition of the model into applied practice, indicating its potential to support real-world brain tumor assessment workflows. The web application can be accessed at https://huggingface.co/spaces/polash7899/Brain_tumor_mri_mdpi (accessed on 5 December 2025).

4.11. Comparative Analysis with Previous Studies

Table 13 compares recent brain tumor classification studies in terms of datasets used, best-performing models, classification accuracy, use of explainable artificial intelligence (XAI), cross-dataset testing, and real-time demonstration applications. As the table shows, several studies reported very high classification rates, often above 98%, on popular datasets such as Figshare, Sartaj, and Br35H. However, the datasets employed in these studies varied in size, composition, and imaging features, so the reported results cannot be compared directly: each model was tested under different experimental conditions and data distributions. Unlike most of the existing literature, which focuses mainly on accuracy measurement on a single dataset, the present study offers a more holistic view of model performance. Beyond classification accuracy, it incorporates robustness testing under several image perturbation settings, cross-dataset testing to examine generalization behavior, and interpretability analysis based on explainable AI methods. The computational properties of the model, such as inference behavior and efficiency, were also examined to inform practical deployment considerations. Although the reported accuracy of the SE-enhanced EfficientNetB3 model is comparable to that of some recent models, the current work adopts a more comprehensive assessment framework covering performance analysis, robustness testing, interpretability assessment, and cross-dataset validation. Because the PMRAM dataset has rarely been examined in earlier works, the experimental results presented here also establish a reference point for future studies on the same dataset.
Altogether, the proposed framework contributes a more detailed evaluation pipeline while achieving competitive classification performance.

5. Conclusions

This paper introduced an interpretable deep learning framework for automatic brain tumor classification using MRI images from the PMRAM Bangladeshi brain tumor dataset. The proposed pipeline combines image preprocessing and feature enhancement with several squeeze-and-excitation (SE)-enhanced convolutional neural network architectures. Among the evaluated models, the SE-enhanced EfficientNetB3 achieved the best performance, with an accuracy of 98.70% along with high precision, recall, and F1-score values. Statistical evaluation, robustness analysis under image perturbations, and cross-dataset testing further supported the reliability and consistency of the model. The integration of Grad-CAM++ and saliency maps provided visual explanations of the model's predictions, allowing clear identification of tumor-related regions in MRI images. Moreover, a web-based application was developed to enable real-time prediction and visual explanation, demonstrating the practical usability of the proposed framework as a potential clinical decision support tool. Despite these promising results, several limitations should be acknowledged. The analysis was conducted on a publicly available dataset that lacks patient-level identifiers and longitudinal information; consequently, patient-specific analysis and subject-wise validation could not be performed, which may affect the reliability of the evaluation. Future work will focus on evaluating the proposed framework on larger multi-institutional clinical datasets with patient-level annotations. In addition, incorporating multimodal imaging data and clinically validated deployment pipelines may further improve the robustness and practical applicability of automated brain tumor diagnosis systems.

Author Contributions

Conceptualization, J.U. and M.E.H.; methodology, M.S.H.P., M.T.H.S. and M.E.H.; software, M.S.H.P. and M.T.H.S.; validation, M.E.H., M.Z. and M.M.; formal analysis, M.S.H.P. and M.T.H.S.; investigation, M.T.H.S. and M.Z.; resources, J.U.; visualization, M.E.H.; writing—original draft preparation, M.S.H.P., M.T.H.S. and M.E.H.; writing—review and editing, M.M., M.Z. and J.U.; supervision, J.U. and M.Z.; project administration, J.U.; funding acquisition, J.U. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by Woosong University Academic Research 2026.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset used in this study is publicly available at Mendeley Data at https://data.mendeley.com/datasets/m7w55sw88b/1 (accessed on 5 December 2025) (DOI: 10.17632/m7w55sw88b.1) under the title “PMRAM: Bangladeshi Brain Cancer—MRI Dataset”.

Conflicts of Interest

The authors declare no conflicts of interest.

  19. Nhlapho, W.; Atemkeng, M.; Brima, Y.; Ndogmo, J.C. Bridging the Gap: Exploring Interpretability in Deep Learning Models for Brain Tumor Detection and Diagnosis from MRI Images. Information 2024, 15, 182. [Google Scholar] [CrossRef]
  20. Asif, R.N.; Naseem, M.T.; Ahmad, M.; Mazhar, T.; Khan, M.A.; Khan, M.A.; Al-Rasheed, A.; Hamam, H. Brain tumor detection empowered with ensemble deep learning approaches from MRI scan images. Sci. Rep. 2025, 15, 15002. [Google Scholar] [CrossRef]
  21. Nahiduzzaman, M.; Abdulrazak, L.F.; Kibria, H.B.; Khandakar, A.; Ayari, M.A.; Ahamed, M.F.; Ahsan, M.; Haider, J.; Moni, M.A.; Kowalski, M. A hybrid explainable model based on advanced machine learning and deep learning models for classifying brain tumors using MRI images. Sci. Rep. 2025, 15, 1649. [Google Scholar] [CrossRef]
  22. Saeedi, S.; Rezayi, S.; Keshavarz, H.; Kalhori, S.R.N. MRI-based brain tumor detection using convolutional deep learning methods and chosen machine learning techniques. BMC Med. Inform. Decis. Mak. 2023, 23, 16. [Google Scholar] [CrossRef]
  23. Tonni, S.I.; Sheakh, M.A.; Tahosin, M.S.; Hasan, M.Z.; Shuva, T.F.; Bhuiyan, T.; Almoyad, M.A.A.; Orka, N.A.; Rahman, M.T.; Khan, R.T.; et al. A Hybrid Transfer Learning Framework for Brain Tumor Diagnosis. Adv. Intell. Syst. 2025, 7, 2400495. [Google Scholar] [CrossRef]
  24. Mannan, M.S.P.; Chowdhury, M.; Rahman, R.; Tamim, A.U.; Rahman, M.M. PMRAM: Bangladeshi Brain Cancer—MRI Dataset. 2024. Available online: https://data.mendeley.com/datasets/m7w55sw88b/1 (accessed on 5 December 2025).
  25. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. arXiv 2019, arXiv:1709.01507. [Google Scholar] [CrossRef]
Figure 1. Proposed workflow for brain tumor classification using deep learning and explainable AI.
Figure 2. Architecture of the squeeze-and-excitation (SE) block showing the squeeze, excitation, and channel-wise feature recalibration process. The symbol ∗ denotes channel-wise multiplication (element-wise scaling), where each feature channel is multiplied by its corresponding learned weight to emphasize informative features and suppress less useful ones.
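The recalibration pipeline in Figure 2 is compact enough to sketch numerically. The following NumPy fragment is an illustrative re-implementation of an SE block, not the authors' code: bias terms are omitted and the weight shapes assume a reduction ratio r (set to 16 in Table 3). It performs the squeeze (global average pooling), the two-layer excitation (ReLU, then sigmoid), and the channel-wise rescaling that the ∗ symbol denotes.

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation recalibration for a feature map x of shape (C, H, W).

    w1 (C, C//r) and w2 (C//r, C) are the excitation weights (biases omitted
    for brevity); r is the reduction ratio, 16 in this study.
    """
    z = x.mean(axis=(1, 2))                # squeeze: global average pooling -> (C,)
    s = np.maximum(z @ w1, 0.0)            # excitation layer 1 with ReLU -> (C//r,)
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))    # excitation layer 2 with sigmoid -> (C,)
    return x * s[:, None, None]            # channel-wise scaling (the * in Figure 2)

# toy example: 32 channels, reduction ratio 16 -> bottleneck width of 2
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8, 8))
w1 = 0.1 * rng.standard_normal((32, 2))
w2 = 0.1 * rng.standard_normal((2, 32))
y = se_block(x, w1, w2)
```

With all-zero excitation weights the sigmoid outputs 0.5 for every channel, so the block simply halves the feature map, which is a quick sanity check on the wiring.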
Figure 3. Architecture of the proposed SE-EfficientNetB3 model.
Figure 4. Training and validation loss (left column) and accuracy (right column) curves of the evaluated deep learning models across epochs. Each row represents a specific model, illustrating the convergence behavior and generalization performance during training.
Figure 5. Radar chart illustrating the computational efficiency of the evaluated deep learning models in terms of inference time, training time, GPU memory consumption, and RAM usage.
Figure 6. Confusion matrices of different deep learning models for brain tumor classification.
Figure 7. Receiver Operating Characteristic (ROC) curves (top-left), precision–recall (PR) curves (top-right), and confidence distribution histogram (bottom) of the SE-enhanced EfficientNetB3 model for brain tumor classification.
Figure 8. Grad-CAM++ (top row) and saliency map (bottom row) visualizations of the SE-enhanced EfficientNetB3 model for different brain tumor types.
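The saliency maps in Figure 8 visualize the magnitude of the class-score gradient with respect to input pixels, which deep learning frameworks obtain by backpropagation. As a framework-agnostic sketch of the idea (not the paper's implementation), the gradient can be approximated by central finite differences against a hypothetical scalar score function:

```python
import numpy as np

def saliency_map(score_fn, img, eps=1e-4):
    """Approximate |d score / d pixel| with central finite differences.

    score_fn is a hypothetical stand-in for the network's scalar class score;
    real saliency maps use backpropagation instead of this O(pixels) loop.
    """
    sal = np.zeros(img.shape, dtype=np.float64)
    for idx in np.ndindex(*img.shape):
        hi, lo = img.copy(), img.copy()
        hi[idx] += eps
        lo[idx] -= eps
        sal[idx] = abs(score_fn(hi) - score_fn(lo)) / (2.0 * eps)
    return sal

# for a linear score sum(w * img), the saliency recovers |w| exactly
w = np.array([[0.5, -2.0], [1.0, 0.0]])
sal = saliency_map(lambda im: float((w * im).sum()), np.zeros((2, 2)))
```

For a linear score the map recovers the absolute weights; for a real CNN the per-pixel loop would be far too slow, which is why autograd-based saliency (and Grad-CAM++ for coarser, class-discriminative maps) is used in practice.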
Figure 9. Normalized confusion matrix obtained from the cross-dataset evaluation where the SE-enhanced EfficientNetB3 model was trained on the PMRAM dataset and tested on the Sartaj dataset. The matrix illustrates the class-wise prediction distribution across the four categories.
Figure 10. Web application interface for brain tumor prediction with Grad-CAM interpretability visualization using uploaded MRI images.
Table 1. Number of images per class after splitting (before augmentation).
| Class | Train | Val | Test |
|---|---|---|---|
| Glioma | 298 | 37 | 38 |
| Meningioma | 290 | 36 | 37 |
| Normal | 316 | 39 | 41 |
| Pituitary | 298 | 37 | 38 |
Table 2. Number of images per class in the augmented training set.
| Class | Total Images (Original + Augmented) |
|---|---|
| Glioma | 894 |
| Meningioma | 870 |
| Normal | 948 |
| Pituitary | 894 |
Table 3. Training configuration used for the SE-enhanced deep learning models.
| Parameter | Setting |
|---|---|
| Models | VGG19, DenseNet201, EfficientNetB3, MobileNetV3-Large, InceptionV3 |
| Input Image Size | 224 × 224 (VGG19, DenseNet201, MobileNetV3-Large); 300 × 300 (EfficientNetB3); 299 × 299 (InceptionV3) |
| Batch Size | 32 |
| Number of Epochs | 50 |
| Early Stopping Patience | 7 |
| Optimizer | AdamW |
| Learning Rate | 1 × 10⁻⁴ |
| Weight Decay | 1 × 10⁻⁴ |
| Loss Function | Cross-Entropy Loss |
| Learning Rate Scheduler | ReduceLROnPlateau |
| Scheduler Patience | 2 |
| Scheduler Factor | 0.5 |
| Dropout Rate | 0.2 |
| SE Reduction Ratio | 16 |
| Data Loader Workers | 2 |
| Platform | Kaggle Notebook |
| Hardware | NVIDIA Tesla P100 GPU |
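The scheduler and early-stopping entries in Table 3 interact: ReduceLROnPlateau halves the learning rate once the validation loss has been stale for more than two epochs, while training stops entirely after seven stale epochs. The pure-Python replay below sketches these simplified semantics (the control flow only, not the authors' training loop; PyTorch's actual ReduceLROnPlateau adds thresholds and cooldown that are omitted here):

```python
def replay_schedule(val_losses, lr=1e-4, factor=0.5, sched_patience=2, stop_patience=7):
    """Replay simplified ReduceLROnPlateau + early-stopping behavior from Table 3.

    The LR is multiplied by `factor` when validation loss fails to improve for
    more than `sched_patience` epochs; training stops after `stop_patience`
    consecutive epochs without improvement. Returns (final_lr, epochs_run).
    """
    best = float("inf")
    bad_sched = bad_stop = epochs_run = 0
    for loss in val_losses:
        epochs_run += 1
        if loss < best:
            best, bad_sched, bad_stop = loss, 0, 0
            continue
        bad_sched += 1
        bad_stop += 1
        if bad_sched > sched_patience:   # plateau: halve the learning rate
            lr *= factor
            bad_sched = 0
        if bad_stop >= stop_patience:    # early stopping triggers
            break
    return lr, epochs_run
```

Replaying a flat plateau such as `replay_schedule([1.0] * 10)` shows the learning rate halving twice before early stopping fires at epoch 8, which illustrates why the two patience values are chosen with the scheduler's being the smaller.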
Table 4. Performance comparison of SE-enhanced deep learning models for brain tumor classification.
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| VGG19 | 0.9675 | 0.9691 | 0.9675 | 0.9676 |
| DenseNet201 | 0.9805 | 0.9807 | 0.9805 | 0.9805 |
| InceptionV3 | 0.9610 | 0.9610 | 0.9610 | 0.9608 |
| MobileNetV3-Large | 0.9740 | 0.9743 | 0.9740 | 0.9740 |
| EfficientNetB3 | 0.9870 | 0.9877 | 0.9870 | 0.9870 |
Table 5. Statistical performance comparison of the evaluated deep learning models for brain tumor classification.
| Model | MCC | Brier Score | Mean PPV | Mean NPV | Sensitivity | Specificity | 95% CI |
|---|---|---|---|---|---|---|---|
| VGG19 | 0.9572 | 0.0110 | 0.9675 | 0.9892 | 0.9675 | 0.9892 | [0.9351, 0.9935] |
| DenseNet201 | 0.9741 | 0.0078 | 0.9805 | 0.9935 | 0.9805 | 0.9935 | [0.9545, 1.0000] |
| InceptionV3 | 0.9482 | 0.0155 | 0.9610 | 0.9870 | 0.9610 | 0.9870 | [0.9286, 0.9870] |
| MobileNetV3-Large | 0.9655 | 0.0110 | 0.9740 | 0.9913 | 0.9740 | 0.9913 | [0.9481, 0.9935] |
| EfficientNetB3 | 0.9829 | 0.0045 | 0.9870 | 0.9957 | 0.9870 | 0.9957 | [0.9675, 1.0000] |
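The MCC values in Table 5 use the multi-class generalization of Matthews correlation, which is computable directly from the four-class confusion matrix. A self-contained sketch (an illustrative implementation, equivalent in intent to scikit-learn's `matthews_corrcoef`):

```python
import math

def multiclass_mcc(cm):
    """Matthews correlation coefficient from a k-class confusion matrix
    (rows = true class, columns = predicted class)."""
    k = len(cm)
    c = sum(cm[i][i] for i in range(k))                       # correctly classified
    s = sum(sum(row) for row in cm)                           # total samples
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # per-class prediction counts
    t = [sum(row) for row in cm]                              # per-class true counts
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0
```

A perfectly diagonal confusion matrix yields MCC = 1, and unlike accuracy the statistic penalizes class-imbalanced shortcuts, which is why it is reported alongside the Brier score here.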
Table 6. Statistical significance analysis of the evaluated models using Chi-square and Friedman tests.
| Model | Chi-Square Statistic | p-Value | Friedman Statistic | Friedman p-Value |
|---|---|---|---|---|
| VGG19 | 423.7080 | 1.19 × 10⁻⁸⁵ | 37.1922 | 4.19 × 10⁻⁸ |
| DenseNet201 | 438.4264 | 8.54 × 10⁻⁸⁹ | 35.3377 | 1.03 × 10⁻⁷ |
| InceptionV3 | 415.8009 | 5.81 × 10⁻⁸⁴ | 28.7532 | 2.52 × 10⁻⁶ |
| MobileNetV3-Large | 430.8195 | 3.60 × 10⁻⁸⁷ | 11.9922 | 7.40 × 10⁻³ |
| EfficientNetB3 | 446.4130 | 1.68 × 10⁻⁹⁰ | 15.8571 | 1.21 × 10⁻³ |
Table 7. McNemar significance test between EfficientNetB3 and other SE-enhanced models.
| Model Comparison | p-Value | Significance |
|---|---|---|
| EfficientNetB3 vs. VGG19 + SE | 0.125000 | Not Significant |
| EfficientNetB3 vs. DenseNet201 + SE | 0.625000 | Not Significant |
| EfficientNetB3 vs. InceptionV3 + SE | 0.375000 | Not Significant |
| EfficientNetB3 vs. MobileNetV3-Large + SE | 0.125000 | Not Significant |
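The p-values in Table 7 (0.125, 0.625, 0.375) are exactly what a two-sided exact binomial McNemar test produces for very small discordant counts: for instance, b = 4, c = 0 gives 0.125 and b = 4, c = 1 gives 0.375. A pure-Python sketch of that exact variant (the assumption that the exact rather than the chi-square version was used is ours; b and c are the counts of samples each model got right while the other got wrong):

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar p-value from discordant counts b and c.

    Under the null hypothesis that both classifiers are equally accurate, each
    discordant sample favors either model with probability 1/2.
    """
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0                         # no disagreements: nothing to test
    p = 2.0 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, p)                     # cap the doubled one-sided tail at 1
```

With so few discordant predictions the test has little power, which is consistent with none of the pairwise differences in Table 7 reaching significance despite EfficientNetB3's higher point accuracy.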
Table 8. Five-fold cross-validation performance of the SE-enhanced EfficientNetB3 model.
| Metric/Fold | Value |
|---|---|
| Accuracy (mean ± std) | 0.9689 ± 0.0244 |
| Cohen’s Kappa (mean ± std) | 0.9586 ± 0.0325 |
| Matthews Correlation Coefficient (mean ± std) | 0.9587 ± 0.0324 |
| Fold 1 Accuracy | 0.9903 |
| Fold 2 Accuracy | 0.9903 |
| Fold 3 Accuracy | 0.9612 |
| Fold 4 Accuracy | 0.9251 |
| Fold 5 Accuracy | 0.9778 |
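The summary row of Table 8 can be re-derived from the listed fold accuracies; the reported spread matches the population standard deviation (ddof = 0), a detail worth knowing when comparing against studies that report the sample standard deviation:

```python
from statistics import mean, pstdev

# fold accuracies from Table 8
folds = [0.9903, 0.9903, 0.9612, 0.9251, 0.9778]
mu, sigma = mean(folds), pstdev(folds)     # pstdev = population std (ddof = 0)
print(f"{mu:.4f} +/- {sigma:.4f}")         # prints 0.9689 +/- 0.0244
```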
Table 9. Performance of SE-enhanced EfficientNetB3 under different image perturbation conditions.
| Perturbation Technique | Parameters | Test Accuracy (%) |
|---|---|---|
| Intensity Shift | b = 0.02, c = 1.15 | 98.70 |
| CutMix Patch Mix | α = 1.0 | 96.75 |
| Pixel Erase | p = 1.0 | 96.10 |
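One plausible reading of the intensity-shift parameters in Table 9 is a linear brightness/contrast transform with gain c and offset b applied to normalized pixel values; this interpretation is an assumption on our part, not taken from the paper's code:

```python
import numpy as np

def intensity_shift(img, b=0.02, c=1.15):
    """Linear intensity perturbation (assumed form): scale pixel values by c,
    shift by b, then clip back to the normalized [0, 1] range."""
    return np.clip(c * np.asarray(img, dtype=np.float64) + b, 0.0, 1.0)
```

Under this reading a mid-gray pixel of 0.5 maps to 0.595, while saturated pixels are clipped, a mild perturbation consistent with the unchanged 98.70% accuracy reported for it.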
Table 10. Performance of SE-enhanced EfficientNetB3 across different random seeds.
| Seed | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | MCC |
|---|---|---|---|---|---|
| 11 | 98.70 | 98.75 | 98.71 | 98.71 | 0.9828 |
| 22 | 98.70 | 98.68 | 98.71 | 98.69 | 0.9827 |
| 33 | 98.70 | 98.75 | 98.78 | 98.73 | 0.9829 |
| 44 | 98.70 | 98.70 | 98.78 | 98.72 | 0.9828 |
| 55 | 98.05 | 98.07 | 98.10 | 98.06 | 0.9742 |
| Average ± Std | 98.57 ± 0.26 | 98.59 ± 0.26 | 98.62 ± 0.26 | 98.58 ± 0.26 | 0.9811 ± 0.0034 |
Table 11. Overall performance of EfficientNetB3 without SE modules.
| Metric | Precision | Recall | F1-Score |
|---|---|---|---|
| Macro Average | 0.9629 | 0.9598 | 0.9603 |
| Weighted Average | 0.9636 | 0.9610 | 0.9613 |
| Accuracy | 0.9610 | | |
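Table 11 reports both macro averages (each class counted equally) and weighted averages (classes weighted by their support), which diverge when the test classes are imbalanced. A minimal sketch computing both aggregations from a confusion matrix (illustrative code, mirroring what `sklearn.metrics.classification_report` produces):

```python
def macro_weighted_prf(cm):
    """Per-class precision/recall/F1 from a confusion matrix (rows = true,
    cols = predicted), returned as (macro, weighted) triples."""
    k = len(cm)
    support = [sum(row) for row in cm]          # true samples per class
    total = sum(support)
    prec, rec, f1 = [], [], []
    for i in range(k):
        col = sum(cm[r][i] for r in range(k))   # predictions made for class i
        p = cm[i][i] / col if col else 0.0
        r = cm[i][i] / support[i] if support[i] else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    macro = tuple(sum(v) / k for v in (prec, rec, f1))
    weighted = tuple(sum(s * vi for s, vi in zip(support, v)) / total
                     for v in (prec, rec, f1))
    return macro, weighted
```

The near-agreement of the macro and weighted rows in Table 11 reflects the roughly balanced test split in Table 1; on a skewed split the weighted figures would drift toward the majority class.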
Table 12. Performance comparison with state-of-the-art Transformer-based models.
| Model | Accuracy (%) |
|---|---|
| DeiT-Base | 96.10 |
| Swin-Tiny | 86.36 |
| PoolFormer-S36 (MetaFormer) | 97.40 |
| ConvNeXt-Tiny | 89.61 |
Table 13. Comparison of recent brain tumor classification studies in terms of dataset, best-performing model, accuracy (%), use of XAI, cross-dataset evaluation, and availability of a real-time demo application.
| Citation | Dataset | Best Model | Accuracy (%) | XAI Used | Cross-Dataset Testing | Real-Time Demo App |
|---|---|---|---|---|---|---|
| [10] | SARTAJ | VGG16 | 98.00 | No | No | No |
| [12] | Figshare | InceptionV3 | 98.89 | No | No | No |
| [15] | Kaggle | RViT | 98.60 | No | No | No |
| [16] | Figshare | DenseNet121 | 99.02 | Yes | No | No |
| [18] | Br35H + BraTS + Kaggle | BTDN | 99.68 | Yes | No | No |
| [19] | SARTAJ | EfficientNetB0 | 98.00 | Yes | No | No |
| [20] | Kaggle | InceptionV3 | 98.50 | No | No | No |
| [21] | Figshare + SARTAJ + Br35H | PDSCNN | 99.30 | Yes | No | No |
| [23] | Figshare + SARTAJ + Br35H | ResNet152V2 | 99.47 | Yes | No | No |
| Our Study | PMRAM | SE-Enhanced EfficientNetB3 | 98.70 | Yes | Yes | Yes |

Share and Cite

MDPI and ACS Style

Polash, M.S.H.; Saykat, M.T.H.; Haque, M.E.; Maniruzzaman, M.; Zabin, M.; Uddin, J. An Interpretable Deep Learning Approach for Brain Tumor Classification Using a Bangladeshi Brain MRI Dataset. BioMedInformatics 2026, 6, 19. https://doi.org/10.3390/biomedinformatics6020019
