1. Introduction
The brain consists of nerves and connective tissue and is the most complex and vital human organ. As a biological neural network, it contains more than 89 billion communicating neurons with a transmission rate of 10 to 100 signals per second [
1]. The brain controls most physical tasks, including awareness, movement, breathing, digestion, sensation, thought, speech, and memory. Diseases that disrupt brain function have a profound negative impact on patients, their families, and communities. Brain tumors develop when aggressive, rapid, and uncontrolled cell growth occurs in the brain. This disease alters neuronal networks, has the highest mortality rate [
2], and is the 10th leading cause of death for adults worldwide [
3]. Without medical intervention, this condition can lead to cognitive and motor deficits, visual impairment, speech impairment, paralysis, and often death [
4]. There exist more than 100 forms of different brain tumors [
5] that can be classified into primary and secondary categories depending on their source. Primary brain tumors originate in the brain tissue itself and can be malignant (cancerous) or benign (non-cancerous). Benign tumors grow slowly and are less aggressive, yet they can still raise intracranial pressure and compromise brain function [
6]. The 5-year relative survival rate of patients with malignant brain tumors, however, is only 36% [
7]. For example, in 2019 there were 347,992 new cases of brain cancer: 187,491 (54%) in males and 160,501 (46%) in females. According to [
8] in 2019, 138,605 male and 107,648 female deaths attributed to brain cancer were recorded worldwide.
Secondary or metastatic brain tumors spread to the brain from other organs, such as the lungs, breasts, skin, or colon [
2]. The secondary tumors are composed of the same type of cells as the primary tumors.
The main categories of brain tumors are glioma, meningioma, and pituitary tumors as shown in
Figure 1. Glioma originates in the glial cells and is associated with a higher mortality degree [
2]. Low-grade glioma (LGG) and high-grade glioma (HGG) are the two forms of glioma, with LGG being less lethal and generally having a better outcome than HGG. Meningioma originates in the meninges, the three tissue layers that separate the brain from the skull [
9]. Pituitary tumors are abnormal cell growths in the pituitary gland, which produces endocrine hormones. Most pituitary tumors are not cancerous [
9].
Surgery, chemotherapy, and radiotherapy are often required to treat brain tumors. Timely diagnosis of brain pathology is crucial for the patient’s survival. Magnetic Resonance Imaging (MRI) is the standard for radiation-free diagnostic neuroimaging, offering excellent soft tissue contrast, high resolution, and multiplanar capability. The evaluation of soft tissue tumors is therefore now predominantly performed with MRI, which has largely replaced computed tomography (CT). MRI brain scans are first used to classify images into cancerous (tumor) and normal (no tumor); further classification of tumor type then proceeds as outlined above. However, tumors often have hazy boundaries and diverse morphology, presenting significant visual recognition challenges for manual MRI image analysis by radiologists. MRI images also suffer from background noise and low contrast [
11]. Finally, the manual assessment of radiographs is time-consuming, non-reproducible, and even subjective in some cases.
AI and deep learning can greatly assist brain tumor classification and triage by detecting benign and malignant cases. In practice, clinicians rely on real-time data to make decisions given the life-changing consequences for patients. Thus, edge devices that can provide close to real-time analysis results are preferred. This requires CPU- and GPU-enabled computational platforms with embedded machine and deep learning algorithms, often referred to as medical edge computing. There are several advantages associated with medical edge computing, such as enhanced patient care, low latency for critical applications, effective use of computational resources, scalability, and flexibility. Additionally, instead of sending images to centralized servers for analysis in medical diagnostics, data can be processed locally on the edge device, thus accelerating diagnosis and treatment.
Accurate brain tumor classification from MRI images using deep learning has recently received great attention in the literature [
2,
5,
11,
12,
13,
14,
15]. However, there are comparably fewer works that consider medical edge computing applied to brain tumor classification [
3,
16,
17,
18] and that explore the complexity and efficiency of the deep learning models.
Our main contributions can be summarized as follows:
We propose a shallow convolutional neural network (CNN) architectural framework for brain tumor classification in MRI data. This architecture includes spatial and channel attention ensembling and is particularly suited for medical edge computing.
We investigate several state-of-the-art model compression techniques which substantially reduce model complexity and training and inference times.
We quantify the tradeoff between the drop in accuracy and the gain in inference time.
To test the generality of our model, we perform validation on three different public brain tumor MRI datasets.
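The spatial and channel attention ensembling named in the first contribution can be illustrated with a minimal sketch. The function names, the use of plain NumPy, and the simple averaging of the two attention paths are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def channel_attention(x):
    """Channel gating from globally pooled statistics (illustrative).
    x: feature map of shape (C, H, W)."""
    z = x.mean(axis=(1, 2))               # global average pool -> (C,)
    g = 1.0 / (1.0 + np.exp(-z))          # sigmoid gate per channel
    return x * g[:, None, None]           # rescale each channel

def spatial_attention(x):
    """Spatial gating from channel-pooled statistics (illustrative)."""
    m = x.mean(axis=0)                    # (H, W) average over channels
    g = 1.0 / (1.0 + np.exp(-m))          # sigmoid gate per location
    return x * g[None, :, :]

def attention_ensemble(x):
    # Ensemble the two attention paths by averaging their outputs.
    return 0.5 * (channel_attention(x) + spatial_attention(x))

x = np.random.randn(8, 32, 32)
y = attention_ensemble(x)
assert y.shape == x.shape
```

In a real CNN, these gates would be learned modules inserted between convolutional blocks; the sketch only shows how the two attention types act on complementary axes of the same feature map.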
This paper is organized as follows.
Section 1 provides an overview of the most recent work on brain tumor classification in MRI data. Our methodology in
Section 2 outlines the proposed architecture, datasets used, experimental setup, and implementation details. The experimental
Section 3 describes the experiments on the three datasets used. We finish with the Discussion in
Section 4 and the Conclusion in
Section 5.
Recent works on brain tumor classification in MRI have predominantly focused on improving model performance by leveraging advanced deep learning techniques. As seen in
Table 1, the research from 2024 and 2025 can be broadly categorized into several key approaches.
A significant portion of the work utilizes transfer learning from pre-trained models. This approach, as demonstrated by Agrawal et al. [
19] and Khaliki et al. [
14], leverages the feature-extraction capabilities of large-scale models such as InceptionV3 and VGG16 to achieve high accuracies (e.g., 99%). Similarly, Mathivanan et al. [
20], Shoaib et al. [
21], and Bibi et al. [
22] all employed various pre-trained backbones (e.g., DenseNet, MobileNet, and EfficientNet) to fine-tune their models for tumor classification.
Table 1.
Summary of the reviewed papers. We report the best performing models clustered into two groups according to the use of datasets and further ranked in descending classification accuracy. The Cheng et al. [
10] dataset is also known as the Figshare dataset.
| Author | Citation | Year | Dataset | Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|---|
| Ullah et al. | [23] | 2024 | Cheng et al. [10] | InceptionResnetV2 | 99.80% | - | - | - |
| Pacal et al. | [24] | 2025 | Cheng et al. [10] | NeXtBrain | 99.78% | 99.69% | 99.84% | 99.77% |
| Mathivanan et al. | [20] | 2024 | Cheng et al. [10] | MobileNetv3 | 99.61% | 98.28% | 99.75% | 99.00% |
| Agrawal et al. | [19] | 2024 | Cheng et al. [10] | InceptionV3 | 99.57% | 99.56% | 99.46% | 99.50% |
| Haque et al. | [25] | 2024 | Cheng et al. [10] | NeuroNet19 | 99.31% | 99.27% | 99.27% | 99.27% |
| Khaniki et al. | [26] | 2024 | Cheng et al. [10] | Cross ViT | 99.24% | 99.20% | 99.27% | 99.23% |
| Kadhim et al. | [27] | 2024 | Cheng et al. [10] | ResNet50 | 98.85% | - | - | 98.10% |
| Shaik et al. | [28] | 2024 | Cheng et al. [10] | MedTransNet | 98.37% | 98.18% | 98.27% | 98.16% |
| Dutta et al. | [29] | 2024 | Cheng et al. [10] | GT-Net | 97.11% | - | - | 96.39% |
| Dutta et al. | [30] | 2024 | Cheng et al. [10] | ARM-Net | 96.64% | 96.46% | 96.09% | 96.20% |
| Mohanty et al. | [31] | 2024 | Cheng et al. [10] | Dolphin-SCA-based | 95.10% | 95.11% | 95.06% | 95.14% |
| Hekmat et al. | [32] | 2025 | Cheng et al. [10] | CLAHE + DWT | 94.28% | 94.33% | 92.33% | 93.00% |
| Shoaib et al. | [21] | 2024 | Nickparvar [33] | DenseNet201 | 100% | 100% | 100% | 100% |
| Ullah et al. | [34] | 2024 | BraTS2020-21 [35,36] | ResNet-50 | 99.80% | - | - | - |
| Ishfaq et al. | [37] | 2025 | Kaggle [33,38,39] | Custom CNN | 99.751% | 98.67% | - | 98.65% |
| Rastogi et al. | [40] | 2024 | Br35H [41] | Multi-branch Net | 99.30% | - | - | 95.64% |
| Asiri et al. | [42] | 2024 | Pashaei et al. [43] | ICA Model | 98.90% | - | - | - |
| Bibi et al. | [22] | 2024 | SARTAJ [38] and Br35H [41] | InceptionV4 | 98.70% | 99.00% | 98.20% | 99.10% |
| Krishnan et al. | [44] | 2024 | Nickparvar [33] | RViT | 98.60% | 98.40% | 98.75% | 98.60% |
| Khaliki et al. | [14] | 2024 | Brain Tumor Dataset [38] | VGG16 | 98.00% | 98.00% | 98.00% | 97.00% |
| Remzan et al. | [45] | 2024 | Nickparvar [33] | MLP Ensemble | 97.71% | 97.71% | 97.71% | 97.70% |
| Oztel | [46] | 2025 | Bhuvaji [38] | Ensemble CNNs-ViT | 84.35% | 87.32% | 85.28% | 84.06% |
Another major trend is the integration of attention mechanisms and Transformers. These models are designed to improve feature selection by focusing on informative regions. Dutta et al. [
30] incorporated spatial attention to enhance their ARM-Net’s feature detection, while Dutta et al. [
29] used a Global Transformer Module (GTM) to better select features across different dimensions. More advanced Vision Transformer (ViT) architectures have also been applied, with Khaniki et al. [
26] introducing a selective cross-attention mechanism and Krishnan et al. [
44] developing a Rotation-Invariant ViT (RViT) to handle different image orientations.
Hybrid and ensemble models are also popular as they combine the strengths of multiple architectures. For example, Remzan et al. [
45] used both feature and stacking ensembles of ResNet-50 and DenseNet-121, achieving high AUC and accuracy scores. A recent work by Pacal et al. [
24] further exemplifies this by combining CNN and Transformer architectures to create a powerful hybrid model named NeXtBrain. Additionally, some authors [
21,
42] combined deep learning feature extraction with classical machine learning methods like SVM for the final classification.
Beyond architectural design, other works focused on optimization and data preprocessing. This includes implementing custom, often lightweight, CNNs [
19,
31] and using optimization algorithms. For instance, Kadhim et al. [
27] used Particle Swarm Optimization (PSO) to improve feature selection, while Ullah et al. [
23,
34] used sparse autoencoders to handle imbalanced datasets and various evolutionary algorithms for hyperparameter tuning. Other preprocessing techniques, such as adaptive filtering for noise reduction [
42] and histogram equalization for contrast enhancement [
32], were also explored to improve model performance.
Most recent 2025 studies on brain tumor classification from MRI data are converging on hybrid, multi-faceted approaches to achieve superior performance. Ishfaq et al. [
37] and Oztel [
46] both employ transfer learning. While Ishfaq et al. use a custom CNN focused on computational efficiency for a ten-class prediction system, Oztel enhances the approach with wavelet-transform preprocessing to capture more detailed features and then uses an ensemble of top-performing models. Hekmat et al. [
32] also utilize image preprocessing with CLAHE and DWT for feature enhancement before a feature fusion architecture using DenseNet models. The most advanced method, NeXtBrain by Pacal et al. [
24], represents a sophisticated hybrid architecture that combines a specialized convolutional block to capture local details and a Transformer block to model global spatial relationships achieving remarkable 99.78% classification accuracy.
As is clear from the above, while there has been a large number of recent works on designing accurate classification models, there have been, in contrast, no attempts to date to investigate and quantify the effects of model compression on accuracy and inference speed for brain tumor classification in MRI data. This leaves a significant gap in the literature.
4. Discussion
The results across all experiments highlight several important findings regarding the performance of the proposed ANSA Ensemble and its compressed variants, especially when compared to state-of-the-art models. In Dataset 1, the ANSA Ensemble achieved competitive accuracy, only slightly lower than the best CNN Ensemble model, while maintaining significantly reduced inference times compared to large architectures such as fine-tuned VGG16. This shows the efficiency of the model’s attention-guided design, which balances accuracy and speed effectively.
The five-fold cross-validation experiments further confirm the robustness of the ANSA Ensemble architecture, showing stable performance with low variance in precision, recall, F1 score, and specificity. The DSC-compressed models showed slight drops in accuracy but consistently reduced inference time and model complexity, demonstrating that depthwise-separable convolutions achieve a better speed–accuracy tradeoff than pruning, which consistently yielded the poorest results in both accuracy retention and tradeoff factor.
The tradeoff factor analysis across all three datasets provides quantitative support for these observations. The DSC and RFM methods consistently delivered the smallest accuracy drop along with notable speed gains, offering the most balanced compromise between performance and efficiency. By contrast, pruning substantially degraded accuracy, suggesting that aggressive weight removal may disproportionately affect feature extraction.
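The parameter savings behind depthwise-separable convolutions can be checked with a short back-of-the-envelope computation. The layer sizes below are hypothetical and not taken from the actual model; a standard k×k convolution costs k·k·C_in·C_out weights, while the depthwise-plus-pointwise factorization costs k·k·C_in + C_in·C_out:

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv (bias omitted)."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3x3 kernel, 64 -> 128 channels.
std = conv_params(3, 64, 128)   # 73,728 weights
dsc = dsc_params(3, 64, 128)    # 8,768 weights
print(std, dsc, round(std / dsc, 1))  # roughly an 8.4x reduction
```

This is why DSC shrinks model size and inference time so sharply while keeping the receptive field of the original convolution intact.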
Monte Carlo simulation results with both 70-10-20 data splits and five-fold cross-validation further highlight the stability of the ANSA Ensemble architecture, with standard deviations remaining low across runs. Despite the slight accuracy reductions observed in models compressed using DSC, the improvements in computational efficiency and reduced parameter counts are likely beneficial in real-world settings where inference speed and deployment feasibility are critical, such as in point-of-care systems.
Explainability analyses using Grad-CAM revealed that both the baseline and compressed models consistently focused on tumor regions across the Cheng, Bhuvaji, and Sherif datasets. This alignment between model attention and clinically relevant regions enhances trust in the decision-making process and underscores the suitability of ANSA-based architectures for medical applications.
Cross-dataset generalization experiments showed a notable performance drop when models were tested on datasets with different class distributions, particularly when moving from a three-class to a four-class problem. This suggests that while the core features learned by the ANSA Ensemble are transferable, fine-tuning the final classification layers remains necessary to account for domain and label distribution shifts.
The ablation study provides strong evidence of the importance of attention blocks and the integration of Gaussian Context Transformer (GCT) modules. Incrementally adding attention blocks consistently improved accuracy, precision, recall, and F1 score across all datasets. Furthermore, the presence of GCT consistently boosted performance over configurations without it, indicating that the spatial and channel attention mechanisms contribute significantly to the ability of the model to capture clinically relevant features in MRI brain scans.
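The Gaussian Context Transformer module referenced in the ablation study replaces a learned excitation function with a fixed Gaussian applied to the normalized global context of each channel. A minimal sketch follows; the constant `c`, the epsilon value, and the use of plain NumPy are illustrative assumptions:

```python
import numpy as np

def gct(x, c=2.0, eps=1e-5):
    """Gaussian Context Transformer channel attention (illustrative).
    x: feature map of shape (C, H, W); c controls gate sharpness."""
    z = x.mean(axis=(1, 2))                       # global average pool -> (C,)
    z = (z - z.mean()) / np.sqrt(z.var() + eps)   # normalize across channels
    g = np.exp(-(z ** 2) / (2.0 * c ** 2))        # Gaussian excitation in (0, 1]
    return x * g[:, None, None]                   # rescale each channel

x = np.random.randn(16, 8, 8)
y = gct(x)
assert y.shape == x.shape
```

Because the excitation is parameter-free, such a module adds channel attention at essentially no cost in trainable weights, which fits the lightweight design goal of the proposed architecture.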
Overall, the proposed ANSA Ensemble demonstrates a compelling balance between accuracy, efficiency, and interpretability. While the highest absolute accuracy was achieved by larger ensemble CNN models, the ANSA Ensemble model with its integrated spatial awareness offers a practical alternative for real-world clinical deployment.
Although the proposed model demonstrates promising performance, its accuracy is likely to improve further with an increase in the number of attention blocks and the overall depth of the network. However, deeper architectures inherently introduce a larger number of trainable parameters, which leads to increased computational cost and memory consumption. To address this tradeoff between model complexity and efficiency, future work will explore advanced model compression and optimization strategies.
In particular, we plan to investigate the Lottery Ticket Hypothesis [
65]-based model compression technique. This hypothesis suggests that dense neural networks contain sparse subnetworks, known as winning tickets, that can match the performance of the full model when trained in isolation. This idea enables the extraction of sparse, efficient architectures that retain accuracy while substantially lowering parameter count and computational cost. Such models offer a promising path toward lightweight, high-performing systems suitable for deployment in resource-constrained environments.
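One round of the winning-ticket procedure can be summarized in a few lines: train the network, prune the smallest-magnitude weights, and rewind the surviving weights to their initial values before retraining. A schematic sketch, in which `train_fn` is a placeholder for full training and the weight vector stands in for a real network:

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Zero out the given fraction of smallest-magnitude weights."""
    k = int(sparsity * weights.size)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

def lottery_ticket_round(w_init, train_fn, sparsity):
    """One iterative-magnitude-pruning round: train, prune, rewind."""
    w_trained = train_fn(w_init)                # placeholder for full training
    mask = magnitude_mask(w_trained, sparsity)
    return w_init * mask, mask                  # rewind survivors to init

rng = np.random.default_rng(0)
w0 = rng.normal(size=(256,))
w_ticket, mask = lottery_ticket_round(w0, lambda w: w, sparsity=0.8)
assert int(mask.sum()) == w0.size - int(0.8 * w0.size)  # ~80% removed
```

Repeating this round while increasing sparsity yields the progressively smaller subnetworks that the hypothesis predicts can match full-model accuracy.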
5. Conclusions
This work presents the ANSA Ensemble, an attention-guided deep learning model designed for brain tumor classification from MRI images. The key novelty of the proposed method lies in its integration of spatial awareness through l2-normalized attention blocks combined with Gaussian Context Transformers (GCTs), which together enable precise and robust focus on tumor regions. This architecture outperforms several existing models by effectively capturing both spatial and channel-wise contextual information, resulting in improved classification accuracy and interpretability.
The ANSA Ensemble achieves competitive accuracy compared to state-of-the-art CNN ensembles while offering significant gains in computational efficiency and reduced inference time. Among model compression techniques, depthwise-separable convolutions (DSCs) prove most effective at maintaining accuracy with improved speed and lower parameter counts, highlighting the potential for deploying lightweight yet powerful models in clinical practice.
Comprehensive evaluations using multiple datasets, various validation methods, and Monte Carlo simulations demonstrate the robustness and stability of the proposed models. The explainability analyses using Grad-CAM further reinforce the clinical validity of the model by showing its attention aligns with medically relevant tumor regions. Ablation studies confirm the added value of each attention block and GCT component in boosting classification performance.
Overall, the ANSA Ensemble model offers a unique combination of accuracy, interpretability, and efficiency, making it a promising candidate for real-world clinical applications in brain tumor diagnosis. Its novel attention mechanisms and effective compression strategies provide tangible value in enhancing diagnostic precision while enabling practical deployment on resource-constrained systems.