Morphology-Aware Multi-Scale Deep Representation Learning for Interpretable Knowledge Extraction in Brain Tumor MRI

AlShehri, Helala; Busaleh, Mariam

doi:10.3390/make8050119


by Helala AlShehri * and Mariam Busaleh

Computer and Information Technology Department, Jubail Industrial College, Jubail 35718, Saudi Arabia

* Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2026, 8(5), 119; https://doi.org/10.3390/make8050119

Submission received: 1 March 2026 / Revised: 22 April 2026 / Accepted: 29 April 2026 / Published: 1 May 2026

(This article belongs to the Section Learning)


Abstract

Robust brain tumor classification from magnetic resonance imaging (MRI) remains challenging due to complex structural heterogeneity and subtle inter-class variability. Beyond predictive accuracy, conventional convolutional neural networks predominantly rely on texture-dominant features and fixed receptive fields, which may limit the extraction of clinically meaningful structural information. This study proposes a morphology-aware multi-scale deep representation learning framework that embeds morphological inductive bias directly within hierarchical feature extraction. The proposed architecture synergistically integrates trainable morphological operations with multi-scale convolutional feature learning inside a unified residual framework, supported by an in-block morphological refinement mechanism and a morphology-aware downsampling module. Unlike prior approaches that treat morphological operators as preprocessing or auxiliary branches, the proposed design incorporates differentiable dilation and erosion into the core feature hierarchy to guide structure-aware representation formation. The model was evaluated using five-fold cross-validation and an independent test set, achieving an overall test accuracy of 99.31% with consistently high macro-averaged precision, recall, F1-score, and AUC values. Grad-CAM analysis further demonstrates that the learned representations emphasize clinically relevant tumor regions, supporting interpretable structural knowledge extraction. Ablation studies confirm that performance improvements arise from the synergistic integration of multi-scale learning and morphology-aware refinement. Overall, embedding structural inductive bias within multi-scale deep representation learning enhances robustness, stability, and interpretable knowledge extraction for brain tumor MRI analysis.

Keywords:

morphology-aware learning; multi-scale deep representation learning; structural inductive bias; interpretable machine learning; knowledge extraction; brain tumor MRI

1. Introduction

Medical image classification has become a central component of modern computer-aided diagnosis systems, enabling automated analysis that can support clinical decision-making and reduce diagnostic workload. In particular, brain tumor classification from magnetic resonance imaging (MRI) plays a critical role in early detection, treatment planning, and outcome assessment [1,2]. However, accurate multi-class classification remains challenging due to high intra-class variability, subtle inter-class differences, and the complex structural patterns present in medical images. Beyond classification accuracy, extracting clinically meaningful structural knowledge from MRI data remains an open challenge in deep representation learning. Recent reviews have highlighted that deep learning methods, especially convolutional neural networks (CNNs), have considerably advanced automated tumor classification, but critical gaps remain in handling subtle morphological variations and capturing comprehensive contextual features across classes [3].

Deep convolutional neural networks (CNNs) have demonstrated remarkable success in medical image analysis by automatically learning hierarchical feature representations [4,5,6]. Nevertheless, conventional CNN architectures primarily rely on texture-based features and fixed receptive fields, which may limit their ability to capture the structural and shape-related characteristics that are often crucial for distinguishing medical conditions [7,8]. This texture-dominant inductive bias may hinder the extraction of structure-aware representations necessary for interpretable knowledge discovery. Moreover, single-scale convolutional designs can struggle to simultaneously model fine-grained local details and broader contextual information, both of which are essential for robust tumor characterization [9,10].

To address these limitations, multi-scale feature extraction has been widely explored as a means to enhance representational richness by combining information from different spatial resolutions [11]. While multi-scale CNNs improve contextual awareness, they still largely depend on standard convolutional operations and pooling mechanisms that may not explicitly emphasize morphological structures. As a result, the learned feature hierarchies often remain implicitly texture-driven rather than explicitly structure-guided. Recent work by Ke et al. [12] introduced a multi-scale channel attention CNN integrated with an SVM classifier, demonstrating that explicit multi-scale mechanisms improve discriminative capability for brain tumor classification. In medical imaging, where class distinctions often arise from subtle boundary variations and shape irregularities, incorporating morphology-aware representations can be particularly beneficial.

Morphological operations, such as dilation and erosion, are well known for their ability to highlight structural patterns and geometric properties in images. However, traditional morphological operators are non-differentiable and thus incompatible with end-to-end deep learning frameworks. Recent efforts have sought to integrate differentiable morphological approximations into neural networks, enabling structure-aware feature learning while maintaining trainability [13,14,15,16]. However, these prior approaches typically treat morphological operators as standalone preprocessing layers or append them as peripheral branches, failing to deeply couple them with hierarchical, multi-scale feature learning. Consequently, the potential for morphological operations to guide and refine features synergistically across multiple scales, and during critical operations like spatial downsampling, remains largely unexplored. Furthermore, limited attention has been given to embedding morphological inductive bias directly into deep representation learning pipelines for interpretable knowledge extraction.

In this work, we propose a Morphology-Aware Multi-Scale Convolutional Network designed for robust multi-class medical image classification. While the individual building blocks of the proposed framework have been explored in prior studies, the novelty of this work lies in their integration into a unified morphology-aware multi-scale architecture specifically tailored for brain MRI analysis. From a representation learning perspective, the proposed framework integrates structural inductive bias directly within hierarchical feature construction to enable interpretable structural knowledge extraction. The proposed architecture integrates multi-scale convolutional pathways with trainable morphology-aware components that explicitly emphasize structural and shape-based information during both feature extraction and downsampling. By combining complementary spatial representations and morphology-aware learning, the proposed method aims to improve discriminative capability while maintaining computational efficiency and stable optimization. The main contributions of this work are:

A structure-guided deep representation learning architecture embedding trainable morphological operations as core primitives within multi-scale blocks and downsampling for joint structural-semantic optimization.
Morphology-aware downsampling module preserving boundary cues by fusing original and morphologically transformed features before spatial reduction.
Residual morphology-aware multi-scale block fusing multi-scale features with parallel dilation/erosion and residual connections for stable morphology-enhanced learning.
Comprehensive evaluation on brain tumor MRI showing superior accuracy, balanced performance, and improved generalization.
Computational efficiency analysis confirming practical viability for clinical deployment.

2. Related Work

Recent studies on brain tumor classification using magnetic resonance imaging (MRI) indicate a clear transition from conventional machine learning pipelines toward deep learning-based approaches, with an increasing emphasis on transfer learning and ensemble strategies. Early research efforts were largely constrained by the scarcity of annotated datasets and limited computational resources, which motivated the use of manually engineered texture and shape descriptors combined with shallow classifiers, such as support vector machines and extreme learning machines. While these methods achieved acceptable performance and computational efficiency, their effectiveness was highly dependent on feature selection and segmentation accuracy, thereby limiting their robustness across diverse tumor types and varying imaging conditions [17,18,19].

As deep learning optimization techniques matured and publicly available MRI datasets became more accessible, convolutional neural networks (CNNs) emerged as the dominant paradigm for brain tumor classification. Transfer learning approaches leveraged pretrained CNN backbones to mitigate data scarcity and accelerate convergence during training. More advanced deep transfer learning frameworks further enhanced performance by integrating task-specific dense layers and refined classification heads, typically evaluated under three-class and four-class classification settings [20]. For instance, Deepak and Ameer [21] demonstrated the effectiveness of fine-tuning a pretrained GoogLeNet model on the Figshare MRI dataset, achieving strong performance under five-fold cross-validation. Similarly, evolutionary optimization strategies have been incorporated into transfer learning pipelines, where convolutional backbones such as ResNet were optimized using genetic algorithms to enhance discriminative capability and parameter efficiency [22]. Recent studies have further refined transfer learning architectures by modifying pretrained ResNet-50 models with additional task-specific layers and feature transformation modules, reporting improved robustness in multi-class tumor classification settings [23]. In addition, Bukaita and Vadde [24] compared standard CNNs with ResNet-18 for MRI-based tumor classification, highlighting the trade-offs between architectural depth and generalization.

Beyond standard CNN architectures, alternative deep learning paradigms have also been explored. Afshar et al. [25] introduced a modified Capsule Network (CapsNet) that explicitly models spatial relationships between features, reporting an accuracy of 90.89%. Rasheed et al. [26] proposed a CNN-based framework incorporating image enhancement techniques, including Gaussian blur-based sharpening and contrast-limited adaptive histogram equalization, achieving a classification accuracy of 97.84% across four tumor classes. In addition, data augmentation strategies such as scaling and rotation have been widely adopted to further improve model generalization and classification performance [27].

An important methodological direction in this domain involves decoupling feature extraction from classification in order to alleviate overfitting, particularly in small-scale neuroimaging datasets. The RBEBT framework exemplifies this strategy by employing a fine-tuned ResNet-18 network for deep feature extraction, followed by a randomized neural network classifier optimized using a bat algorithm. Although this hybrid design reported very high accuracy under cross-validation settings, its evaluation was restricted to binary classification and conducted on relatively limited datasets, which raises concerns regarding its scalability and generalization to more complex multi-class scenarios [28]. A related hybrid strategy was proposed by Liu et al. [29], who extracted CNN features and then applied traditional machine learning classifiers, achieving competitive performance on multi-class MRI data.

To further enhance robustness, recent studies have explored hybrid and ensemble learning strategies. A hybrid CNN–LSTM ensemble was shown to improve classification stability by capturing both spatial and sequential feature dependencies; however, this gain came at the cost of increased architectural complexity [30]. Likewise, a deep ensemble framework integrating GAN-based data balancing with optimized ensemble weighting addressed class imbalance and demonstrated strong generalization performance [31]. In contrast, automated “smart” classification pipelines that combined EfficientNet and Inception architectures reported very high accuracy, yet their effectiveness relied heavily on extensive data augmentation, raising concerns about scalability and reproducibility across different clinical settings [32].

In parallel with convolutional neural networks, Vision Transformers (ViTs) have emerged as an alternative deep learning paradigm, demonstrating a strong capability to model long-range dependencies within medical images [33,34]. Elhadidy et al. [35] conducted a comparative study involving CNNs, Swin Transformers, and EfficientNet for brain tumor classification from MRI scans. Under standardized preprocessing and data augmentation protocols, EfficientNet achieved the highest accuracy (98.72%), outperforming conventional CNN models (95.16%) while maintaining superior computational efficiency, thereby underscoring its potential suitability for clinical deployment.

Further transformer-based designs have also been explored. Wang et al. [34] proposed a transformer architecture that employs token combination strategies to enhance representational efficiency without compromising classification performance. Similarly, Aloraini et al. [36] introduced a hybrid transformer-enhanced convolutional framework in which CNN layers extract local spatial features, while self-attention mechanisms capture global contextual information. Transformer-based and hybrid architectures have demonstrated strong representational capabilities in medical imaging. In this work, we focus on developing a lightweight and morphology-aware convolutional architecture that offers competitive performance while emphasizing efficiency and structural interpretability, thereby complementing existing transformer-based approaches.

Interpretability has become a critical requirement in medical image analysis. Recent studies have utilized visualization techniques such as Grad-CAM to highlight the regions influencing model predictions, thereby improving transparency and clinical trust [37]. In this work, Grad-CAM is employed to analyze the decision-making process of the proposed Morphology-Aware Multi-Scale Convolutional Network (MA-MSCNet). The generated activation maps indicate that the model focuses on tumor-relevant regions, demonstrating consistency with medical knowledge while complementing the proposed morphology-aware feature learning.

In light of the limitations observed in existing brain tumor classification approaches, there is a clear need for models that can effectively capture structural and shape-related characteristics while preserving multi-scale contextual information. Many current deep learning frameworks either rely predominantly on texture-driven representations or introduce excessive architectural complexity to achieve robustness, which may hinder scalability and clinical applicability. Furthermore, limited attention has been given to explicitly modeling morphological information within the feature extraction and downsampling processes. These considerations motivate the development of a morphology-aware and multi-scale learning framework that can enhance discriminative performance while maintaining computational efficiency and stable training behavior.

3. Materials and Methods

3.1. Brain Tumor MRI Dataset

This study follows a retrospective observational design using a publicly available dataset curated by Kadam et al. [38] and hosted on Kaggle. The dataset consists of 7023 MRI slices collected from multiple sources, including the Figshare, SARTAJ, and Br35H repositories. Inclusion was determined based on dataset availability and labeling completeness, while no explicit exclusion criteria were provided by the dataset curators. Due to the absence of subject-level identifiers, the dataset is treated as an image-level cohort rather than a patient-level cohort. Ground truth labels were assigned by the dataset curators based on expert-reviewed clinical sources and radiological findings, and have been widely adopted in prior brain tumor classification studies. Each MRI slice is annotated into one of four classes (glioma, meningioma, pituitary tumor, or no tumor). As annotations are provided at the image level rather than the patient level, no inter-observer variability information is available. Representative samples from each class are shown in Figure 1. The dataset was selected due to its standardized labeling and widespread use in the literature, enabling fair comparison with existing methods.

3.2. Image Preprocessing and Data Augmentation

A standardized preprocessing pipeline was applied to all images to ensure consistency and facilitate model convergence. All images were resized to 125 × 125 pixels to balance computational efficiency and preservation of relevant structural information. This resolution falls within the range commonly adopted in prior studies (e.g., 96 × 96 to 224 × 224) [39]. The impact of input resolution is further analyzed in Section 4.4.6, where multiple resolutions are systematically evaluated under identical experimental settings. Pixel intensities were then normalized to the range [0, 1] by dividing by the maximum value (255). Finally, images were reshaped to a tensor format of (125, 125, 1) to match the single-channel input expected by the proposed network.

To improve model generalization and mitigate overfitting, data augmentation was applied exclusively to the training data within each cross-validation fold, while the validation data remained completely unaltered to prevent data leakage. The augmentation was applied uniformly across all classes and was not intended to address class imbalance. Consequently, the original class distribution was preserved, and no explicit rebalancing techniques (e.g., class weighting or targeted oversampling of minority classes) were employed. Augmentation was performed online during training using Keras ImageDataGenerator with the following parameters: rotation_range = 10, width_shift_range = 0.05, height_shift_range = 0.05, and horizontal_flip = True.

3.3. Proposed Morphology-Aware Multi-Scale Network (MA-MSCNet) Architecture

This section presents the architecture of the proposed Morphology-Aware Multi-Scale Convolutional Network (MA-MSCNet), which explicitly preserves structural and morphological characteristics across the feature hierarchy while enabling efficient end-to-end learning. From a deep representation learning perspective, the architecture is designed to embed structural inductive bias directly within hierarchical feature construction. The network integrates multi-scale convolutional processing, trainable morphological refinement, and morphology-aware downsampling within a unified residual learning framework.

The core design philosophy of MA-MSCNet is to move beyond treating morphological operations as an external enhancement and instead integrate them as fundamental components of the representation learning process. While previous attempts at differentiable morphology have largely focused on replacing individual convolutional layers or adding parallel morphological branches, our approach seeks a deeper integration.

We achieve this through two key innovations: (i) a morphology-aware downsampling module that prevents structural information loss during spatial compression, a known weakness of standard pooling; and (ii) a residual morphology-aware multi-scale block where multi-scale features are jointly optimized with their morphologically refined counterparts. Together, these mechanisms ensure that structural cues are not merely appended to the network but actively guide feature evolution across scales. This design ensures that shape and boundary cues are actively used to guide representation learning at every stage of the network, rather than being computed in isolation. As a result, MA-MSCNet promotes interpretable structural knowledge extraction while maintaining architectural efficiency.

3.3.1. Trainable Morphological Operators

Classical morphological operations such as dilation and erosion are effective in emphasizing structural patterns and object boundaries; however, they are not directly compatible with gradient-based optimization. To enable end-to-end learning, MA-MSCNet incorporates differentiable, trainable morphological layers that approximate classical morphology while remaining fully trainable.

Figure 2 illustrates the proposed operators. Given an input feature map

$$X \in \mathbb{R}^{B \times H \times W \times C}, \tag{1}$$

the trainable dilation is defined as

$$\mathrm{Dil}(X) = \mathrm{MaxPool}_{k \times k}(X + W_{d}), \tag{2}$$

where $W_{d} \in \mathbb{R}^{1 \times 1 \times 1 \times C}$ is a learnable channel-wise offset. Similarly, erosion is formulated as

$$\mathrm{Ero}(X) = -\mathrm{MaxPool}_{k \times k}(-X + W_{e}), \tag{3}$$

with learnable offset $W_{e}$.

These operators enhance structural saliency and boundary sensitivity while preserving spatial resolution and differentiability.
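The forward pass of Equations (2) and (3) can be illustrated with a minimal NumPy sketch. It is not the authors' TensorFlow implementation: the function names, the window size k = 3, stride 1, and "same" padding are assumptions chosen so that spatial resolution is preserved, and the learnable offsets are simply passed in as arrays.

```python
import numpy as np

def max_pool_same(x, k=3):
    """Stride-1 max pooling over H and W with 'same' padding (odd k assumed)."""
    x = np.asarray(x, dtype=float)
    _, h, w, _ = x.shape
    p = k // 2
    # Pad with -inf so padded positions never win the maximum.
    padded = np.pad(x, ((0, 0), (p, p), (p, p), (0, 0)), constant_values=-np.inf)
    out = np.full_like(x, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w, :])
    return out

def dilate(x, w_d, k=3):
    """Eq. (2): Dil(X) = MaxPool_kxk(X + W_d), with W_d of shape (1, 1, 1, C)."""
    return max_pool_same(x + w_d, k)

def erode(x, w_e, k=3):
    """Eq. (3): Ero(X) = -MaxPool_kxk(-X + W_e), with W_e of shape (1, 1, 1, C)."""
    return -max_pool_same(-x + w_e, k)

# Toy input: a single bright pixel in a 5 x 5 single-channel map.
x = np.zeros((1, 5, 5, 1))
x[0, 2, 2, 0] = 1.0
zero_offset = np.zeros((1, 1, 1, 1))
dilated = dilate(x, zero_offset)  # bright pixel grows to a 3 x 3 patch
eroded = erode(x, zero_offset)    # isolated bright pixel is removed
```

With zero offsets the operators reduce to classical flat dilation and erosion, so the eroded map is bounded above by the input and the dilated map is bounded below by it; the channel-wise offsets shift the responses per channel before (or, for erosion, effectively after) the window extremum is taken.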

3.3.2. Morphology-Aware Downsampling

Standard pooling operations may discard fine structural cues during spatial reduction. To mitigate this limitation, MA-MSCNet introduces a morphology-aware downsampling module that enriches features prior to spatial compression.

As shown in Figure 3, given an input feature map $x \in \mathbb{R}^{B \times H \times W \times C}$, parallel dilation and erosion generate $\mathrm{Dil}(x)$ and $\mathrm{Ero}(x)$. These are concatenated with the original representation as

$$[x, \mathrm{Dil}(x), \mathrm{Ero}(x)], \tag{4}$$

and fused using a $1 \times 1$ convolution followed by batch normalization and ReLU activation. Average pooling with stride 2 is then applied to reduce spatial resolution.

By incorporating morphology-driven refinement before downsampling, the proposed module preserves boundary-sensitive and shape-related information that may otherwise be degraded by conventional pooling, which is particularly beneficial for tumor characterization in MRI.
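The data flow of this module can be traced with a small NumPy sketch. Several details are assumptions rather than values reported in the text: a 3 × 3 morphological window, a 2 × 2 pooling window with "same" padding, and the omission of batch normalization. The helper names and the fusion weight `w_fuse` are hypothetical.

```python
import numpy as np

def _max_pool_same(x, k=3):
    """Stride-1 max pooling with 'same' padding (odd k assumed)."""
    _, h, w, _ = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p), (0, 0)), constant_values=-np.inf)
    out = np.full_like(x, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w, :])
    return out

def morph_downsample(x, w_d, w_e, w_fuse, k=3):
    """Concatenate [x, Dil(x), Ero(x)], fuse with a 1x1 conv + ReLU, then average-pool."""
    x = np.asarray(x, dtype=float)
    dil = _max_pool_same(x + w_d, k)
    ero = -_max_pool_same(-x + w_e, k)
    stacked = np.concatenate([x, dil, ero], axis=-1)  # Eq. (4): (B, H, W, 3C)
    # A 1x1 convolution is a per-pixel matrix product; batch norm is omitted here.
    fused = np.maximum(stacked @ w_fuse, 0.0)
    # 2x2 average pooling, stride 2, 'same' padding: padded cells are ignored.
    b, h, w, c = fused.shape
    h_out, w_out = -(-h // 2), -(-w // 2)
    padded = np.pad(fused, ((0, 0), (0, 2 * h_out - h), (0, 2 * w_out - w), (0, 0)),
                    constant_values=np.nan)
    windows = padded.reshape(b, h_out, 2, w_out, 2, c)
    return np.nanmean(windows, axis=(2, 4))

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 125, 125, 4))
offsets = np.zeros((1, 1, 1, 4))
pooled = morph_downsample(features, offsets, offsets, rng.normal(size=(12, 8)))
```

In this toy call a 125 × 125 map with 4 channels becomes a 63 × 63 map with 8 channels, which matches the first spatial reduction listed for the overall architecture under the "same" padding assumption.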

3.3.3. Morphology-Aware Multi-Scale Block

The proposed Morphology-Aware Multi-Scale (MA-MSC) block, illustrated in Figure 4, serves as the fundamental building unit of MA-MSCNet. Given an input feature map $X \in \mathbb{R}^{B \times H \times W \times C}$, the block integrates multi-scale convolutional learning, morphological refinement, and residual fusion in a unified framework.

Multi-Scale Feature Extraction: As shown in Figure 4, parallel $3 \times 3$ and $5 \times 5$ convolutional branches are applied to capture local texture patterns and broader contextual information, respectively. Let $F_{3 \times 3}(X)$ and $F_{5 \times 5}(X)$ denote the outputs of these branches. The fused multi-scale representation is defined as

$$F_{\mathrm{ms}} = \mathrm{Concat}(F_{3 \times 3}(X), F_{5 \times 5}(X)). \tag{5}$$

This concatenation preserves complementary spatial representations prior to channel projection.

Morphological Refinement: The fused multi-scale features $F_{\mathrm{ms}}$ are further processed through trainable dilation and erosion operators (Figure 4), enabling adaptive structural enhancement. The morphology-enhanced representation is defined as

$$F_{\mathrm{morph}} = \mathrm{Concat}(F_{\mathrm{ms}}, \mathrm{Dil}(F_{\mathrm{ms}}), \mathrm{Ero}(F_{\mathrm{ms}})). \tag{6}$$

Since each component has dimensionality $B \times H \times W \times C$, $F_{\mathrm{morph}}$ has dimensionality $B \times H \times W \times 3C$.

The concatenated tensor is then projected via a $1 \times 1$ convolution and batch normalization to generate a structure-refined feature map:

$$R(X) = \mathrm{BN}(\mathrm{Conv}_{1 \times 1}(F_{\mathrm{morph}})). \tag{7}$$

Unlike conventional convolutional filtering, which primarily captures local linear combinations of pixel intensities, morphological operations explicitly model structural transformations such as expansion (dilation) and contraction (erosion) of salient regions. These operators are particularly effective in medical imaging scenarios, where lesion boundaries, shape irregularities, and regional continuity serve as important diagnostic cues.

By incorporating dilation and erosion into the feature learning pipeline, the proposed MA-MSC block enhances boundary sensitivity and structural consistency prior to channel compression. The subsequent $1 \times 1$ projection enables adaptive recalibration of the expanded feature space, allowing the network to selectively emphasize morphology-aware responses while preserving multi-scale contextual information.

Residual Fusion: To stabilize optimization and facilitate gradient flow, a residual shortcut connection is incorporated, as depicted in Figure 4. If the input and output channel dimensions differ, a $1 \times 1$ convolution with batch normalization is applied to the shortcut path. The refined features are then added to the shortcut and passed through a ReLU activation to produce the final block output.

The overall transformation of the MA-MSC block (Figure 4) can be expressed as

$$Y = \sigma(S(X) + R(X)), \tag{8}$$

where $\sigma(\cdot)$ denotes ReLU activation, $S(\cdot)$ represents the residual shortcut, and $R(\cdot)$ denotes the morphology-refined multi-scale transformation.
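Equations (5) to (8) can be chained in a short NumPy sketch of the block's forward pass. This is an illustration under stated assumptions, not the published implementation: batch normalization and any per-branch activations are omitted, the branch widths are set by whatever weights are passed in, the morphological window is taken to be 3 × 3, and all function and argument names are hypothetical.

```python
import numpy as np

def conv_same(x, w):
    """Stride-1 convolution with zero 'same' padding; w has shape (k, k, C_in, C_out)."""
    k = w.shape[0]
    p = k // 2
    _, h, wd, _ = x.shape
    padded = np.pad(x, ((0, 0), (p, p), (p, p), (0, 0)))
    out = np.zeros(x.shape[:3] + (w.shape[-1],))
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out

def max_pool_same(x, k=3):
    """Stride-1 max pooling with 'same' padding (odd k assumed)."""
    _, h, wd, _ = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p), (0, 0)), constant_values=-np.inf)
    out = np.full_like(x, -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + wd, :])
    return out

def ma_msc_block(x, w3, w5, w_d, w_e, w_proj, w_short=None, k=3):
    """Forward pass of one MA-MSC block without batch normalization."""
    f_ms = np.concatenate([conv_same(x, w3), conv_same(x, w5)], axis=-1)  # Eq. (5)
    dil = max_pool_same(f_ms + w_d, k)
    ero = -max_pool_same(-f_ms + w_e, k)
    f_morph = np.concatenate([f_ms, dil, ero], axis=-1)                   # Eq. (6)
    refined = f_morph @ w_proj                                            # Eq. (7), 1x1 conv
    shortcut = x if w_short is None else x @ w_short                      # S(X)
    return np.maximum(shortcut + refined, 0.0)                            # Eq. (8)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 4))
y = ma_msc_block(
    x,
    w3=rng.normal(size=(3, 3, 4, 3)),
    w5=rng.normal(size=(5, 5, 4, 3)),
    w_d=np.zeros((1, 1, 1, 6)),
    w_e=np.zeros((1, 1, 1, 6)),
    w_proj=rng.normal(size=(18, 8)),   # 3 x 6 morphology channels -> 8 outputs
    w_short=rng.normal(size=(4, 8)),   # 1x1 shortcut because 4 != 8 channels
)
```

Here the two branches each emit 3 channels, so the multi-scale tensor has 6 channels and the morphology-enhanced tensor has 18, which the 1 × 1 projection compresses to 8 before the residual addition.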

3.3.4. Overall MA-MSCNet Architecture

The overall architecture of MA-MSCNet is illustrated in Figure 5 and summarized in Table 1. The network follows a modular design where feature transformation is performed exclusively by the MA-MSC blocks, while spatial downsampling is achieved through dedicated morphology-aware pooling modules placed between successive blocks, rather than being embedded within the blocks themselves. This design choice ensures that structural information is explicitly preserved during resolution reduction without interfering with the multi-scale feature extraction process.

The network receives an input grayscale MRI slice of size $125 \times 125 \times 1$ and first applies a stem convolution layer to extract low-level representations. Subsequently, four sequential MA-MSC blocks with increasing channel dimensions (32, 64, 128, and 256) are employed to progressively enhance structural and semantic representations.

To reduce spatial resolution while preserving morphology-sensitive features, a morphology-aware pooling module is inserted after the first three MA-MSC blocks (i.e., between Block 1 and Block 2, between Block 2 and Block 3, and between Block 3 and Block 4). This results in spatial dimensions of $63 \times 63$, $32 \times 32$, and $16 \times 16$, respectively. The fourth MA-MSC block operates without subsequent downsampling to retain high-level structural information prior to global aggregation.

Finally, global average pooling aggregates spatial information into a compact $1 \times 1 \times 256$ representation, which is passed through fully connected layers and a softmax classifier to produce the final multi-class prediction.
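The spatial sizes quoted above follow from repeated stride-2 pooling if the output size is rounded up, as happens with "same" padding. That padding mode is an assumption inferred from the reported sizes (rounding down would give 62 rather than 63 after the first stage). A short check:

```python
import math

def spatial_sizes(input_size=125, num_pools=3, stride=2):
    """Side length after each stride-2 pooling stage, rounding up ('same' padding)."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(math.ceil(sizes[-1] / stride))
    return sizes

sizes = spatial_sizes()  # [125, 63, 32, 16]
```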

3.4. Experimental Setup

3.4.1. Evaluation Metrics

The performance of the proposed MA-MSCNet was evaluated using a set of standard quantitative metrics widely adopted in medical image classification to provide a comprehensive assessment of classification accuracy, reliability, and discriminative capability. As summarized in Table 2, the employed metrics include accuracy, precision, sensitivity, specificity, and F1-score. Accuracy measures the overall correctness of the classification model, whereas precision reflects the reliability of positive predictions. Sensitivity quantifies the model’s ability to correctly identify positive samples, while specificity assesses its capability to correctly recognize negative cases. The F1-score represents the harmonic mean of precision and sensitivity, providing a balanced evaluation of classification performance, which is particularly important in medical decision-support systems [40,41]. The AUC-ROC metric evaluates the discriminative power of the model and has been widely adopted for assessing classifier performance [42], including extensions to multi-class settings using one-vs-rest strategies [43].

For multi-class evaluation, macro-averaging was employed to ensure that all classes contribute equally to the final performance scores, regardless of class imbalance. In addition, 95% confidence intervals (CI) were computed for key evaluation metrics to assess the statistical reliability and stability of the reported results.

The mathematical formulations and interpretations of the evaluation metrics used in this study are provided in Table 2. Here, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
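As a rough illustration of how these quantities are obtained in the multi-class case, the sketch below derives one-vs-rest TP, FP, FN, and TN counts from a confusion matrix and macro-averages the per-class scores. The row/column convention (rows are true classes, columns are predictions) is an assumption, and the 3 × 3 matrix is a made-up toy example, not data from this study.

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest metrics from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

def macro_average(metrics):
    """Unweighted mean over classes, so every class counts equally."""
    return {name: float(values.mean()) for name, values in metrics.items()}

def accuracy(cm):
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

# Hypothetical toy confusion matrix for three classes.
toy_cm = [[8, 1, 1],
          [0, 9, 1],
          [2, 0, 8]]
toy_macro = macro_average(per_class_metrics(toy_cm))
toy_accuracy = accuracy(toy_cm)
```

Because the macro average ignores class frequencies, a weak minority class lowers the score as much as a weak majority class would, which is the property the text relies on.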

3.4.2. Training and Evaluation Protocol

The model development and evaluation followed a structured two-phase protocol designed to ensure stable optimization and unbiased performance estimation.

Model Implementation and Training Configuration: The proposed MA-MSCNet architecture was implemented in TensorFlow/Keras and optimized using the AdamW optimizer with an initial learning rate of $1 \times 10^{-3}$ and a weight decay of $1 \times 10^{-4}$. Training minimized the categorical cross-entropy loss with label smoothing ($\epsilon = 0.1$) to reduce overconfidence in predicted probabilities. A cosine learning-rate schedule with warmup was employed, consisting of a five-epoch linear warmup phase followed by cosine decay to a minimum learning rate of $1 \times 10^{-5}$.
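The smoothed targets and the learning-rate schedule described above can be sketched as follows. The smoothing rule is the common uniform-mixing form, and the schedule is evaluated once per epoch with the warmup starting at one fifth of the base rate; both the granularity and the warmup starting value are assumptions, since the text gives only the warmup length, the peak rate, and the minimum rate. The function names are hypothetical.

```python
import math
import numpy as np

def smooth_labels(labels, num_classes=4, eps=0.1):
    """Mix one-hot targets with a uniform distribution: y * (1 - eps) + eps / K."""
    one_hot = np.eye(num_classes)[np.asarray(labels)]
    return one_hot * (1.0 - eps) + eps / num_classes

def lr_at_epoch(epoch, base_lr=1e-3, min_lr=1e-5, warmup_epochs=5, total_epochs=50):
    """Linear warmup to base_lr, then cosine decay towards min_lr."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

targets = smooth_labels([0, 3])                # each row still sums to 1
schedule = [lr_at_epoch(e) for e in range(50)]  # one value per training epoch
```

With four classes and a smoothing factor of 0.1, the true class receives a target of 0.925 and every other class 0.025, so the loss never rewards fully saturated probabilities.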

All models were trained for 50 epochs using a mini-batch size of 8 on input images resized to $125 \times 125$. Data augmentation was implemented using Keras ImageDataGenerator. Hyperparameters were selected based on cross-validation experiments conducted exclusively on the training set, with performance monitored on validation folds to ensure stable convergence. No information from the held-out test set was used during hyperparameter tuning or model development. Reproducibility was further assessed through controlled experiments using multiple fixed random seeds, as detailed in Section 4.1.

All experiments were conducted on the Google Colab platform using an NVIDIA Tesla T4 GPU (16 GB memory), with Python 3.12.13 on a Linux environment (glibc 2.35), TensorFlow 2.19.0, and Keras 3.13.2. Additional libraries include NumPy, OpenCV (cv2), scikit-learn, and Matplotlib (version 3.10.0) for preprocessing, evaluation, and visualization.

Final Model Training and Test Set Evaluation: Following cross-validation, a final MA-MSCNet model was trained from scratch on the full training set (5712 images) using the same hyperparameters. The model was then evaluated once on the independent test set (1311 images), which was not used in any training or validation stage. All reported performance metrics are based on this final test evaluation.

4. Results

4.1. Overall Performance

The cross-validation learning dynamics of the proposed MA-MSCNet are presented in Figure 6, which illustrates the mean ± standard deviation of accuracy and loss across all five folds. As shown in Figure 6a, both training and validation accuracy increase steadily and remain closely aligned throughout the training process, with narrow variance bands indicating consistent convergence across folds. Correspondingly, Figure 6b demonstrates a smooth and progressive reduction in training and validation loss, without noticeable divergence or oscillatory behavior. The small gap between training and validation curves, together with the limited inter-fold variability, confirms stable optimization dynamics and strong generalization capability of the proposed framework.

On the held-out test set, the final MA-MSCNet model achieves an overall accuracy of 99.31%, with a corresponding 95% confidence interval of [98.86%, 99.70%]. Balanced classification performance is further reflected by macro-averaged precision, recall, and F1-score values of 99.30%, 99.28%, and 99.29%, respectively. In addition, the model achieves a Matthews Correlation Coefficient (MCC) of 99.08% with a 95% confidence interval of [98.47%, 99.69%], further confirming the robustness and reliability of the classification performance. Weighted and micro-averaged precision, recall, and F1-score are all 99.31%, confirming consistent performance across classes and robustness to class distribution.

To assess sensitivity to stochastic initialization, MA-MSCNet was trained using five fixed random seeds (11, 22, 42, 55, and 77) under identical conditions. As shown in Table 3, performance remains consistently high (accuracy: 98.63–99.24%, mean 99.02% ± 0.23%), with minimal variation in macro F1-score and MCC. This confirms that the reported results are stable and not dependent on a specific random seed.

Bootstrap resampling (1000 iterations) of the test set was employed to compute 95% confidence intervals for key evaluation metrics. The narrow intervals (e.g., accuracy: 99.31% [98.86%, 99.70%]; MCC: 99.08% [98.47%, 99.69%]) indicate that the estimates are stable under resampling of the test images. Because the bootstrap resamples the predictions of a single trained model, these intervals quantify sampling uncertainty in the test set rather than variability across training runs, which is addressed by the multi-seed experiment above.
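As an illustration of the bootstrap procedure, the sketch below computes a percentile confidence interval for accuracy. It is a minimal example under stated assumptions, not the evaluation script of this study: the function name, the percentile method, and the seed are illustrative choices, and the only figures taken from the text are the test-set size (1311) and the number of correct predictions implied by 99.31% accuracy (1302).

```python
import random

def bootstrap_accuracy_ci(correct_flags, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    correct_flags: list of 0/1 values, one per test image (1 = correct).
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    stats = []
    for _ in range(n_boot):
        # Resample the test set with replacement and recompute accuracy
        sample = rng.choices(correct_flags, k=n)
        stats.append(sum(sample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# 1302 correct predictions out of 1311 corresponds to 99.31% accuracy;
# the ordering of the flags does not affect the bootstrap.
flags = [1] * 1302 + [0] * 9
point = sum(flags) / len(flags)
ci_low, ci_high = bootstrap_accuracy_ci(flags)
```

The same loop applies to any other metric by replacing the accuracy computation with the metric evaluated on the resampled predictions.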

The normalized confusion matrix shown in Figure 7 further confirms the strong class-wise performance of MA-MSCNet. The model achieves high sensitivity for glioma (99.00%), meningioma (98.37%), no-tumor (99.75%), and pituitary tumors (100.00%). Corresponding specificity values exceed 99.70% for all classes, and the per-class miss rate (1 − sensitivity) is at most 1.63% (meningioma), demonstrating reliable discrimination across all tumor categories.
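The relationship between the confusion matrix and the reported class-wise metrics can be sketched as follows. The matrix below is illustrative only: the per-class test counts (300, 306, 405, 300) and the placement of the off-diagonal errors are assumptions, not values taken from Figure 7; only the diagonal follows from the reported sensitivities under those assumed counts. The multi-class MCC uses the standard generalization to K classes.

```python
from math import sqrt

CLASSES = ["glioma", "meningioma", "no tumor", "pituitary"]

# Illustrative confusion matrix (rows = true class, columns = predicted class).
# ASSUMPTION: class counts and off-diagonal entries are hypothetical.
CM = [
    [297, 3, 0, 0],
    [2, 301, 1, 2],
    [1, 0, 404, 0],
    [0, 0, 0, 300],
]

def per_class_metrics(cm):
    """One-vs-rest sensitivity and specificity for each class index."""
    total = sum(sum(row) for row in cm)
    out = {}
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(row[k] for row in cm) - tp
        tn = total - tp - fn - fp
        out[k] = {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}
    return out

def multiclass_mcc(cm):
    """Multi-class Matthews correlation coefficient."""
    s = sum(sum(row) for row in cm)                          # all samples
    c = sum(cm[k][k] for k in range(len(cm)))                # correct samples
    t = [sum(row) for row in cm]                             # true counts
    p = [sum(row[k] for row in cm) for k in range(len(cm))]  # predicted counts
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0

metrics = per_class_metrics(CM)
accuracy = sum(CM[k][k] for k in range(len(CM))) / sum(map(sum, CM))
mcc = multiclass_mcc(CM)
```

Under these assumptions, the sketch reproduces the reported sensitivities and overall accuracy and yields an MCC close to the reported 99.08%.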

To further quantify the model’s discriminative performance, receiver operating characteristic (ROC) analysis was conducted. The proposed MA-MSCNet achieves consistently high area under the curve (AUC) values across all tumor classes, with a macro-averaged AUC of 0.9986 and a micro-averaged AUC of 0.9983. Class-wise evaluation yields AUCs of 0.9976 for Glioma, 0.9993 for Meningioma, 0.9975 for No Tumor, and 1.0000 for Pituitary, indicating excellent class separability. These results further confirm the robustness and reliability of the proposed approach under varying decision thresholds.

To assess the reliability of predicted probabilities, calibration performance was evaluated using the Brier score and Expected Calibration Error (ECE). The proposed model achieved low Brier scores (0.0048–0.0090) and low ECE values (0.0329–0.0404) across all classes, indicating that predicted probabilities are well aligned with observed outcomes.

Moreover, to evaluate clinical utility beyond conventional performance metrics, a decision-theoretic analysis was conducted using the Entechne framework. The model demonstrated consistently high standardized net benefit (SNB = 0.975–0.994) across all classes, substantially outperforming treat-all and treat-none strategies. Notably, the highest net benefit was observed in the no tumor vs. rest setting (NB = 0.307), highlighting the model’s effectiveness in screening and ruling out disease. For tumor detection tasks, near-optimal decision performance was achieved (e.g., glioma SNB = 0.994), confirming strong clinical applicability.
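For reference, the two calibration metrics mentioned above can be sketched for a single one-vs-rest problem as follows. This is a minimal illustration and not the evaluation code of this study: the binning scheme (ten equal-width bins on the predicted probability) is an assumption, since ECE has several variants, and the toy probabilities and labels are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for a binary (one-vs-rest) problem with equal-width bins.

    Each non-empty bin contributes
    |mean predicted probability - observed positive rate|,
    weighted by the fraction of samples that fall into the bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_p - pos_rate)
    return ece

# Hypothetical one-vs-rest probabilities and labels for a single class
toy_probs = [0.95, 0.90, 0.10, 0.05, 0.80, 0.20]
toy_labels = [1, 1, 0, 0, 1, 0]
bs = brier_score(toy_probs, toy_labels)
ece = expected_calibration_error(toy_probs, toy_labels)
```

Both quantities are zero for a perfectly confident and correct classifier and grow as predicted probabilities drift away from observed frequencies.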

Overall, these results demonstrate that MA-MSCNet achieves robust and consistent performance across cross-validation folds and the independent test set, supporting its effectiveness for reliable multi-class brain tumor classification.

4.2. Explainability and Visual Interpretation Using Grad-CAM

To enhance the transparency of the proposed MA-MSCNet and verify that its decisions rely on meaningful anatomical cues, we employ Gradient-weighted Class Activation Mapping (Grad-CAM) to generate class-discriminative visual explanations. Grad-CAM produces a coarse localization map by backpropagating gradients from the target class score to the final convolutional feature maps and combining these gradients to estimate the spatial contribution of each region to the prediction. In our experiments, Grad-CAM was computed using the final convolutional layer and then upsampled to the input resolution for visualization. For improved clarity and to suppress spurious responses outside the brain region, the resulting heatmaps were masked using a simple brain-region mask derived from intensity thresholding followed by morphological closing, and then overlaid on the original grayscale MRI slices.
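The weighting step at the core of Grad-CAM can be sketched in a few lines. The example below is a simplified illustration on nested lists, not the implementation used in this study: it assumes that the feature maps of the final convolutional layer and the gradients of the target class score with respect to them have already been obtained through automatic differentiation, and it omits the upsampling and brain-mask steps. The function name `grad_cam` and the toy inputs are hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Compute a normalized Grad-CAM map.

    feature_maps, gradients: lists of K channels, each an H x W nested list.
    """
    # Channel weights: global average of the gradients over spatial positions
    weights = []
    for g in gradients:
        flat = [v for row in g for v in row]
        weights.append(sum(flat) / len(flat))

    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum over channels followed by ReLU
    cam = [
        [
            max(0.0, sum(wk * fm[i][j] for wk, fm in zip(weights, feature_maps)))
            for j in range(w)
        ]
        for i in range(h)
    ]
    # Scale to [0, 1] for visualization (skipped if the map is all zeros)
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy example: two 2 x 2 channels, one with positive and one with negative gradients
toy_maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]
toy_grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(toy_maps, toy_grads)
```

In the toy example, the channel with negative gradients is suppressed by the ReLU, so only the location activated by the positively weighted channel remains highlighted.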

Figure 8 illustrates Grad-CAM visualizations for correctly classified test samples from all four classes. For each class, two representative samples are displayed, showing the input grayscale slice (left) and the corresponding Grad-CAM overlay (right). Overall, the activation patterns indicate that MA-MSCNet consistently focuses on discriminative regions associated with tumor presence and morphology. In glioma and meningioma cases, the highlighted areas are concentrated around the lesion regions, whereas pituitary tumor samples show strong activation near the sellar region. For the No Tumor class, the responses are distributed within normal brain tissue without a focal pathological hotspot, which is consistent with the absence of tumor-related structures. These observations support that the proposed architecture learns clinically relevant cues rather than relying on background artifacts.

4.3. Morphology-Aware Feature Visualization

To further illustrate the behavior of the proposed morphology-aware components, Figure 9 presents representative feature maps extracted from the trainable dilation (Dil) and erosion (Ero) layers for a test MRI slice. The dilation maps highlight prominent structural and intensity patterns across the brain region, enhancing salient responses, whereas the erosion maps emphasize boundary-related information and suppress more homogeneous regions, resulting in localized and sparse activations. Since each layer produces multiple feature channels, only a subset of representative maps is shown for clarity, selected based on activation variability. These visualizations provide qualitative evidence that the proposed operators effectively contribute to morphology-aware feature refinement within the learned representations.
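To make the contrasting behavior of the two operators concrete, the sketch below implements standard grayscale dilation and erosion with an additive structuring element. It is an assumption-laden illustration, not the trainable layers of MA-MSCNet: the exact parameterization of those layers is given in the method description and may differ, the structuring element here is fixed rather than learned, it is applied without reflection (equivalent for symmetric elements), and out-of-bounds positions are simply ignored.

```python
def _morph(image, se, reducer, sign):
    """Apply a max-plus (dilation) or min-minus (erosion) window operation."""
    h, w = len(image), len(image[0])
    r = len(se) // 2  # square structuring element of odd size
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = []
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # skip out-of-bounds taps
                        vals.append(image[ii][jj] + sign * se[di + r][dj + r])
            out[i][j] = reducer(vals)
    return out

def dilate(image, se):
    """Grayscale dilation: max over the window of (pixel + weight)."""
    return _morph(image, se, max, +1)

def erode(image, se):
    """Grayscale erosion: min over the window of (pixel - weight)."""
    return _morph(image, se, min, -1)

# Toy 3 x 3 image with a single bright pixel and a flat (all-zero) 3 x 3 element
toy_image = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
flat_se = [[0.0] * 3 for _ in range(3)]
dilated = dilate(toy_image, flat_se)
eroded = erode(toy_image, flat_se)
```

With a flat element, dilation spreads the bright pixel across its neighborhood while erosion removes it, which mirrors the salient-response versus sparse-response contrast described above. In a trainable layer, the entries of the structuring element would be updated by gradient descent through the max and min operations.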

4.4. Ablation Study

An extensive ablation study was conducted to systematically evaluate the contribution of each architectural component in the proposed MA-MSCNet. The results, summarized in Table 4, demonstrate that the full model configuration (A0) consistently outperforms all ablated variants. In particular, the complete MA-MSCNet achieves the highest accuracy (99.31%), macro-averaged F1-score (99.29%), and macro-averaged one-vs-rest AUC (99.86%), confirming the effectiveness of jointly integrating multi-scale feature extraction, in-block morphological operations, and morphology-aware pooling.

4.4.1. Effect of Morphology Design (Group A)

Removing all morphological operations (variant A1: Multi-scale CNN without morphological operators, parameter-matched) leads to a substantial performance degradation, with accuracy and F1-score decreasing by approximately 2.7% and 2.8%, respectively, compared with the full MA-MSCNet configuration. This drop directly quantifies the contribution of the proposed morphology-aware inductive bias, since all other architectural components (multi-scale convolutions, residual connections, and parameter count) remain identical. Introducing morphology either within the MA-MSC blocks (A2) or exclusively in the downsampling stage (A3) partially recovers performance; however, neither configuration matches the full model. These findings confirm that jointly integrating morphology during both feature extraction and downsampling is essential for achieving optimal performance.

4.4.2. Effect of Multi-Scale Feature Extraction (Group B)

Single-scale configurations (B1 and B2) consistently underperform compared with the full multi-scale design. In particular, relying solely on larger receptive fields (B2) yields the weakest performance. Incorporating both 3 × 3 and 5 × 5 convolutions (B3) improves robustness over single-scale variants, validating the effectiveness of multi-scale feature fusion in capturing complementary spatial information.

4.4.3. Effect of Morphological Operations (Group C)

Using only dilation (C1) or only erosion (C2) results in reduced performance compared with the full morphology configuration. In contrast, combining dilation and erosion (C3) yields clear improvements, demonstrating that these operations provide complementary structural representations that are jointly required for effective morphology-aware learning.

4.4.4. Effect of Downsampling Strategy (Group D)

Replacing the proposed morphology-aware pooling with conventional AvgPool (D2) or MaxPool (D3) leads to inferior performance. The morphology-aware pooling strategy (D1) consistently outperforms standard pooling operations, confirming that morphology-informed downsampling better preserves discriminative structural cues than conventional approaches.

Overall, the ablation results indicate that the performance gains of MA-MSCNet arise from the synergistic integration of multi-scale convolution, trainable morphological operations, and morphology-aware pooling, rather than from any single component in isolation.

4.4.5. Effect of Data Augmentation

To further investigate the impact of data augmentation on model performance, an additional experiment was conducted by training the model without augmentation under the same settings, as summarized in Table 5.

The results indicate that data augmentation improves generalization, leading to higher accuracy and MCC values, and contributes to more stable classification performance.

4.4.6. Effect of Input Resolution

To evaluate the impact of input resolution, additional experiments were conducted using 96 × 96, 160 × 160, and 224 × 224 inputs under identical training settings. As shown in Table 6, performance improves from 96 × 96 to 160 × 160, but does not further improve at 224 × 224, while computational cost increases substantially. Notably, the proposed 125 × 125 configuration achieves the best overall performance (99.31% accuracy), indicating that classification accuracy does not monotonically increase with resolution. Instead, an intermediate resolution provides a better balance between spatial detail and generalization. These results suggest that the proposed morphology-aware architecture effectively captures discriminative structural features without requiring high-resolution inputs.

4.5. Per-Class Performance Analysis

The class-wise evaluation (Table 7) demonstrates consistently high and well-balanced performance across all tumor categories. Sensitivity is at least 98.37% for all classes and reaches 100% for Class 3 (pituitary), while specificity remains above 99.70% across all categories, indicating effective suppression of false-positive predictions. Precision and F1-score values are uniformly high, with particularly strong performance observed for Class 2. In addition, the one-vs-rest AUC values approach or reach 100% for all classes, confirming excellent discriminative capability and robust class separation.

All reported per-class metrics are accompanied by 95% confidence intervals computed using bootstrap resampling, which exhibit narrow uncertainty ranges and confirm the statistical reliability and stability of the proposed MA-MSCNet.

The strong agreement between the overall evaluation metrics, per-class performance results, and the normalized confusion matrix confirms the robustness and reliability of the proposed MA-MSCNet across all tumor categories. Misclassifications are rare and primarily occur between visually similar tumor classes, particularly in cases with ambiguous boundaries or overlapping structural characteristics. Importantly, the per-class misclassification rate does not exceed 1.63% for any class, and no systematic bias toward any specific category is observed, indicating effective control of both false-negative and false-positive errors.

4.6. Computational Efficiency Analysis

To assess computational efficiency, MA-MSCNet was compared with representative CNN, modern convolutional, transformer, and hybrid architectures under an identical experimental setup, including the same input resolution, batch size, and hardware configuration (Table 8). MA-MSCNet maintains a compact design with 2.15 M parameters and requires only 0.43 GFLOPs, both substantially lower than all evaluated baselines. In terms of runtime, it achieves the lowest inference latency of 0.66 ms per image, while all baseline models incur higher computational cost and inference time. These results demonstrate that MA-MSCNet achieves an effective accuracy–efficiency trade-off, where high predictive performance is obtained through architectural design, including multi-scale feature extraction and morphology-aware operations, rather than increased model complexity. Overall, this supports the suitability of MA-MSCNet for deployment in resource-constrained settings. All results correspond to the final MA-MSCNet configuration (A0) and are obtained under a fixed and reproducible experimental protocol.
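A per-image latency figure of this kind is typically obtained by averaging wall-clock time over repeated inference calls after a warm-up phase. The sketch below shows one such measurement loop under stated assumptions; it is not the benchmarking code of this study. The function `mean_latency_ms`, the warm-up and repetition counts, and the stand-in `dummy_predict` are hypothetical, and on a GPU the timed call would additionally need to block until the computation has finished.

```python
import time

def mean_latency_ms(predict_fn, sample, warmup=10, runs=100):
    """Average wall-clock time per call, in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        predict_fn(sample)
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs

# Hypothetical stand-in for a single-image inference call
def dummy_predict(x):
    return sum(x)

latency = mean_latency_ms(dummy_predict, list(range(1000)))
```

Reporting the mean over many runs, with identical input resolution and batch size for every model, keeps such comparisons consistent across architectures.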

4.7. Contextual Comparison with Existing Methods

To further assess the effectiveness of the proposed MA-MSCNet, its performance was compared with several recent representative methods for multi-class brain tumor classification, as summarized in Table 9. The comparison includes both conventional deep learning models and transfer learning-based approaches reported in the literature.

It is important to emphasize that the comparisons presented in Table 9 are based on reported results from the literature and were not reproduced under a unified experimental setting. Differences in dataset splits, preprocessing strategies, input resolutions, augmentation pipelines, and training protocols may therefore exist. As such, these comparisons are intended to provide contextual insight rather than direct, controlled performance evaluation.

As shown in the table, earlier CNN-based and hybrid methods generally achieve accuracies in the range of 97.00–97.84%, while deeper or fine-tuned architectures report improved accuracies in the range of 98.90–99.66%. The proposed MA-MSCNet achieves an overall accuracy of 99.31%, along with macro-averaged precision, recall, and F1-score values of 99.30%, 99.28%, and 99.29%, respectively, placing it within the upper range of reported results.

To ensure clarity and avoid misleading comparisons, a rigorous and fair evaluation under identical experimental conditions is presented separately in Table 10, where all models are trained and evaluated using the same dataset split, preprocessing pipeline, augmentation strategy, and comparable training protocol.

Unlike approaches that rely primarily on deeper backbones or transfer learning, MA-MSCNet explicitly incorporates multi-scale feature extraction and trainable morphological operations, enabling enhanced structural representation learning. This design contributes to stable and balanced performance across evaluation metrics, as demonstrated in the controlled experiments.

To provide a more comprehensive evaluation, the controlled comparison covers a diverse set of contemporary baseline architectures spanning multiple design paradigms. Specifically, the evaluation includes conventional CNNs (ResNet50, DenseNet201), modern convolutional architectures (ConvNeXt-Tiny, EfficientNetV2), transformer-based models (ViT-B16, Swin-Tiny), and hybrid architectures (CvT).

All models were trained and evaluated under identical experimental conditions, including the same dataset split, 125 × 125 grayscale input, preprocessing pipeline, augmentation strategy, and a unified training protocol, including the same optimizer, learning rate schedule, batch size, and number of epochs. This ensures a fair and unbiased comparison across different architectural families. The results are summarized in Table 10.

The expanded benchmarking results demonstrate that, while modern transformer-based and hybrid architectures achieve competitive performance, they do not consistently outperform convolutional models in this setting. This can be attributed to the relatively limited dataset size and the domain-specific characteristics of medical imaging, where convolutional inductive biases remain advantageous.

Notably, the proposed MA-MSCNet achieves the highest performance across all reported metrics, indicating its effectiveness in capturing both local structural details and multi-scale contextual information. These findings highlight that the proposed architecture provides a favorable balance between accuracy and robustness when compared with diverse contemporary models.

5. Discussion

This section interprets the experimental findings and highlights the key contributions of MA-MSCNet, with emphasis on generalization, reproducibility, and limitations in brain tumor MRI classification.

5.1. Overall Performance and Architectural Contribution

MA-MSCNet demonstrates that embedding morphology-aware inductive bias within multi-scale deep architectures improves brain tumor MRI classification while maintaining a compact design of 2.15 M parameters. The model achieves highly competitive accuracy (99.31%) with balanced class-wise performance and stable generalization. However, the employed dataset presents relatively well-defined class boundaries compared with more heterogeneous clinical data, which may reduce task complexity. Accordingly, the contribution of this work is not limited to incremental accuracy gains; it is reflected in the consistent performance across folds, balanced class-wise behavior, and the effective incorporation of morphology-aware inductive bias within a compact and interpretable architecture.

The key contribution lies in integrating trainable morphological operations directly into hierarchical feature extraction and downsampling. Unlike conventional texture-dominant CNNs, the proposed design promotes structure-aware representation learning by enhancing boundary definition and morphological contrast. Grad-CAM visualizations indicate that the proposed model focuses on morphologically relevant tumor regions within brain MRI slices, suggesting clinically meaningful attention patterns. Across different samples and tumor types, the highlighted regions consistently correspond to areas of abnormal tissue and structural irregularities, rather than background or non-informative regions. This behavior is further supported by the morphology-aware design of the proposed architecture, which encourages sensitivity to boundary definition and structural contrast. However, due to the absence of pixel-level tumor annotations, direct overlap-based validation (e.g., Dice coefficient) is not feasible. Future work will incorporate datasets with expert-annotated segmentation masks to enable more rigorous clinical validation of model explanations. Ablation results further show that the performance gains arise from the synergistic interaction between multi-scale learning and morphology-aware refinement rather than increased model complexity.

Balanced sensitivity, specificity, and AUC values across classes indicate reliable discrimination without systematic bias. Importantly, the architecture achieves competitive performance without relying on deep pretrained backbones, suggesting that purpose-built structural priors can match larger models while offering improved parameter efficiency and interpretability.

5.2. Analysis of Performance Gains

The proposed MA-MSCNet achieves 99.31% accuracy and a macro-averaged AUC of 0.9986 under the same controlled experimental setup used for all baselines. To analyze the source of this performance, we consider three complementary observations.

First, the dataset does not trivially saturate performance. Representative architectures evaluated under the same conditions (125 × 125 grayscale input, identical split, augmentation, and training protocol) yield substantially lower results, including Swin-Tiny (90.16%), ConvNeXt-Tiny (94.36%), ResNet50 (90.08%), and EfficientNetV2 (71.55%). This indicates that performance is not uniformly high across models. However, the dataset consists of contrast-enhanced MRI slices with relatively consistent acquisition conditions and well-defined tumor boundaries, which may reduce classification complexity compared with real-world clinical data.

Second, the morphology-aware design provides a measurable and isolated gain. The ablation study (Table 4) compares the full MA-MSCNet (A0) with a parameter-matched variant without morphological operators (A1). Removing morphology reduces accuracy by 2.67% and F1-score by 2.78%, directly quantifying its contribution under controlled conditions.

Third, the improvement is concentrated in structurally ambiguous classes. The largest gains are observed in glioma and meningioma, where boundary complexity is higher, while more regular classes such as no tumor and pituitary show smaller improvements. This behavior is consistent with the role of morphology-aware operations in enhancing structural discrimination.

Overall, these results indicate that the performance gain of MA-MSCNet is primarily driven by its morphology-aware multi-scale design, while dataset characteristics and preprocessing also contribute. Notably, the use of relatively clean, slice-level MRI data with consistent acquisition settings and clear class boundaries may partially contribute to the high observed accuracy, and may not fully reflect the variability encountered in real-world clinical scenarios.

5.3. Generalization, Robustness, and Reproducibility

Despite strong internal validation, this study relies on a single publicly available dataset for evaluation. While the Kaggle Brain Tumor MRI dataset is widely used and facilitates comparison with prior work, it may not fully capture the variability present in real-world clinical settings, including differences in imaging protocols, scanners, and patient populations. Therefore, although the proposed model demonstrates stable and consistent performance, further validation on external multi-center datasets is necessary to comprehensively assess its generalization. Future work will focus on external validation, as well as domain generalization and uncertainty-aware modeling, to further strengthen robustness under distribution shifts. The morphology-aware paradigm may also extend to other structure-critical medical imaging tasks.

5.4. Limitations of Baseline Comparisons

The proposed model was compared against representative baseline architectures under a consistent and controlled evaluation protocol: conventional CNNs (ResNet50, DenseNet201), modern convolutional architectures (ConvNeXt-Tiny, EfficientNetV2), transformer-based models (ViT-B16, Swin-Tiny), and a hybrid architecture (CvT), as summarized in Table 10. These models were selected due to their widespread use and their ability to be fairly adapted to the same grayscale input setting and training configuration. While this comparison provides a reliable benchmark, it does not encompass all recent state-of-the-art architectures. Therefore, the results should be interpreted as demonstrating strong and competitive performance under controlled conditions rather than an absolute state-of-the-art claim. Future work will extend this evaluation to additional architectures under consistent experimental settings.

5.5. Future Research Directions

Future work will extend the current study through external validation on multi-center datasets and the incorporation of uncertainty-aware modeling to better assess robustness under distribution shifts. In addition, integrating morphology-aware representations with transformer-based architectures may enhance global contextual modeling while preserving structural sensitivity. Combining radiomics-based features with the proposed framework may further provide complementary information and improve interpretability.

6. Conclusions

This study presented MA-MSCNet, a morphology-aware multi-scale deep learning framework for brain tumor MRI classification. While the individual components of the proposed framework have been explored in prior studies, the contribution of this work lies in their integration into a unified morphology-aware multi-scale architecture tailored for structural feature learning in brain MRI analysis. By embedding structural inductive bias through trainable morphological operations within hierarchical feature extraction and downsampling, the model enables structure-aware and interpretable representation learning beyond conventional texture-driven CNNs. The proposed approach achieves strong and consistent performance, reaching 99.31% accuracy with balanced class-wise metrics while maintaining a compact design (2.15 M parameters), and Grad-CAM analysis further supports that the model focuses on morphologically relevant tumor regions. However, the evaluation is conducted on a single publicly available benchmark dataset; therefore, the generalization of the proposed model to external clinical data is not guaranteed. Variations in imaging protocols, scanners, and patient populations may affect real-world performance. Consequently, the conclusions of this study are limited to the utilized dataset, and further validation on independent multi-center datasets is required.

Author Contributions

Conceptualization, H.A.; methodology, H.A.; software, H.A. and M.B.; validation, H.A. and M.B.; formal analysis, H.A.; investigation, H.A.; resources, H.A.; data curation, H.A.; writing—original draft preparation, H.A.; writing—review and editing, M.B.; visualization, M.B.; supervision, H.A.; project administration, H.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The brain tumor MRI dataset analyzed in this study is publicly available on Kaggle at the following URL: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset/versions/1 (accessed on 15 July 2025). To enhance transparency and reproducibility, an inference-ready implementation of the proposed MA-MSCNet framework, including pretrained weights, Grad-CAM visualization, and representative test samples, is publicly available at: https://github.com/HelalaAShehri/MA-MSCNet-BrainTumor (accessed on 9 April 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MA-MSCNet	Morphology-Aware Multi-Scale Convolutional Network
MRI	Magnetic Resonance Imaging
CNN	Convolutional Neural Network
Grad-CAM	Gradient-weighted Class Activation Mapping
ROC	Receiver Operating Characteristic
AUC	Area Under the Curve
CV	Cross-Validation
ReLU	Rectified Linear Unit
TP	True Positives
TN	True Negatives
FP	False Positives
FN	False Negatives
CI	Confidence Interval
GAP	Global Average Pooling

References

Alshomrani, F. Challenges and Advances in Classifying Brain Tumors: An Overview of Machine, Deep Learning, and Hybrid Approaches with Future Perspectives in Medical Imaging. Curr. Med. Imaging 2025, 21, e15734056365191.
Hamza, A.; Damaševičius, R. Deep Learning for Brain Tumor Segmentation and Classification: A Systematic Review of Methods and Trends. Comput. Mater. Contin. 2025, 86, 1–41.
Bouhafra, S.; El Bahi, H. Deep Learning Approaches for Brain Tumor Detection and Classification Using MRI Images (2020 to 2024): A Systematic Review. J. Imaging Inform. Med. 2025, 38, 1403–1433.
Sandhiya, B.; Kanaga Suba Raja, S. Deep learning and optimized learning machine for brain tumor classification. Biomed. Signal Process. Control 2024, 89, 105778.
Badawy, B.; Samir, R.S.; Tarek, Y.; Ahmed, M.A.; Ibrahim, R.; Ahmed, M.; Hassan, M. Brain Tumor classification and Segmentation using Deep Learning. arXiv 2023, arXiv:2304.07901.
Kang, M.; Ting, C.M.; Ting, F.F.; Phan, R.C.W. BGF-YOLO: Enhanced YOLOv8 with Multiscale Attentional Feature Fusion for Brain Tumor Detection. arXiv 2023, arXiv:2309.12585.
Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-Trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 29 (NeurIPS 2016), Proceedings of the 30th Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29, pp. 4898–4906.
Kamnitsas, K.; Ledig, C.; Newcombe, V.F.J.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Rueckert, D.; Glocker, B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 2017, 36, 61–78.
Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision—ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11211, pp. 833–851.
Srinivasan, S.; Francis, D.; Mathivanan, S.K.; Rajadurai, H.; Shivahare, B.D.; Shah, M.A. A hybrid deep CNN model for brain tumor image multi-classification. BMC Med. Imaging 2024, 24, 21.
Ke, L.; Hu, G.; Zhao, M.; Liu, Z.; Lv, Z.; Yang, Y. Brain tumor classification from MRI images using a multi-scale channel attention CNN integrated with SVM. Sci. Rep. 2026, 16, 6297.
Guzzi, L.; Zuluaga, M.A.; Lareyre, F.; Di Lorenzo, G.; Goffart, S.; Chierici, A.; Raffort, J.; Delingette, H. Differentiable Soft Morphological Filters for Medical Image Segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2024, Proceedings of the 27th International Conference, Marrakesh, Morocco, 6–10 October 2024; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2024; pp. 177–187.
Blusseau, S. Training Morphological Neural Networks with Gradient Descent: Some Theoretical Insights. In Discrete Geometry and Mathematical Morphology (DGMM), Proceedings of the Third International Joint Conference, DGMM 2024, Florence, Italy, 15–18 April 2024; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2024; Volume 14605, pp. 229–241.
Franchi, G.; Fehri, A.; Yao, A. Deep morphological networks. Pattern Recognit. 2020, 102, 107246.
Kumar, V.; Singh, R.S.; Dua, Y. Morphologically dilated convolutional neural network for hyperspectral image classification. Signal Process. Image Commun. 2022, 101, 116549.
El Amoury, S.; Smili, Y.; Fakhri, Y. Simulated Annealing-Based Hyperparameter Optimization of a Convolutional Neural Network for MRI Brain Tumor Classification. Mach. Learn. Knowl. Extr. 2025, 7, 50.
Zacharaki, E.I.; Wang, S.; Chawla, S.; Yoo, D.S.; Wolf, R.; Melhem, E.R.; Davatzikos, C. Classification of Brain Tumor Type and Grade Using MRI Texture and Shape in a Machine Learning Scheme. Magn. Reson. Med. 2009, 62, 1609–1618.
Pereira, S.; Pinto, A.; Alves, V.; Silva, C.A. Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images. IEEE Trans. Med. Imaging 2016, 35, 1240–1251.
Asif, S.; Zhao, M.; Tang, F.; Zhu, Y. An enhanced deep learning method for multi-class brain tumor classification using deep transfer learning. Multimed. Tools Appl. 2023, 82, 31709–31736.
Deepak, S.; Ameer, P.M. Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 2019, 111, 103345.
Anaraki, A.K.; Ayati, M.; Kazemi, F. Magnetic resonance imaging-based brain tumor grades classification and grading via convolutional neural networks and genetic algorithms. Biocybern. Biomed. Eng. 2019, 39, 63–74.
Sharma, A.K.; Nandal, A.; Dhaka, A.; Polat, K.; Alwadie, R.; Alenezi, F.; Alhudhaif, A. HOG transformation based feature extraction framework in modified Resnet50 model for brain tumor detection. Biomed. Signal Process. Control 2023, 84, 104737.
Bukaita, W.; Vadde, V. Comparative Evaluation of CNN and ResNet18 Architectures for MRI-Based Brain Tumor Classification Using Deep Learning. Med. Res. Arch. 2025, 13, 1–12.
Afshar, P.; Plataniotis, K.N.; Mohammadi, A. Capsule Networks for Brain Tumor Classification Based on MRI Images and Coarse Tumor Boundaries. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1368–1372.
Rasheed, Z.; Ma, Y.K.; Ullah, I.; Ghadi, Y.Y.; Khan, M.Z.; Khan, M.A.; Abdusalomov, A.; Alqahtani, F.; Shehata, A.M. Brain Tumor Classification from MRI Using Image Enhancement and Convolutional Neural Network Techniques. Brain Sci. 2023, 13, 1320.
Abdusalomov, A.; Rakhimov, M.; Karimberdiyev, J.; Belalova, G.; Cho, Y.I. Enhancing Automated Brain Tumor Detection Accuracy Using Artificial Intelligence Approaches for Healthcare Environments. Bioengineering 2024, 11, 627.
Zhu, Z.; Khan, M.A.; Wang, S.; Zhang, Y. RBEBT: A ResNet-Based BA-ELM for Brain Tumor Classification. Comput. Mater. Contin. 2023, 74, 101–111. [Google Scholar] [CrossRef]
Liu, Y.; Wang, Z.; Xue, Y.; Cheng, N.; Shen, B.; Hou, L.; Jin, L. MRI brain tumor classification based on CNN features and machine learning classifiers. J. Ambient Intell. Humaniz. Comput. 2024, 16, 233–242. [Google Scholar] [CrossRef]
Islam, M.N.; Azam, M.S.; Islam, M.S.; Kanchan, M.H.; Parvez, A.H.M.S.; Islam, M.M. An improved deep learning-based hybrid model with ensemble techniques for brain tumor detection from MRI image. Inform. Med. Unlocked 2024, 47, 101483. [Google Scholar] [CrossRef]
Nabi, M.S.; Rashidul Islam, M.; Alam, S.; Touhami, M.; Hossain, M.S.; Faizal Ahmad Fauzi, M. AI-Driven Diagnosis of Neurological Disorders Using Brain MRI. In Proceedings of the 2025 Multimedia University Engineering Conference (MECON), Cyberjaya, Malaysia, 21–23 July 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–6. [Google Scholar] [CrossRef]
Natha, S.; Laila, U.; Gashim, I.A.; Mahboob, K.; Saeed, M.N.; Noaman, K.M. Automated Brain Tumor Identification in Biomedical Radiology Images: A Multi-Model Ensemble Deep Learning Approach. Appl. Sci. 2024, 14, 2210. [Google Scholar] [CrossRef]
Hekmat, A.; Zhang, Z.; Khan, S.U.R.; Shad, I.; Bilal, O. An Attention-Fused Architecture for Brain Tumor Diagnosis. Biomed. Signal Process. Control 2025, 101, 107221. [Google Scholar] [CrossRef]
Wang, J.; Lu, S.-Y.; Wang, S.-H.; Zhang, Y.-D. RanMerFormer: Randomized vision transformer with token merging for brain tumor classification. Neurocomputing 2024, 573, 127216. [Google Scholar] [CrossRef]
Elhadidy, M.S.; Elgohr, A.T.; El-geneedy, M.; Akram, S.; Kasem, H.M. Comparative Analysis for Accurate Multi-Classification of Brain Tumor Based on Significant Deep Learning Models. Comput. Biol. Med. 2025, 188, 109872. [Google Scholar] [CrossRef] [PubMed]
Aloraini, M.; Khan, A.; Aladhadh, S.; Habib, S.; Alsharekh, M.F.; Islam, M. Combining the Transformer and Convolution for Effective Brain Tumor Classification Using MRI Images. Appl. Sci. 2023, 13, 3680. [Google Scholar] [CrossRef]
Chinga, A.; Bendezu, W.; Angulo, A. Comparative Study of CNN Architectures for Brain Tumor Classification Using MRI: Exploring GradCAM for Visualizing CNN Focus. Eng. Proc. 2025, 83, 22. [Google Scholar] [CrossRef]
Kadam, A. Brain Tumor Classification using Deep Learning Algorithms. Int. J. Res. Appl. Sci. Eng. Technol. 2021, 9, 417–426. [Google Scholar] [CrossRef]
Zhao, Z. The effect of input size on the accuracy of a convolutional neural network performing brain tumor detection. In Proceedings of the International Conference on Mechatronics Engineering and Artificial Intelligence (MEAI 2022), Changsha, China, 11–13 November 2022; Volume 12596, p. 1259617. [Google Scholar] [CrossRef]
Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar] [CrossRef]
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
Hand, D.J.; Till, R.J. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach. Learn. 2001, 45, 171–186. [Google Scholar] [CrossRef]
Özkaraca, O.; Bağrıaçık, O.I.; Gürüler, H.; Khan, F.; Hussain, J.; Khan, J.; Laila, U.E. Multiple Brain Tumor Classification with Dense CNN Architecture Using Brain MRI Images. Life 2023, 13, 349. [Google Scholar] [CrossRef] [PubMed]
Gómez-Guzmán, M.A.; Jiménez-Beristaín, L.; García-Guerrero, E.E.; López-Bonilla, O.R.; Tamayo-Perez, U.J.; Esqueda-Elizondo, J.J.; Palomino-Vizcaino, K.; Inzunza-González, E. Classifying Brain Tumors on Magnetic Resonance Imaging by Using Convolutional Neural Networks. Electronics 2023, 12, 955. [Google Scholar] [CrossRef]
Raouf, M.H.G.; Fallah, A.; Rashidi, S. Use of Discrete Cosine-Based Stockwell Transform in the Binary Classification of Magnetic Resonance Images of Brain Tumor. In Proceedings of the 2022 29th National and 7th International Iranian Conference on Biomedical Engineering (ICBME), Tehran, Iran, 21–22 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 293–298. [Google Scholar] [CrossRef]
Alnemer, A.; Rasheed, J. An Efficient Transfer Learning-Based Model for Classification of Brain Tumor. In Proceedings of the 2021 5th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Türkiye, 21–23 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 478–482. [Google Scholar] [CrossRef]
Li, Z.; Dib, O. Empowering Brain Tumor Diagnosis through Explainable Deep Learning. Mach. Learn. Knowl. Extr. 2024, 6, 2248–2281. [Google Scholar] [CrossRef]

Figure 1. Representative MRI samples from the four diagnostic classes: (a) Glioma, (b) Meningioma, (c) Pituitary Tumor, and (d) No Tumor.

Figure 2. Trainable morphological operators. (a) MorphoDilate: $x + W_{d}$ followed by max pooling ($k \times k$, stride 1). (b) MorphoErode: $-\mathrm{MaxPool}(-x)$ with learnable channel-wise offset $W_{e}$. Both preserve spatial resolution and enable adaptive structural refinement.
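
As a rough illustration of the two operators in Figure 2, the sketch below applies them to a single-channel feature map stored as nested lists. It is not the authors' implementation: the names `max_pool_same`, `morpho_dilate`, and `morpho_erode` are hypothetical, the offsets are plain scalars rather than learnable channel-wise parameters, border windows are simply clipped to the image, and the way $W_{e}$ enters the erosion (subtracted before the negated max pooling) is an assumption, since the caption does not spell it out.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int = 3) -> Grid:
    """k x k max pooling with stride 1 (odd k); border windows are clipped to the image."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(
                x[i][j]
                for i in range(max(0, row - r), min(h, row + r + 1))
                for j in range(max(0, col - r), min(w, col + r + 1))
            )
            for col in range(w)
        ]
        for row in range(h)
    ]


def morpho_dilate(x: Grid, w_d: float, k: int = 3) -> Grid:
    """Dilation as in the caption: add the offset, then take the local maximum."""
    shifted = [[v + w_d for v in row] for row in x]
    return max_pool_same(shifted, k)


def morpho_erode(x: Grid, w_e: float, k: int = 3) -> Grid:
    """Erosion as -MaxPool(-x); subtracting w_e first is an assumption of this sketch."""
    negated = [[-(v - w_e) for v in row] for row in x]
    return [[-v for v in row] for row in max_pool_same(negated, k)]


# Toy 4 x 4 feature map (hypothetical values): a bright 2 x 2 patch on a zero background.
feature_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
dilated = morpho_dilate(feature_map, w_d=0.0)  # bright patch grows outward
eroded = morpho_erode(feature_map, w_e=0.0)    # bright patch is removed
```

With zero offsets the two functions reduce to ordinary grayscale dilation and erosion, so `eroded <= feature_map <= dilated` holds element-wise and the output keeps the input's 4 × 4 size.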

Figure 3. Morphology-aware downsampling. Parallel dilation and erosion are concatenated with the original feature map, fused via $1 \times 1$ convolution, and reduced using average pooling (stride 2) to preserve structural information before spatial compression.
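
The data flow in Figure 3 can be sketched on a single-channel map as follows. This is a simplified illustration under several assumptions, not the published module: with one channel the $1 \times 1$ convolution over the three concatenated maps reduces to a weighted sum with three scalar weights (`fuse`, hypothetical and fixed here rather than learned), the morphological offsets are omitted, and the average-pooling window is taken to be 2 × 2 with partial windows kept at the edges so that odd sizes round up.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def local_reduce(x: Grid, reduce: Callable, k: int = 3) -> Grid:
    """Max or min over a k x k window, stride 1, windows clipped at the borders."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            reduce(
                x[i][j]
                for i in range(max(0, a - r), min(h, a + r + 1))
                for j in range(max(0, b - r), min(w, b + r + 1))
            )
            for b in range(w)
        ]
        for a in range(h)
    ]


def avg_pool_stride2(x: Grid) -> Grid:
    """2 x 2 average pooling with stride 2; edge windows average the cells they cover."""
    h, w = len(x), len(x[0])
    pooled = []
    for a in range(0, h, 2):
        row = []
        for b in range(0, w, 2):
            window = [
                x[i][j]
                for i in range(a, min(a + 2, h))
                for j in range(b, min(b + 2, w))
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def morph_downsample(x: Grid, fuse: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> Grid:
    """Fuse [x, dilation, erosion] with per-map weights, then halve the resolution."""
    dil, ero = local_reduce(x, max), local_reduce(x, min)
    fused = [
        [fuse[0] * v + fuse[1] * d + fuse[2] * e for v, d, e in zip(row_x, row_d, row_e)]
        for row_x, row_d, row_e in zip(x, dil, ero)
    ]
    return avg_pool_stride2(fused)


flat = [[2.0] * 5 for _ in range(5)]  # constant 5 x 5 map (hypothetical input)
out = morph_downsample(flat)          # 3 x 3 map, still constant at 2.0
```

A constant input stays constant because dilation, erosion, and averaging all leave it unchanged, and the 5 × 5 map shrinks to 3 × 3, the same round-up behaviour as the 125 → 63 step in Table 1.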

Figure 4. Architecture of the proposed Morphology-Aware Multi-Scale (MA-MSC) block. The input feature map is first processed through parallel $3 \times 3$ and $5 \times 5$ convolutional branches for multi-scale feature extraction. The fused representation undergoes trainable morphological refinement via dilation and erosion, followed by feature fusion using a $1 \times 1$ convolution and batch normalization. A residual shortcut connection is applied to stabilize learning, and ReLU activation produces the final block output.
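
One way to write the block in Figure 4 compactly is given below. The notation is introduced here for illustration and is not taken from the article: $X$ is the block input, $Y$ its output, $\mathrm{Fuse}$ stands for the unspecified merge of the two branches (concatenation or summation), $[\cdot \,\Vert\, \cdot]$ is channel concatenation, and $\mathcal{S}$ is the shortcut (identity, or a projection when channel counts differ). Applying dilation and erosion in parallel to the same fused map is an assumption; the caption would also admit a sequential arrangement.

```latex
\begin{aligned}
F &= \mathrm{Fuse}\big(\mathrm{Conv}_{3\times 3}(X),\ \mathrm{Conv}_{5\times 5}(X)\big) \\
M &= \big[\,\mathrm{Dil}(F)\ \Vert\ \mathrm{Ero}(F)\,\big] \\
Y &= \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{Conv}_{1\times 1}(M)\big) + \mathcal{S}(X)\big)
\end{aligned}
```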

Figure 5. Architecture of MA-MSCNet composed of four sequential MA-MSC blocks with progressive channel expansion (32–256). Morphology-aware pooling is applied in the first three blocks, while the final block preserves spatial resolution. Global average pooling and fully connected layers produce the classification output.

Figure 6. Cross-validation learning dynamics of the proposed MA-MSCNet. (a) Mean ± standard deviation of training and validation accuracy across five folds, demonstrating stable convergence and minimal variance. (b) Mean ± standard deviation of training and validation loss, indicating smooth optimization without signs of overfitting.

Figure 7. Normalized confusion matrix (percentage) of the proposed MA-MSCNet on the test set, illustrating strong and balanced class-wise performance across all tumor categories.

Figure 8. Grad-CAM visualizations for test samples. For each class, two examples show the original MRI (left) and Grad-CAM overlay (right). Activation maps highlight class-discriminative regions. P indicates the predicted probability of the true class.

Figure 9. Representative dilation (Dil) and erosion (Ero) feature maps for a test MRI slice.

Table 1. Stage-wise architectural configuration of the proposed MA-MSCNet.

Stage	Operation	Output Size
Input	Grayscale MRI slice	$125 \times 125 \times 1$
Stem	Conv ( $3 \times 3$ ) + BN + ReLU	$125 \times 125 \times 32$
Block 1	MA-MSC Block (32)	$125 \times 125 \times 32$
Pool 1	Morphology-aware pooling	$63 \times 63 \times 32$
Block 2	MA-MSC Block (64)	$63 \times 63 \times 64$
Pool 2	Morphology-aware pooling	$32 \times 32 \times 64$
Block 3	MA-MSC Block (128)	$32 \times 32 \times 128$
Pool 3	Morphology-aware pooling	$16 \times 16 \times 128$
Block 4	MA-MSC Block (256)	$16 \times 16 \times 256$
Head	GAP + Fully connected layers	128
Output	Softmax classifier	4
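
The output sizes in Table 1 are consistent with each morphology-aware pooling stage halving the spatial size and rounding up (125 → 63 → 32 → 16). The short sketch below traces the shapes under that assumption; `trace_shapes` is a hypothetical helper, and the rounding rule is inferred from the table rather than stated in it.

```python
import math
from typing import List, Tuple

Shape = Tuple[int, int, int]


def trace_shapes(
    size: int = 125,
    stem_channels: int = 32,
    block_channels: Tuple[int, ...] = (32, 64, 128, 256),
) -> List[Tuple[str, Shape]]:
    """List (stage, height x width x channels) for the stem, blocks, and pooling stages."""
    stages = [("Input", (size, size, 1)), ("Stem", (size, size, stem_channels))]
    for index, channels in enumerate(block_channels, start=1):
        # An MA-MSC block changes the channel count but keeps the spatial size.
        stages.append((f"Block {index}", (size, size, channels)))
        # Pooling follows every block except the last one.
        if index < len(block_channels):
            size = math.ceil(size / 2)  # assumed: stride 2 with round-up
            stages.append((f"Pool {index}", (size, size, channels)))
    return stages


for stage, shape in trace_shapes():
    print(f"{stage:8s} {shape[0]} x {shape[1]} x {shape[2]}")
```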

Table 2. Evaluation metrics used for assessing the performance of the proposed MA-MSCNet.

Metric	Formula	Description
Accuracy	$\frac{T P + T N}{T P + T N + F P + F N}$	Overall correctness of the classification model
Precision	$\frac{T P}{T P + F P}$	Reliability of positive predictions
Sensitivity (Recall)	$\frac{T P}{T P + F N}$	Ability to correctly identify positive cases
Specificity	$\frac{T N}{T N + F P}$	Ability to correctly identify negative cases
F1-Score	$\frac{2 T P}{2 T P + F P + F N}$	Harmonic mean of precision and recall
AUC-ROC	$\int_{0}^{1} T P R (F P R) d (F P R)$	Discriminative capability evaluated using one-vs-rest receiver operating characteristic analysis
MCC	$\frac{T P \cdot T N - F P \cdot F N}{\sqrt{(T P + F P) (T P + F N) (T N + F P) (T N + F N)}}$	Matthews Correlation Coefficient metric accounting for all confusion matrix terms, suitable for imbalanced data

AUC-ROC is computed numerically from the ROC curve using a one-vs-rest strategy.
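
For a multi-class problem, the count-based formulas in Table 2 are typically evaluated per class by collapsing the confusion matrix into one-vs-rest counts. The sketch below shows that step and the resulting metrics; the helper names and the 3 × 3 confusion matrix are hypothetical and are not data from this study. AUC-ROC is left out because it needs predicted scores rather than counts.

```python
import math
from typing import Dict, List, Tuple


def one_vs_rest_counts(cm: List[List[int]], k: int) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for class k; rows are true labels, columns predictions."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class-k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp     # other samples predicted as class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def ratio(num: float, den: float) -> float:
    """Guarded division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Evaluate the count-based formulas of Table 2."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical 3-class confusion matrix, used only to exercise the formulas.
cm = [[50, 2, 1], [3, 45, 2], [0, 1, 46]]
counts = one_vs_rest_counts(cm, 0)
metrics = binary_metrics(*counts)
```

Note that `metrics["accuracy"]` here is the one-vs-rest accuracy for a single class; the overall multi-class accuracy is the trace of the confusion matrix divided by the total number of samples.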

Table 3. Effect of random seed on model performance. All experiments were conducted under identical settings to assess reproducibility.

Seed	Accuracy (%)	F1-Score (%)	MCC
11	99.16	99.14	0.9887
22	99.24	99.22	0.9898
42	99.08	99.04	0.9877
55	99.01	98.97	0.9867
77	98.63	98.59	0.9816
Mean ± Std	99.02 ± 0.23	98.99 ± 0.24	0.9869 ± 0.0029

Table 4. Component-wise ablation results evaluating the contribution of different architectural designs in MA-MSCNet.

Group	Variant	Description	Accuracy	F1-Score	AUC-ROC
A	A0	Full MA-MSCNet (multi-scale + in-block morphology + morphology-aware pooling)	99.31%	99.29%	99.86%
	A1	Multi-scale CNN (no morph.)	96.64%	96.51%	99.70%
	A2	Morphology in MA-MSC blocks only; AvgPool downsampling	97.25%	97.13%	99.82%
	A3	Morphology-aware pooling only; no in-block morphology	96.49%	96.30%	99.75%
B	B1	Single-scale $3 \times 3$ only; full morphology	97.10%	96.98%	99.76%
	B2	Single-scale $5 \times 5$ only; full morphology	95.80%	95.59%	99.69%
	B3	Multi-scale ( $3 \times 3 + 5 \times 5$ ); full morphology	96.95%	96.79%	99.63%
C	C1	Dilation only; erosion removed	97.03%	96.89%	99.78%
	C2	Erosion only; dilation removed	95.80%	95.68%	99.38%
	C3	Dilation + erosion (full morphology)	97.71%	97.60%	99.86%
D	D1	Morphology-aware pooling (proposed)	97.25%	97.20%	99.73%
	D2	AvgPool2D downsampling	96.87%	96.72%	99.70%
	D3	MaxPool2D downsampling	97.10%	96.96%	99.80%

Table 5. Impact of data augmentation on model performance.

Setting	Accuracy	MCC
With augmentation	99.31%	99.08%
Without augmentation	98.09%	97.45%

Table 6. Effect of input resolution on classification performance and computational cost. All experiments were conducted under identical training settings.

Resolution	Accuracy (%)	F1-Score (%)	MCC (%)	GFLOPs	Inference (ms)
96 × 96	98.47	98.45	97.96	2.22	0.92
125 × 125	99.31	99.29	99.08	0.43	0.66
160 × 160	98.86	98.84	98.47	6.16	2.20
224 × 224	98.55	98.49	98.06	12.07	4.27

Table 7. Per-class performance metrics of the proposed MA-MSCNet on the test set.

Class	Precision	Sensitivity	Specificity	F1-Score	AUC
Class 0	99.33%	99.00%	99.80%	99.17%	99.76%
Class 1	99.34%	98.37%	99.80%	98.85%	99.93%
Class 2	99.51%	99.75%	99.78%	99.63%	99.75%
Class 3	99.01%	100.00%	99.70%	99.50%	100.00%
Macro Average	99.30%	99.28%	99.77%	99.29%	99.86%
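
The macro-average row of Table 7 is the unweighted mean of the four per-class values in each column. The check below recomputes it from the rounded per-class entries; the variable names are illustrative, and small differences of up to half a unit in the last digit are expected because the inputs are already rounded.

```python
# Per-class values copied from Table 7 (Class 0 to Class 3), in percent.
per_class = {
    "precision": [99.33, 99.34, 99.51, 99.01],
    "sensitivity": [99.00, 98.37, 99.75, 100.00],
    "specificity": [99.80, 99.80, 99.78, 99.70],
    "f1": [99.17, 98.85, 99.63, 99.50],
    "auc": [99.76, 99.93, 99.75, 100.00],
}
# Macro-average row as reported in Table 7.
reported = {
    "precision": 99.30,
    "sensitivity": 99.28,
    "specificity": 99.77,
    "f1": 99.29,
    "auc": 99.86,
}


def macro_average(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


macro = {name: macro_average(values) for name, values in per_class.items()}
for name, value in macro.items():
    print(f"{name:12s} recomputed {value:.4f}  reported {reported[name]:.2f}")
```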

Table 8. Computational efficiency comparison of MA-MSCNet and baseline models. All experiments are conducted under the same hardware and input configuration.

Model	Params (M)	GFLOPs	Inference (ms/Image)
Convolutional Neural Networks (CNNs)
ResNet50	49.68	7.80	4.23 ± 1.28
DenseNet201	42.79	8.68	5.14 ± 0.07
Modern Convolutional Architectures
ConvNeXt-Tiny	29.62	2.24	1.34 ± 0.02
EfficientNetV2	118.11	8.06	3.59 ± 0.03
Transformer-Based Architectures
ViT-B16	85.90	11.15	5.62 ± 0.81
Swin-Tiny	27.52	4.10	4.60 ± 0.14
Hybrid Architectures
CvT	19.61	8.15	4.26 ± 0.11
MA-MSCNet (Proposed)	2.15	0.43	0.66 ± 0.11

Table 9. Contextual comparison of MA-MSCNet with previously reported methods. Results are taken from the literature and are not directly comparable due to differences in input resolution, preprocessing, dataset partitioning, and evaluation protocols.

Authors	Methodology	Precision	Recall	F1-Score	Accuracy	Protocol Consistency
Özkaraca et al. [44]	VGG16 + DenseNet	96.00%	96.50%	96.00%	97.00%	Different input resolution (RGB) and training pipeline
Gómez-Guzmán et al. [45]	Pretrained CNNs	97.97%	96.59%	97.27%	97.12%	Variable input resolution and mixed evaluation protocol
Raouf et al. [46]	DCST + SVM	97.80%	96.60%	97.20%	97.71%	Different preprocessing pipeline
Rasheed et al. [26]	Image enhancement + CNN	97.85%	97.85%	97.90%	97.84%	Different preprocessing (CLAHE + sharpening) and dataset split
Alnemer et al. [47]	Modified ResNet152V2	–	–	–	98.90%	Not explicitly reported
Li et al. [48]	ResNet-50	99.00%	99.00%	99.00%	98.70%	Different training configuration and input pipeline
Proposed MA-MSCNet	Multi-scale CNN + morphology-aware learning	99.30%	99.28%	99.29%	99.31%	Controlled (this work)

Note: Prior methods were not re-evaluated under a unified experimental setting; reported results reflect their original protocols.

Table 10. Controlled comparison of the proposed MA-MSCNet with diverse baseline architectures under identical experimental conditions (same dataset split, 125 × 125 grayscale input, preprocessing pipeline, augmentation strategy, and training protocol). All metrics are reported as percentages with 95% confidence intervals in brackets.

Model	Accuracy (%)	F1-Score (%)	MCC (%)
Convolutional Neural Networks (CNNs)
ResNet50	90.08 [88.41, 91.69]	89.45 [87.78, 91.09]	86.77 [84.67, 89.11]
DenseNet201	97.86 [97.03, 98.55]	97.77 [96.94, 98.55]	97.14 [96.02, 98.16]
Modern Convolutional Architectures
ConvNeXt-Tiny	94.36 [93.14, 95.58]	94.01 [92.69, 95.30]	92.44 [90.73, 94.07]
EfficientNetV2	71.55 [69.11, 73.91]	68.85 [66.43, 71.22]	62.16 [59.15, 65.40]
Transformer-Based Architectures
ViT-B16	85.58 [83.40, 87.62]	84.67 [82.42, 86.87]	80.35 [77.62, 83.02]
Swin-Tiny	90.16 [88.41, 91.69]	89.67 [87.98, 91.27]	86.80 [84.70, 88.95]
Hybrid Architectures
CvT	82.84 [80.51, 85.12]	81.86 [79.51, 84.10]	77.12 [74.11, 80.01]
MA-MSCNet (Proposed)	99.31 [98.86, 99.70]	99.29 [98.77, 99.72]	99.08 [98.47, 99.69]
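
The caption of Table 10 does not state how the 95% confidence intervals were obtained. One common choice for test-set metrics is a percentile bootstrap over test samples, sketched below for accuracy as an assumption rather than a description of the authors' procedure. The function name, the number of resamples, and the toy correctness vector are all hypothetical.

```python
import random
from typing import List, Tuple


def bootstrap_accuracy_ci(
    correct: List[int],
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a percentile bootstrap.

    `correct` holds 1 for a correctly classified test sample and 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(correct)
    # Resample the test set with replacement and recompute accuracy each time.
    scores = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lower = scores[int(alpha * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return sum(correct) / n, lower, upper


# Toy example (not study data): 95 correct predictions out of 100.
toy = [1] * 95 + [0] * 5
point, low, high = bootstrap_accuracy_ci(toy)
print(f"accuracy {100 * point:.2f} [{100 * low:.2f}, {100 * high:.2f}]")
```

The same resampling loop applies to F1-score or MCC by recomputing that metric on each resampled set of predictions instead of the mean of `correct`.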

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

AlShehri, H.; Busaleh, M. Morphology-Aware Multi-Scale Deep Representation Learning for Interpretable Knowledge Extraction in Brain Tumor MRI. Mach. Learn. Knowl. Extr. 2026, 8, 119. https://doi.org/10.3390/make8050119

