Article

Diffusion-Based Feature Denoising and Using NNMF for Robust Brain Tumor Classification

by Hiba Adil Al-kharsan 1 and Róbert Rajkó 2,1,*
1 Doctoral School of Computer Science, University of Szeged, Árpád tér 2, H-6720 Szeged, Hungary
2 University Research and Innovation Center (EKIK), Óbuda University, Bécsi út 96/b, H-1034 Budapest, Hungary
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(4), 105; https://doi.org/10.3390/make8040105
Submission received: 11 March 2026 / Revised: 7 April 2026 / Accepted: 16 April 2026 / Published: 18 April 2026
(This article belongs to the Section Learning)

Abstract

Brain tumor classification from magnetic resonance imaging (MRI) plays a critical role in computer-assisted diagnosis systems. In recent years, deep learning models have achieved high classification accuracy. However, their sensitivity to adversarial perturbations has become an important reliability concern in medical applications. This study proposes a robust brain tumor classification framework that combines non-negative matrix factorization (NNMF or NMF), lightweight convolutional neural networks (CNNs), and diffusion-based feature purification. First, MRI images are preprocessed and converted into a non-negative data matrix, from which compact and interpretable NNMF feature representations are extracted. Statistical metrics, including AUC, Cohen’s d, and p-values, are used to rank and select the most discriminative components. Then, a lightweight CNN classifier is trained directly on the selected features. To improve adversarial robustness, a diffusion-based feature-space purification module is introduced: a forward noising step followed by a learned denoiser network is applied before classification. System performance is evaluated using both clean accuracy and robust accuracy under strong adversarial attacks generated by AutoAttack. The experimental results show that the proposed framework achieves competitive classification performance while significantly enhancing robustness against adversarial perturbations. The findings suggest that combining interpretable NNMF-based representations with a lightweight deep model and a diffusion-based defense provides an effective and reliable solution for medical image classification under adversarial conditions.

1. Introduction

The classification of brain tumors from magnetic resonance imaging (MRI) is an important and challenging component of computer-aided diagnostic systems. Early and accurate detection improves treatment planning and patient survival. In recent years, deep learning approaches, mostly convolutional neural networks (CNNs), have shown remarkable performance in medical image analysis tasks [1]. CNN-based models achieve high classification accuracy in brain tumor detection and segmentation problems because they learn hierarchical features directly from data.
Despite their strong predictive performance, deep neural networks are highly vulnerable to adversarial attacks. Small, carefully crafted input perturbations can severely degrade classification accuracy while remaining visually imperceptible [2]. This sensitivity raises serious concerns in safety-critical applications, such as medical diagnosis, where reliability and robustness are essential. To ensure a reliable evaluation of robustness, standardized attack benchmarks such as AutoAttack have been proposed [3].
Feature extraction [4] and dimensionality reduction remain key components for building interpretable and computationally efficient classification models. Related chemometric approaches such as PLS-DA have been successfully applied in other domains, illustrating the value of interpretable low-dimensional representations [5]. Non-negative matrix factorization (NNMF or NMF) is a well-established method for decomposing non-negative data into additive, parts-based representations [6]. Unlike unconstrained linear techniques such as Principal Component Analysis (PCA), NNMF imposes non-negativity constraints, resulting in an interpretable low-rank representation. This property makes NNMF especially suitable for medical image analysis, where pixel intensities and derived features are inherently non-negative [7].
In this work, we propose a structured framework for brain tumor classification that combines NNMF-based feature extraction, statistical feature ranking, CNN-based classification, and diffusion-based feature purification for adversarial robustness. First, MRI images are converted into non-negative feature matrices and decomposed using NNMF. The extracted components are statistically evaluated using metrics such as Area Under the Curve (AUC), Cohen’s d, and hypothesis testing to identify the most discriminative features. Then, a lightweight CNN classifier is trained on the selected feature subset. To improve robustness, a feature-space diffusion step is introduced to perform structured noise injection, followed by a learned denoising network that approximates the reverse diffusion operation. System performance is evaluated against strong adversarial attacks using AutoAttack [3], with the implementation adopted from the official public repository [8]. The evaluation compares baseline and defended models in terms of both clean accuracy and robust accuracy. The proposed approach demonstrates that combining interpretable NNMF representations with lightweight deep models and diffusion-based purification provides competitive classification accuracy while improving robustness against adversarial attacks. Existing methods, such as PCA, autoencoders, and Transformer-based embeddings, often lack interpretability in medical applications. In contrast, NNMF provides a parts-based representation owing to its non-negativity constraints, making it more suitable for modeling tumor structures in brain MRI images. Moreover, while many studies focus on improving classification accuracy, little attention has been paid to robustness to adversarial perturbations. This work aims to address this gap by combining diffusion-based feature denoising with interpretable feature extraction.

2. Related Works

  • Deep Learning-Based Brain Tumor Classification (Recent Advances)
    Almuhaimeed et al. [9] proposed a deep learning framework for brain tumor classification using MRI images, integrating convolutional neural networks with data augmentation techniques. Their model achieved high classification accuracy exceeding 97%, demonstrating the effectiveness of deep learning approaches in medical imaging. However, their evaluation was conducted under standard conditions without considering adversarial robustness. In contrast, the current study evaluates both clean and adversarial performance using AutoAttack.
  • Hybrid CNN and Transformer Models
    Gómez et al. [10] proposed a hybrid CNN–Transformer architecture for multi-class brain tumor classification. Their model combines local feature extraction with global attention mechanisms, achieving improved classification performance. However, this approach increases model complexity and reduces interpretability. In contrast, the proposed framework employs NNMF to generate interpretable feature representations.
  • Diffusion and Generative Models in Medical Imaging
    Deem et al. [11] analyzed robustness in brain tumor classification using modern deep learning models and highlighted the trade-off between classification accuracy and adversarial robustness. Similarly, recent studies have explored generative approaches such as GANs and diffusion models to improve performance and robustness. However, most of these methods operate in pixel or latent space.
  • Brain Tumor Detection Using CNN
    Hossain et al. [12] proposed a hybrid brain tumor detection framework that combines classical machine learning classifiers with a convolutional neural network (CNN). The study used the BRATS benchmark dataset and applied Fuzzy C-Means (FCM) clustering for tumor segmentation, followed by feature extraction using texture and statistical descriptors.
    In the classical machine learning phase, several classifiers were evaluated, including Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression, Naïve Bayes, Random Forest, and Multilayer Perceptron (MLP). Among them, SVM achieved the best performance with an accuracy of 92.42%.
    For deep learning-based classification, the authors designed a five-layer CNN consisting of convolution, max-pooling, flattening, and fully connected layers. The proposed CNN achieved 97.87% accuracy with an 80:20 training–testing split, outperforming the traditional classifiers.
    Although the reported performance is high, the approach relies on pixel-level segmentation and explicit image-based CNN classification. In contrast, our work employs interpretable low-rank NNMF representations combined with a lightweight CNN and a diffusion-based robustness enhancement, aiming not only at high classification accuracy but also at improved adversarial robustness.
  • NMF–CNN for Feature Enhancement
    Chan et al. [13] proposed a hybrid non-negative matrix factorization–convolutional neural network (NMF–CNN) framework for sound event detection in the DCASE 2019 challenge. In their work, NMF was used as a preprocessing and feature enhancement step to approximate strong labels from weakly labeled data by solving for the activation matrix H. The extracted representations were then fed into a CNN for event classification. Their results showed that integrating NMF with a shallow CNN improves the event-based F1-score (30.39%) compared to the baseline system (23.7%), indicating that NMF can provide a meaningful structural decomposition that benefits deep learning architectures. Unlike their system for acoustic scene analysis, the current study adopts NNMF for medical image feature extraction, where it is applied to generate interpretable low-rank representations of brain MRI images. These representations are then used for classification and robustness evaluation under AutoAttack.
  • Semi-NMF Network for Image Classification
    Huang et al. [14] proposed a Semi-NMF-based convolutional network (SNnet) for image classification, in which convolutional filters are not learned through backpropagation but are instead built by applying semi-non-negative matrix factorization to image patches. Unlike traditional CNNs that depend on gradient-based optimization, their method learns filter banks by matrix factorization, which reduces computational cost and avoids global parameter tuning. In addition, a weakly supervised extension (S-SNnet) was introduced by incorporating graph regularization into the Semi-NMF framework to enhance discriminative capability. Experimental results on the MNIST dataset demonstrated that the proposed method achieves competitive performance compared to state-of-the-art shallow and deep learning architectures such as PCANet. This work highlights the feasibility of integrating matrix factorization within convolutional frameworks for effective feature learning.
  • Classification–Denoising Joint Models
    Thiry and Guth [15] proposed classification–denoising networks, which jointly address image classification and denoising by training a single network that models the joint distribution of noisy images and their labels. Their method combines a cross-entropy classification loss with a denoising objective at multiple noise levels, using the Tweedie–Miyasawa formula to estimate the denoised image. Experimental results on CIFAR-10 and ImageNet demonstrate competitive performance and improved adversarial robustness compared to standard discriminative classifiers. This study provides a theoretical link between denoising objectives and adversarial gradients, offering a new view of robustness that complements conventional defense mechanisms [15].
  • Reliable Robustness Evaluation
    Croce and Hein [3] proposed a reliable evaluation framework for adversarial robustness based on an ensemble of diverse parameter-free attacks. Unlike earlier benchmarks that often rely on individual attack procedures, their ensemble combines complementary attacks such as APGD and the Square attack to provide a parameter-independent and reliable robustness assessment. Their evaluation covered more than fifty models and showed that many defenses previously considered robust could be broken when evaluated with the proposed attack ensemble. This study highlighted the need for rigorous and standardized robustness evaluation in adversarial machine learning and directly motivated the use of AutoAttack as a reliable benchmark in the current work.
  • Recent Studies on Brain Tumor Classification (2024–2026)
    Recent studies (2024–2026) on brain tumor classification have reported high accuracy using deep CNN and Transformer-based models, particularly on publicly available datasets. Many of these approaches focus mainly on maximizing clean accuracy, often exceeding 95%. Three further reviews [16,17,18] addressed brain tumor segmentation, the potential of hybrid models, and the analysis of model interpretability, respectively.
    However, these methods are typically evaluated under standard conditions and do not consider adversarial robustness or interpretability. In contrast, the proposed work evaluates robustness against adversarial perturbations while preserving interpretable NNMF-based feature representations.
    Thus, this study provides a complementary perspective by addressing reliability and robustness, which are critical in safety-sensitive medical systems. In addition to classification-based approaches, recent studies have discussed anomaly detection and medical image segmentation techniques for brain tumor analysis, with the aim of improving localization accuracy and robustness. The proposed work differs by focusing on interpretable feature extraction combined with adversarial robustness through diffusion-based feature purification.

3. Materials and Methods

To develop an adversarially robust classifier, a structured sequence of phases is required. In this work, a neural network-based model is adopted as the core classification algorithm. The proposed classifier is constructed in four main stages, each comprising well-defined sub-processes designed to ensure computational correctness, stability, and robustness against adversarial attacks. The roles of NNMF and the CNN in the proposed pipeline are complementary rather than redundant. NNMF is applied to extract low-rank and interpretable feature representations from MRI data, preserving meaningful structural patterns.
The CNN operates on the selected NNMF features as a lightweight classifier, focusing on discriminative learning rather than feature extraction from raw images. This separation reduces model complexity and avoids redundancy between components.
The diffusion module is used in the feature space after NNMF and feature selection, acting as a purification step that enhances robustness against adversarial perturbations. It operates independently of the feature extraction stage and does not interfere with the interpretability of the NNMF representations. Therefore, each component in the pipeline serves a distinct and non-overlapping role. These four stages are shown in Figure 1, which summarizes the overall workflow of the suggested framework, including preprocessing, feature extraction based on NNMF, CNN-based classification, diffusion-based denoising and robustness evaluation.

3.1. Dataset Preprocessing

The dataset was obtained from Kaggle, a well-known platform for data science and machine learning research. It contains brain magnetic resonance images with corresponding segmentation masks, which are commonly used to train and evaluate brain tumor segmentation models [19]. The dataset consists of approximately 2200 brain magnetic resonance images, each with a binary segmentation mask that highlights the tumor region at the pixel level. It is intended for semantic segmentation tasks and has mostly been used in research and public notebooks on Kaggle to train and evaluate deep learning models such as U-Net and its variants for brain tumor analysis. It was originally provided in the COCO annotation format, where the images and their corresponding labels are stored separately, with the labels in a JSON file. A first preprocessing script in Python (version 3.10) was developed to reorganize the dataset into a folder structure suitable for image classification tasks. The script parses the annotation file, maps each image to its corresponding category, and saves the images to dedicated folders that represent each class: 730 training, 219 validation, and 97 test images for the normal class, and 770 training, 210 validation, and 118 test images for the tumor class. Thus, only the single original Kaggle split was used [19], and its train/validation/test ratios were maintained, i.e., approximately 70%/20%/10%, respectively. No exact information was available on the number of slices per patient or on the patient-wise distribution of the images. Because patient identifiers were not available, slice-level leakage between splits cannot be fully ruled out, and the reported performance may be optimistic compared to a strictly patient-wise evaluation. See [20] for details on the risk of slice-level leakage in brain magnetic resonance imaging classification. To our knowledge, no classification study of this Kaggle brain tumor dataset has been published.
These preprocessing steps enable smooth integration with convolutional neural network training. Because the dataset is derived from a semantic segmentation task and does not include patient identifiers, data separation was performed at the slice level rather than at the patient level, which carries the leakage risk noted above. However, all preprocessing and splitting procedures were applied consistently in all experiments to give a fair comparison. Future work will consider patient-wise data splitting and additional sensitivity analysis to further validate the robustness and generalization of the proposed framework.
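The reorganization step described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name `coco_to_class_folders` and the rule of assigning one class per image (first annotation wins) are assumptions, while the `images`, `annotations`, and `categories` keys are the standard COCO fields.

```python
import json
import shutil
from collections import Counter
from pathlib import Path


def coco_to_class_folders(annotation_file, image_dir, output_dir):
    """Copy each annotated image into output_dir/<category name>/ and count them."""
    coco = json.loads(Path(annotation_file).read_text())

    # Standard COCO fields: categories[id, name], images[id, file_name],
    # annotations[image_id, category_id].
    category_name = {c["id"]: c["name"] for c in coco["categories"]}
    file_name = {img["id"]: img["file_name"] for img in coco["images"]}

    # Assumption: one class per image. If an image carries several
    # annotations, the first one encountered decides its class.
    image_class = {}
    for ann in coco["annotations"]:
        image_class.setdefault(ann["image_id"], category_name[ann["category_id"]])

    counts = Counter()
    for image_id, label in image_class.items():
        source = Path(image_dir) / file_name[image_id]
        target_dir = Path(output_dir) / label
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target_dir / source.name)
        counts[label] += 1
    return dict(counts)
```

Running the function once per split (training, validation, test) would yield one class-labeled folder tree per split, and the returned counts can be checked against the per-class numbers reported above.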

3.2. Feature Extraction Phase Using NNMF

In this phase, features are extracted from each image using non-negative matrix factorization (NNMF). NNMF is a mechanism to factorize a non-negative matrix $V \in \mathbb{R}_{+}^{K \times N}$ into two non-negative matrices $W \in \mathbb{R}_{+}^{K \times R}$ and $H \in \mathbb{R}_{+}^{R \times N}$, so that their product approximates $V$, as reported in [21].
$$V = WH + E$$
where $E \in \mathbb{R}^{K \times N}$ is the reconstruction error matrix. The matrices $W$ and $H$ are estimated by minimizing the cost function between $V$ and $WH$ as follows [21]:
$$W = \arg\min_{W} C(V \,\|\, WH), \quad \text{for fixed } H$$
$$H = \arg\min_{H} C(V \,\|\, WH), \quad \text{for fixed } W$$
where $C(A \,\|\, B)$ indicates a distance measurement between the matrices $A$ and $B$. Various distance measures can be used, such as the Euclidean distance, the Kullback–Leibler divergence, the Itakura–Saito divergence, and the $\beta$-divergence.
NNMF is used not only as a dimensionality reduction technique, but also as an interpretable low-rank approximation model for non-negative data. As clarified in the SIAM study on non-negative matrix factorization [7], NMF belongs to the family of linear dimensionality reduction techniques, in which every sample is represented as an additive combination of a small number of basis components under non-negativity constraints. Unlike PCA or unconstrained low-rank approximations, the non-negativity constraint yields parts-based representations, which are particularly appropriate for image analysis and medical data where pixel intensities are inherently non-negative.
Moreover, the choice of objective function significantly affects the resulting decomposition and its statistical interpretation [7]. Consequently, in the proposed framework, NNMF is optimized using the Kullback–Leibler (KL) divergence with multiplicative update rules [22]:
$$W \leftarrow W \otimes \frac{\left(V \,./\, (WH)\right) H^{T}}{\mathbf{1}_{K \times N}\, H^{T}}$$
$$H \leftarrow H \otimes \frac{W^{T} \left(V \,./\, (WH)\right)}{W^{T}\, \mathbf{1}_{K \times N}}$$
where $\otimes$ and $./$ denote element-wise multiplication and division, respectively (the fraction bars are likewise element-wise), and $\mathbf{1}_{K \times N}$ represents a $K \times N$ matrix whose elements are all equal to one.
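The multiplicative KL updates above translate almost line by line into NumPy. The following is a minimal sketch, not the authors' MATLAB implementation: the random initialization, the iteration count, and the small constant `eps` added to denominators for numerical safety are assumptions.

```python
import numpy as np


def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative V (K x N) as W (K x rank) @ H (rank x N)."""
    rng = np.random.default_rng(seed)
    K, N = V.shape
    W = rng.random((K, rank)) + eps  # assumed random non-negative start
    H = rng.random((rank, N)) + eps
    ones = np.ones((K, N))
    for _ in range(n_iter):
        # W <- W (x) [(V ./ (WH)) H^T] ./ [1 H^T]
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
        # H <- H (x) [W^T (V ./ (WH))] ./ [W^T 1]
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
    return W, H


def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence D(V || WH) used as the cost C."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))
```

Because both factors are only ever multiplied by non-negative ratios, they stay non-negative throughout, and the divergence does not increase from one iteration to the next.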
The images were loaded using a MATLAB datastore (MATLAB R2025b) with class labels taken from folder names. Each image was converted to grayscale if necessary, resized to $128 \times 128$, normalized to the [0,1] range, and vectorized to form a non-negative data matrix V, where each column represents one image. NNMF was learned on the training set, producing a basis matrix W and a coefficient matrix H. Validation and test features were obtained by projecting their vectors onto the fixed basis W via non-negative least squares. Finally, the feature vectors were L2-normalized and saved for classification and robustness evaluation. Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 illustrate these steps: Figure 2 presents the learned NNMF basis components and shows that the decomposition captures multiple meaningful structural patterns. Figure 3 gives an example of a normalized NNMF feature vector for a test image, while Figure 4 confirms the effect of the L2-normalization step. In addition, Figure 5 and Figure 6 illustrate class-wise activation behavior and the most strongly activated test samples, respectively, supporting the interpretability of the extracted NNMF features.
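The vectorization, projection onto a fixed basis, and L2 normalization described above can be sketched in NumPy. Two points are assumptions made for illustration: images are taken to be already resized 8-bit grayscale arrays, and the projection uses a simple multiplicative update as a stand-in for a dedicated non-negative least squares solver (for example, `scipy.optimize.nnls` applied column by column).

```python
import numpy as np


def images_to_matrix(images):
    """Stack equally sized uint8 grayscale images as columns scaled to [0, 1]."""
    return np.stack(
        [np.asarray(img, dtype=float).ravel() / 255.0 for img in images], axis=1
    )


def project_onto_basis(W, V_new, n_iter=500, eps=1e-10):
    """Approximate argmin over H >= 0 of ||V_new - W H||_F with W held fixed.

    Stand-in solver: Euclidean multiplicative updates on H only.
    """
    H = np.ones((W.shape[1], V_new.shape[1]))
    WtV = W.T @ V_new
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


def l2_normalize_columns(H, eps=1e-12):
    """Scale each column (one image's feature vector) to unit L2 norm."""
    return H / (np.linalg.norm(H, axis=0, keepdims=True) + eps)
```

Keeping W fixed for validation and test data means that no information from those splits enters the learned basis.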

3.3. Feature Selection and Statistical Feature Analysis

Statistical analysis was performed on the NNMF features extracted from brain magnetic resonance images to identify the components that best distinguish tumor from normal tissue; the NNMF rank was set to 15. Each NNMF feature was evaluated individually using multiple complementary criteria: the Area Under the ROC Curve (AUC), the effect size measured by Cohen’s d, and the statistical significance assessed using Welch’s t-test. This multi-criteria evaluation gives a reliable assessment of the features by jointly considering discriminative power, the magnitude of class separation, and statistical reliability. The selected features provide an interpretable and concise representation suitable for subsequent classification and robustness analysis. Figure 7 highlights the discriminative power of the selected NNMF components based on AUC values, while Figure 8 shows the relationship between effect size and statistical significance. The distribution plots and heatmap further support the hypothesis that the selected features capture meaningful class-dependent differences between normal and tumor samples (see Figure 9 and Figure 10). From the components initially extracted by NNMF (k = 15), the final subset of features used for classification was selected based on a combination of the AUC ranking, Cohen’s effect size d, and statistical significance (p-values). The top-M features (M = 15) were selected and used consistently in the training, validation, and test sets.
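The three per-feature criteria can be computed with a few lines of NumPy. This is a sketch under stated assumptions: the text does not specify how the criteria were combined, so sorting by the distance of the AUC from 0.5 is only an illustrative choice, and the p-value itself is omitted because it needs a t-distribution CDF (for instance from `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import numpy as np


def auc_score(pos, neg):
    """Probability that a random positive exceeds a random negative (ties count 0.5)."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def cohens_d(pos, neg):
    """Standardized mean difference using the pooled standard deviation."""
    pos, neg = np.asarray(pos, dtype=float), np.asarray(neg, dtype=float)
    n1, n2 = len(pos), len(neg)
    pooled_var = ((n1 - 1) * pos.var(ddof=1) + (n2 - 1) * neg.var(ddof=1)) / (n1 + n2 - 2)
    return float((pos.mean() - neg.mean()) / np.sqrt(pooled_var))


def welch_t(pos, neg):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    pos, neg = np.asarray(pos, dtype=float), np.asarray(neg, dtype=float)
    v1 = pos.var(ddof=1) / len(pos)
    v2 = neg.var(ddof=1) / len(neg)
    t = (pos.mean() - neg.mean()) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(pos) - 1) + v2 ** 2 / (len(neg) - 1))
    return float(t), float(df)


def rank_features(X, y):
    """Rank columns of X (samples x features); y holds 0 (normal) / 1 (tumor)."""
    rows = []
    for j in range(X.shape[1]):
        pos, neg = X[y == 1, j], X[y == 0, j]
        t, df = welch_t(pos, neg)
        rows.append({"feature": j, "auc": auc_score(pos, neg),
                     "d": cohens_d(pos, neg), "t": t, "df": df})
    # Assumed ordering: most discriminative first, by |AUC - 0.5|.
    return sorted(rows, key=lambda r: abs(r["auc"] - 0.5), reverse=True)
```

Computing these statistics on the training split only, and then reusing the resulting feature order for validation and test data, avoids selection bias.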

3.4. CNN-Based Classification on Selected NNMF Features

After extracting features using NNMF, a statistical feature ranking stage is applied to evaluate the discriminative power of each NNMF component. Specifically, features are ranked using multiple statistical criteria, including the Area Under the ROC Curve (AUC), Cohen’s effect size d, and p-values derived from hypothesis testing. These metrics quantify class separability and enable the selection of the most informative NNMF components while discarding redundant or weakly discriminative features.
Based on this ranking, a reduced subset of the top-M NNMF features is selected and used as input to a lightweight convolutional neural network (CNN) model. This step aims to explore whether the compact and interpretable NNMF representation is sufficient for accurate brain tumor classification without relying on high-dimensional image inputs. Before training, the selected feature vectors are L2-normalized on a per-sample basis to stabilize learning and ensure comparable feature scales across all samples. The CNN model is trained using the normalized NNMF features of the training set, while performance is monitored on a validation set to detect overfitting. Lastly, the trained model is evaluated on a held-out test set using standard classification metrics, including accuracy and confusion matrices. This evaluation demonstrates the effectiveness of combining NNMF-based dimensionality reduction with CNN-based classification, achieving reliable tumor-versus-normal discrimination while maintaining a compact and robust feature representation. Similar hybrid NNMF-CNN strategies have been shown to be effective in related pattern recognition and signal processing models, where NNMF supplies interpretable features and CNNs capture nonlinear decision boundaries [13,22]. Figure 11 and Figure 12 display CNN training behavior in terms of accuracy and loss, while the validation and test confusion matrices (see Figure 13 and Figure 14) show that the classifier is able to distinguish between normal and tumor samples with reasonably balanced performance. Although traditional classifiers such as SVM, Random Forest, and Gradient Boosting are usually used for low-dimensional data, a CNN was selected in this work to preserve compatibility with deep learning-based pipelines and to enable seamless integration with the diffusion-based defense technique.
Future work may include benchmarking against classical machine learning models to provide a more comprehensive comparison.

3.5. Feature-Space Diffusion Data Generation

In this step, diffusion training data were built in the NNMF feature space to allow feature-level denoising. First, the most discriminative top-M NNMF features were selected based on the feature ranking obtained in the feature selection phase. All feature ranking statistics were computed exclusively on the training set; the selected top-M components were then applied unchanged to the validation and test sets. To ensure consistency across the pipeline, the same per-sample L2 normalization used by the classifier was applied to all selected features. A forward diffusion process was then defined using a linear noise schedule with a fixed number of diffusion steps. For each training sample, Gaussian noise was added to the clean feature vector according to the cumulative diffusion factor, generating noisy feature representations at randomly selected timesteps. This process yielded paired samples comprising clean features and their noisy versions. The generated diffusion pairs were stored as (xt, x0), where x0 denotes the original clean feature vector and xt represents its noisy counterpart at diffusion step t (t = 41 is the step used for evaluation). These pairs form the training set for the feature denoiser in the later step. In addition, the diffusion constants needed for inference and refinement were saved to ensure consistency between the MATLAB and Python implementations. This step does not involve model training; instead, it focuses only on setting up structured diffusion data, where the stepwise corruption of feature representations enables effective feature-space denoising and robust classification in subsequent stages.
The behavior of the NNMF features under the forward diffusion process is shown in Figure 15, Figure 16 and Figure 17. These figures show the progressive corruption of the feature representations as the diffusion timestep increases, the change in the distribution of feature values, and the increase in noise energy as diffusion proceeds. Unlike classical diffusion models, which are mostly used for image generation in pixel or latent space, the proposed approach applies diffusion in feature space as a controlled denoising technique. This design is intended to keep the semantic structure of the NNMF representations unchanged. Specifically, the diffusion process operates on already extracted interpretable features, introducing noise and then removing it using a learned denoiser. Since the denoising step aims to recover the original feature distribution, the interpretability provided by NNMF is preserved rather than disrupted. Therefore, the proposed diffusion-based purification improves robustness against adversarial perturbations while maintaining the intrinsic structure and interpretability of the feature space.
The diffusion process is defined using a fixed number of timesteps (T = 50) with a linear noise schedule. The denoising network is trained to reconstruct clean feature vectors from their noisy counterparts. The timestep information is passed to the denoiser using a normalized scalar embedding, which allows the model to adjust its denoising behavior according to the diffusion step. The selection of t = 41 for evaluation is based on experimental observations, where this timestep provides a balance between sufficient noise perturbation and effective recovery by the denoiser.
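A forward diffusion process of this kind is commonly written in closed form as $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon$, where $\bar{\alpha}_t$ is the cumulative product of $(1 - \beta_s)$ up to step $t$. The sketch below assumes that standard form; the schedule endpoints `beta_start` and `beta_end` are illustrative values that the text does not state, while T = 50 and t = 41 come from the description above.

```python
import numpy as np

T = 50  # number of diffusion steps stated in the text


def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative products (endpoints are assumed values)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)
    return betas, alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Noise clean features x0 (n, d) to step(s) t, with t counted from 1 to T."""
    a = np.reshape(alpha_bar[np.asarray(t) - 1], (-1, 1))
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


def make_pairs(X0, alpha_bar, rng):
    """Draw one random timestep per sample and return (x_t, x_0, t) training pairs."""
    t = rng.integers(1, len(alpha_bar) + 1, size=X0.shape[0])
    return q_sample(X0, t, alpha_bar, rng), X0, t
```

For example, `q_sample(X0, 41, alpha_bar, rng)` produces the noisy features at the evaluation step, while `make_pairs` produces training pairs at random steps.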

3.6. Feature-Space Denoiser Training

After the construction of diffusion-corrupted feature pairs in the previous step, this stage focuses on learning a feature-level denoising model able to recover clean NNMF representations from noisy diffusion samples. The goal of this step is to approximate the reverse diffusion step in the feature space by training a regression-based neural network that maps a noisy feature vector xt, conditioned on its diffusion timestep t, back to its matching clean feature vector x0. To allow for noise-level awareness, the diffusion timestep is encoded using a compact sinusoidal embedding and combined with the noisy feature input. The denoiser is optimized using a mean squared error objective, allowing it to gradually suppress diffusion-induced perturbations while keeping the underlying discriminative structure of the features. This trained denoiser serves as the core component of the proposed diffusion-based defense, providing purified feature representations that are later used for robust classification. Figure 18, Figure 19, Figure 20 and Figure 21 visually demonstrate the denoising behavior, illustrating that the denoiser output shifts closer to the original clean feature representation and that the reconstruction error is reduced compared to the noisy input.
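The conditioning and the training objective can be illustrated as follows. This sketch leaves the network itself out and makes several assumptions: the embedding width `dim=8`, the `max_period` constant, and concatenation as the way the embedding is combined with the features are all illustrative choices, not details given in the text.

```python
import numpy as np


def sinusoidal_embedding(t, dim=8, max_period=10000.0):
    """Map timesteps (n,) to (n, dim) sine/cosine features; dim must be even."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def denoiser_input(x_t, t, dim=8):
    """Assumed conditioning: append the timestep embedding to the noisy features."""
    return np.concatenate([x_t, sinusoidal_embedding(t, dim)], axis=1)


def mse_loss(x0_pred, x0):
    """Mean squared error between predicted and clean feature vectors."""
    return float(np.mean((x0_pred - x0) ** 2))
```

With 15 selected features and an 8-dimensional embedding, the denoiser in this sketch would receive 23 inputs per sample and output a 15-dimensional estimate of x0.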

3.7. Evaluation of Feature-Space Diffusion Refinement

In this stage, we evaluate the proposed feature-space diffusion defense at test time. The method applies a forward diffusion step to perturb the clean NNMF feature vectors at a chosen timestep, then uses the trained denoiser to estimate the clean features x0. Finally, we compare the classifier performance on clean vs. purified (defended) features using confusion matrices and test accuracy, and export a Python-ready defended test bundle for AutoAttack. The following data tables report the types and quantitative comparisons between clean and diffusion-defended test features. Figure 22, Figure 23 and Figure 24 show the clean and defended feature representations, the confusion matrices, and the standard test accuracy, allowing a direct visual assessment of the effect of diffusion-based refinement on classification results.
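The test-time purification and the clean-versus-defended comparison can be sketched as below. The `denoiser` argument is a hypothetical callable standing in for the trained network, and the closed-form noising step is the standard one assumed for illustration.

```python
import numpy as np


def purify(x, t, alpha_bar, denoiser, rng):
    """Noise features x (n, d) to step t, then map them back with the denoiser.

    denoiser(x_t, t) is a placeholder for the trained feature denoiser.
    """
    a = alpha_bar[t - 1]  # cumulative diffusion factor at step t (1-based)
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * rng.standard_normal(x.shape)
    return denoiser(x_t, t)


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for yt, yp in zip(y_true, y_pred):
        cm[int(yt), int(yp)] += 1
    return cm


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Calling the classifier once on the clean features and once on `purify(...)` output, then building a confusion matrix for each, gives the side-by-side comparison described in this subsection.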

3.8. Robustness Evaluation Under AutoAttack

In this stage, we evaluate the robustness of the proposed diffusion-based feature defense under strong adversarial attacks. We run AutoAttack ($L_\infty$, $\epsilon = 0.10$) using two components (APGD-CE and Square) on both the clean classifier and the defended pipeline (forward diffusion noise + feature denoiser + classifier). Since the defense is stochastic due to the injected noise, Expectation over Transformation (EOT) is applied by averaging predictions over multiple random samples (K = 8). Robustness is reported per attack and as the final robust accuracy. Figure 25, Figure 26 and Figure 27 summarize the robustness evaluation under AutoAttack, including clean-versus-robust accuracy, robust accuracy per attack, and the reduction in performance drop achieved by the defended pipeline compared to the baseline model. The baseline refers to the same NNMF and CNN pipeline without the diffusion defense, i.e., the same architecture but undefended.
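The EOT averaging and the per-attack versus final robust accuracy can be illustrated without the attack code itself. In this sketch, `stochastic_forward` is a hypothetical callable that runs the randomized defended pipeline once and returns class probabilities; the attack outputs passed to `robust_accuracy` are assumed to be the model's probabilities on each attack's adversarial examples.

```python
import numpy as np


def eot_predict_proba(x, stochastic_forward, k=8, seed=0):
    """Average class probabilities over k independent stochastic passes."""
    rng = np.random.default_rng(seed)
    probs = [stochastic_forward(x, rng) for _ in range(k)]
    return np.mean(probs, axis=0)


def robust_accuracy(y_true, probs_per_attack):
    """Per-attack accuracy, plus the fraction of samples that survive every attack."""
    y_true = np.asarray(y_true)
    survived = np.ones(len(y_true), dtype=bool)
    per_attack = {}
    for name, probs in probs_per_attack.items():
        correct = np.argmax(probs, axis=1) == y_true
        per_attack[name] = float(correct.mean())
        survived &= correct  # a sample is robust only if no attack flips it
    return per_attack, float(survived.mean())
```

The final robust accuracy is therefore never higher than the lowest per-attack accuracy, which is why it is the stricter figure to report.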

3.9. Comprehensive Performance Evaluation

In addition to classical classification metrics, a comprehensive probabilistic evaluation was performed to assess discrimination capacity and probability calibration under clean and adversarial conditions. Inspired by practical discussions on metric selection for classification systems [23], we used ROC-AUC, Brier Score, and Log-Loss, along with precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and balanced accuracy.

3.9.1. ROC-AUC

The Area Under the Receiver Operating Characteristic Curve (ROC-AUC) measures how well the model ranks positive samples above negative samples, independently of a fixed classification threshold [23]. A higher ROC-AUC indicates a stronger discrimination ability.
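The ranking interpretation can be made concrete with a small sketch that computes ROC-AUC as the fraction of positive-negative pairs in which the positive sample receives the higher score (ties counted as one half). The labels and scores are made-up values for illustration.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))  # 0.75: three of the four positive-negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, so it suits small test sets or explanation rather than large-scale evaluation.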

3.9.2. Brier Score

The Brier Score estimates the average squared error of the predicted probabilities:
$$\text{Brier Score} = \frac{1}{N} \sum_{i=1}^{N} \left( p_i - y_i \right)^2$$
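Here, p_i is the predicted probability of the positive class for sample i, y_i ∈ {0, 1} is the true label, and N is the number of samples. This metric reflects probability calibration, where lower values indicate more accurate probabilistic predictions.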
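A minimal sketch of the Brier Score for binary labels is given below; the labels and probabilities are made-up values for illustration.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
print(brier_score(y, p))

# A confident wrong prediction contributes the maximum per-sample error of 1.
print(brier_score([1], [0.0]))
```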

3.9.3. Log-Loss

Log-Loss (cross-entropy loss) measures the negative log-likelihood of the true labels under the predicted probabilities:
$$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$
Log-Loss heavily penalizes overconfident incorrect predictions, making it useful and informative in adversarial settings [23].
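The penalty on overconfident errors can be seen in a short sketch. Clipping the probabilities away from 0 and 1 (here with an assumed eps of 1e-15) is a common practical safeguard against log(0) and is not something the paper specifies.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy averaged over samples, with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Same wrong decision for a positive sample, with increasing confidence in the wrong class.
print(log_loss([1], [0.4]))
print(log_loss([1], [0.01]))
```

The second value is several times larger than the first, which is the behavior that makes Log-Loss sensitive to adversarially induced confident mistakes.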

3.9.4. Matthews Correlation Coefficient (MCC)

The Matthews Correlation Coefficient (MCC) is a balanced performance metric that considers all entries of the confusion matrix and is particularly suitable for imbalanced datasets. It is defined as follows:
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. The MCC value ranges from −1 (total disagreement) to +1 (perfect prediction).
Balanced Accuracy was also included to mitigate possible class imbalance effects.
$$\text{Balanced Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$$
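Both quantities follow directly from the four confusion-matrix counts, as the sketch below shows. Returning 0 when the MCC denominator vanishes is a common convention assumed here, and the counts are made-up values for illustration.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

print(mcc(50, 50, 0, 0))   # 1.0: every sample classified correctly
print(mcc(0, 0, 50, 50))   # -1.0: every prediction reversed
print(balanced_accuracy(40, 30, 20, 10))
```

The second call shows the situation described for a fully broken classifier: all predictions flipped, giving an MCC of −1.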

3.9.5. Results and Discussion

The results show that, while the clean model suffers substantial degradation under adversarial attacks, the proposed diffusion-based defense preserves strong discriminative capability (ROC-AUC) and improves probability calibration (lower Brier Score and Log-Loss). These results support the importance of evaluating a classification model with both discrimination and probabilistic metrics, as highlighted in [23]. An MCC close to −1 for the baseline under AutoAttack perturbations indicates an almost complete reversal of the predictions and confirms that the adversarial attack effectively destroyed the classifier. Table 1 summarizes the quantitative comparison. Note that the baseline model is designed purposely as a standard reference without any defense technique. Its role is not to achieve maximum robustness, but to provide a fair comparison framework for evaluating the effectiveness of the proposed diffusion-based defense.
The performance degradation of the baseline model under AutoAttack (Acc = 0.0047) highlights its vulnerability and is consistent with the widely reported sensitivity of deep neural networks to strong adversarial attacks. In contrast, the proposed defended model is substantially more robust under the same attack conditions. Because both pipelines share the same architecture, the gain can be attributed to the diffusion-based feature purification rather than to architectural complexity. This comparison provides a fair and controlled evaluation in which the contribution of the proposed defense can be isolated.
Importantly, the improvement achieved by the proposed method is not only reflected in relative gains, but also in absolute robustness performance (Acc = 0.5953), which demonstrates substantial resilience under the same attack conditions. This confirms that the observed improvement is not exaggerated, but reflects the effectiveness of the proposed diffusion-based defense. To improve readability, key results in clean, adversarial, and defended scenarios are summarized in a unified comparison format, allowing for direct evaluation of performance differences. Figure 28 illustrates the comparative behavior of the evaluated metrics.

3.10. Comparison with Recent Works

A comparison with recent studies shows that, while some methods achieve higher clean accuracy, they often lack robustness and interpretability. The proposed method provides a balanced trade-off between accuracy, robustness, and interpretability, which distinguishes it from existing models.

4. Implementation and Computational Performance Analysis

The proposed framework was implemented as a combined MATLAB–Python pipeline. MATLAB was used for NNMF feature extraction, statistical feature ranking, CNN training, and the diffusion-based purification module. The trained models were exported to the ONNX format and run in Python using PyTorch and the AutoAttack framework for the adversarial robustness evaluation.
All tests were carried out on a workstation equipped with an Intel Core i7 CPU (Intel Corporation, Santa Clara, CA, USA), 16 GB RAM, and an NVIDIA RTX-series GPU (NVIDIA Corporation, Santa Clara, CA, USA) with CUDA acceleration.

4.1. Computational Time Comparison

To analyze computational efficiency, we measured the execution time for all main phases of the suggested framework in both CPU and GPU environments. The results are summarized in Table 2.

4.2. Analysis

The results show that GPU acceleration substantially reduces the total runtime of the proposed framework. In particular, the adversarial robustness evaluation phase, which is the dominant computational load, shows the most significant improvement: the defended AutoAttack runtime decreases from 155.00 s on CPU to 70.30 s on GPU. Only small differences are observed in lightweight stages, such as CNN training, due to GPU initialization and kernel launch overhead. Since the NNMF feature volume is relatively small, the GPU does not provide a consistent advantage for all phases.
Overall, GPU acceleration reduces the total runtime from 201.47 s to 116.60 s, an approximately 1.73× speedup. These results show that hardware acceleration is particularly useful for the computationally intensive adversarial robustness evaluation, making the proposed framework practically feasible for large-scale analysis.
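The speedup figures follow from the reported runtimes as a simple ratio of CPU time to GPU time. The short check below uses only the four timings quoted in the text.

```python
# Runtimes in seconds, as reported in the text.
cpu_total, gpu_total = 201.47, 116.60
cpu_attack, gpu_attack = 155.00, 70.30

total_speedup = cpu_total / gpu_total
attack_speedup = cpu_attack / gpu_attack

print(f"{total_speedup:.2f}x overall, {attack_speedup:.2f}x for the defended AutoAttack stage")
# 1.73x overall, 2.20x for the defended AutoAttack stage
```

The attack stage speeds up more than the pipeline as a whole, which is consistent with the lightweight stages gaining little from the GPU.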

5. Conclusions

This study presents a robust and structured framework for brain tumor classification based on NNMF feature extraction, statistical feature selection, CNN-based classification, and diffusion-based feature purification. Unlike traditional end-to-end deep learning approaches that rely solely on high-dimensional image input, the proposed approach uses interpretable low-rank NNMF representations to build a compact and discriminative feature space.
The extracted NNMF components were statistically ranked using metrics such as AUC, Cohen’s d, and hypothesis testing, allowing the selection of the most informative features. A lightweight CNN classifier trained on the selected feature subset demonstrated competitive performance on clean test data while ensuring computational efficiency.
To address adversarial vulnerability, a feature-space diffusion technique was applied, followed by a learned denoising model that approximates the reverse diffusion process. The robustness evaluation under strong adversarial attacks using AutoAttack shows that the proposed defense enhances stability compared to the baseline classifier. The experimental results show that combining interpretable NNMF features with diffusion-based purification improves adversarial robustness while maintaining classification accuracy.
Overall, the proposed framework demonstrates that the combination of statistical feature analysis, compact neural architectures, and a structured feature-level defense technique provides a balanced trade-off among interpretability, accuracy, and robustness in medical image classification.
Future work may investigate adaptive diffusion schedules, multi-step purification strategies, and extension to multi-class tumor classification tasks.

Author Contributions

Conceptualization, H.A.A.-k. and R.R.; methodology, H.A.A.-k.; software, H.A.A.-k.; validation, H.A.A.-k. and R.R.; formal analysis, H.A.A.-k.; investigation, H.A.A.-k.; resources, R.R.; data curation, H.A.A.-k.; writing—original draft preparation, H.A.A.-k.; writing—review and editing, H.A.A.-k. and R.R.; visualization, H.A.A.-k.; supervision, R.R.; project administration, R.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are publicly available from the Kaggle dataset: https://www.kaggle.com/datasets/pkdarabi/brain-tumor-image-dataset-semantic-segmentation (accessed on 11 March 2026). The dataset was prepared using Roboflow (https://universe.roboflow.com/ accessed on 11 March 2026) under the CC BY 4.0 license. The processed data and code used in this work are available from the authors upon reasonable request.

Acknowledgments

This work was supported by the Distinguished Professor Program of Óbuda University. The authors are also grateful for the opportunity to use the HUN-REN Cloud https://science-cloud.hu/en (accessed on 10 March 2026) [24] which helped achieve some specific results published in this paper. Hiba Adil Al-kharsan gratefully acknowledges the financial support of the Stipendium Hungaricum Doctoral Program, managed by the Tempus Public Foundation. During the preparation of this manuscript, the authors used AI for language improvement and clarification of explanations. The authors have checked and edited all outputs and take full responsibility for the content of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
  2. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199.
  3. Croce, F.; Hein, M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. arXiv 2020, arXiv:2003.01690.
  4. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  5. Rajkó, R.; Siket, I.; Hegedűs, P.; Ferenc, R. Development of partial least squares regression with discriminant analysis for software bug prediction. Heliyon 2024, 10, e35045.
  6. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
  7. Gillis, N. Nonnegative Matrix Factorization; SIAM: Philadelphia, PA, USA, 2020.
  8. Croce, F.; Hein, M. AutoAttack: Reliable Evaluation of Adversarial Robustness. 2020. Available online: https://github.com/fra31/auto-attack (accessed on 10 March 2026).
  9. Almuhaimeed, A.; Alenezi, F.; Alotaibi, A. Brain Tumor Classification Using Deep Learning and Data Augmentation Techniques. Front. Med. 2025, 12, 1635796.
  10. Gómez, J.; Martínez, C.; Fernández, L. Enhanced Multi-Class Brain Tumor Classification Using Hybrid CNN and Transformer Models. Technologies 2025, 13, 379.
  11. Deem, M.; Johnson, S.; Kim, D. Robustness Analysis of Brain Tumor Classification Models Under Adversarial Attacks. arXiv 2026, arXiv:2602.11646.
  12. Hossain, T.; Shishir, F.S.; Ashraf, M.; Al Nasim, M.A.; Shah, F.M. Brain tumor detection using Convolutional Neural Network. In Proceedings of the 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT-2019), Dhaka, Bangladesh, 3–5 May 2019.
  13. Chan, T.K.; Chin, C.S.; Li, Y. Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) for sound event detection. arXiv 2020, arXiv:2001.07874.
  14. Huang, H.; Yang, Z.; Liang, N.; Li, Z. Semi-NMF network for image classification. IEEE Access 2019, 7, 8899–8903.
  15. Thiry, L.; Guth, F. Classification-denoising networks. arXiv 2024, arXiv:2410.03505.
  16. Ahamed, M.F.; Hossain, M.M.; Nahiduzzaman, M.; Islam, M.R.; Islam, M.R.; Ahsan, M.; Haider, J. A review on brain tumor segmentation based on deep learning methods with federated learning techniques. Comput. Med. Imaging Graph. 2023, 110, 102313.
  17. Netshamutshedzi, N.; Netshikweta, R.; Ndogmo, J.C.; Obagbuwa, I.C. A systematic review of the hybrid machine learning models for brain tumour segmentation and detection in medical images. Front. Artif. Intell. 2025, 8, 1615550.
  18. Gomes, E.F.; Barbosa, R.S. Deep Learning Approaches for Brain Tumor Classification in MRI Scans: An Analysis of Model Interpretability. Appl. Sci. 2026, 8, 831.
  19. Darabi, P.K. Brain Tumor Image Dataset: Semantic Segmentation. 2023. Available online: https://www.kaggle.com/datasets/pkdarabi/brain-tumor-image-dataset-semantic-segmentation (accessed on 25 January 2026).
  20. Yagis, E.; Atnafu, S.W.; García Seco de Herrera, A.; Marzi, C.; Scheda, R.; Giannelli, M.; Tessa, C.; Citi, L.; Diciotti, S. Effect of data leakage in brain MRI classification using 2D convolutional neural networks. Sci. Rep. 2021, 11, 22544.
  21. Lee, S.; Pang, H.S. Feature extraction based on the Non-Negative Matrix Factorization of Convolutional Neural Networks for monitoring domestic activity with acoustic signals. IEEE Access 2020, 8, 122384–122395.
  22. Lee, D.D.; Seung, H.S. Algorithms for Non-negative Matrix Factorization. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 27 November–2 December 2000; MIT Press: Cambridge, MA, USA, 2000; Volume 13. Available online: https://proceedings.neurips.cc/paper_files/paper/2000/file/f9d1152547c0bde01830b7e8bd60024c-Paper.pdf (accessed on 11 March 2026).
  23. Mazzanti, S. It Took Me 6 Years to Find the Best Metric for Classification Models. 2023. Available online: https://medium.com/data-science-collective/it-took-me-6-years-to-find-the-best-metric-for-classification-models-0f5aa21a2b85 (accessed on 10 March 2026).
  24. Héder, M.; Rigó, E.; Medgyesi, D.; Lovas, R.; Tenczer, S.; Török, F.; Farkas, A.; Emődi, M.; Kadlecsik, J.; Mező, G.; et al. The Past, Present and Future of the ELKH Cloud. Inf. Társadalom 2022, 22, 128–137.
Figure 1. Stages of the proposed framework.
Figure 2. NNMF basis components (k = 15). This figure shows the learned NNMF basis components obtained from the training data. Each basis image represents a non-negative spatial pattern that contributes to reconstructing brain MRI images. The components capture meaningful anatomical structures such as skull boundaries, tissue distribution, and localized density variations. The diversity across components indicates that NNMF decomposes the images into multiple complementary patterns rather than a single dominant structure, providing an interpretable and compact representation suitable for later analysis.
Figure 3. Example test image and its normalized NNMF feature vector. This figure illustrates an example test MRI image from the normal class and its corresponding normalized NNMF feature vector. The bar plot shows the activation strength of each NNMF component for this image, highlighting that only a subset of components exhibits strong responses. This sparse and selective activation pattern indicates that NNMF features encode discriminative structural information rather than responding uniformly to all components, which is desirable for robust classification.
Figure 4. L2 norm of Xtest after normalization. This figure reports the L2 norm of all normalized test feature vectors. The values are tightly concentrated around one, confirming the correctness and stability of the normalization process. Ensuring unit-norm feature vectors is critical for a fair comparison between samples and for robustness evaluation, as it prevents variations in feature magnitude from dominating the classifier or the adversarial attack.
Figure 5. Class-wise mean NNMF features (normalized, TEST). This figure shows the mean activation of each NNMF component for normal and tumor classes after feature normalization. Clear differences can be seen across several components, where certain features show consistently higher activation for tumor samples, while others are more prominent for normal samples. These class-dependent activation patterns indicate that NNMF components capture discriminative characteristics linked to pathological changes, supporting their effectiveness for classification and feature-space defense strategies.
Figure 6. Top-activated TEST samples per component (using normalized Xtest). This figure shows the most strongly activated test sample for each NNMF component based on normalized feature vectors. For each component, the corresponding image and its activation value are displayed along with the class label. The results reveal that some components are often activated by tumor images, while others respond more strongly to normal brain structures. This component-wise association improves interpretability by directly linking abstract feature dimensions to concrete visual patterns in MRI images.
Figure 7. Top-15 features: AUC. This figure shows the AUC values of the top-ranked NNMF features from the feature selection procedure. Each bar corresponds to a single NNMF component and reflects its individual ability to distinguish tumor samples from normal ones. Higher AUC values indicate stronger discriminative ability, while values closer to 0.5 suggest limited separability. The scores indicate that several NNMF components show meaningful classification potential at the feature level.
Figure 8. Effect size vs. significance. This figure illustrates the relation between effect size and statistical significance for NNMF features. The horizontal axis shows Cohen’s d, where positive values correspond to higher activation in tumor samples and negative values to higher activation in normal samples. The vertical axis shows the negative logarithm of the p-value obtained from Welch’s t-test. Features with large absolute effect sizes and high statistical significance are highlighted, marking components that are both highly discriminative and statistically reliable.
Figure 9. Top feature distributions (normal vs. tumor). This figure displays boxplots of the most discriminative NNMF features, comparing their normalized distributions between normal and tumor classes. The plots reveal visible differences in central tendency and spread for the selected features, providing visual confirmation of their discriminative power. These distributions complement the statistical metrics and show how individual NNMF components respond differently to healthy and pathological brain structures.
Figure 10. Class mean heatmap (top-15 features). This figure summarizes the class-wise mean activation of the top-ranked NNMF features using a heatmap representation. Each column corresponds to a selected NNMF component, while rows represent the normal and tumor classes. The color intensity reflects the average feature activation, enabling quick identification of tumor-dominant and normal-dominant features. The observed pattern confirms that NNMF components capture class-specific structural information and contribute to interpretable feature-level discrimination.
Figure 11. Training progress: Accuracy. This figure shows the evolution of classification accuracy during training. The light curve represents raw mini-batch training accuracy, while the smoothed curve highlights the overall trend. Validation accuracy (black markers) is measured periodically and remains close to the training curve, indicating stable learning and limited overfitting. Accuracy climbs sharply during the early iterations and then gradually saturates, suggesting convergence of the model when trained on the selected NNMF feature space.
Figure 12. Training progress: Loss. This figure reports the training and validation loss curves over iterations. The training loss decreases steadily, while the validation loss follows a similar downward trend with slightly higher values, which is expected. The parallel behavior of both curves suggests that optimization proceeds normally and that the model generalizes reasonably well. No significant gap appears between training and validation loss, indicating that the CNN is not severely overfitting despite being trained for multiple epochs.
Figure 13. VAL confusion matrix (Acc ≈ 0.83). This confusion matrix summarizes the model’s performance on the validation set. Correct predictions are shown on the diagonal, while off-diagonal entries correspond to misclassifications. The matrix shows a high rate of correctly classified normal and tumor samples, with errors occurring in both directions (normal images predicted as tumors and vice versa). The overall validation accuracy (≈83%) confirms that the selected NNMF features retain meaningful discriminative information and that the classifier is learning a separable decision boundary.
Figure 14. TEST confusion matrix (Acc ≈ 0.851). This confusion matrix reports the final performance on the unseen test set. The majority of samples lie on the diagonal, yielding an overall accuracy of ≈0.851. Although this accuracy may seem lower than that of some current approaches, many reported results are obtained under standard (non-adversarial) conditions and often depend on complex deep architectures. In contrast, the proposed framework prioritizes robustness and interpretability in addition to classification performance. The model operates on a compact set of statistically selected NNMF features and uses a lightweight CNN, which supports generalization and limits overfitting. Furthermore, under strong adversarial attacks (e.g., AutoAttack), the baseline model displays a considerable drop in performance, while the defended model preserves substantially higher robustness. This demonstrates the effectiveness of the proposed diffusion-based feature purification and highlights the importance of robustness in safety-critical applications such as medical diagnosis. The relatively low numbers of false positives (normal → tumor) and false negatives (tumor → normal) point to balanced performance across both classes. This result supports the effectiveness of using a compact subset of NNMF features (selected by statistical ranking) as input to a small CNN, achieving robust generalization while reducing dimensionality.
Figure 15. Clean vs. diffused NNMF feature vectors at a late diffusion timestep (t = 41). The figure shows the impact of forward diffusion noise on selected feature components and how the original clean features $x_0$ are gradually corrupted into noisy features $x_t$.
Figure 16. Impact of diffusion time on NNMF features at various timesteps (t = 1, 10, 25, and 50). As the diffusion timestep increases, the injected noise becomes more dominant, leading to higher distortion and variability in the feature representations.
Figure 17. Distribution of NNMF feature values before and after diffusion. The histogram comparison highlights the increased variance and spread of feature values caused by the diffusion operation, indicating a deviation from the original feature set.
Figure 18. Noise energy as a function of diffusion timestep. The plot shows the L2 distance between clean and noisy feature vectors, $\lVert x_t - x_0 \rVert_2$, increasing with diffusion time, quantitatively confirming the gradual corruption introduced by the forward diffusion step. Although the cumulative diffusion schedule is not shown explicitly, its effect is reflected in the progressive rise in noise energy, which shows how the contribution of the original clean features gradually decreases as the diffusion timestep increases.
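The rising trend described for Figure 18 can be reproduced with a short sketch. It assumes a DDPM-style forward step, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and a linear beta schedule over 50 steps. The schedule, the values of `beta_start` and `beta_end`, and the toy 20-dimensional feature vector are illustrative assumptions, not settings taken from the paper. Under these assumptions the expected squared distance is $(1-\sqrt{\bar{\alpha}_t})^2 \lVert x_0 \rVert_2^2 + (1-\bar{\alpha}_t)\,d$, which grows as $\bar{\alpha}_t$ shrinks.

```python
import math

def alpha_bar_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def expected_sq_distance(x0, alpha_bar):
    """E||x_t - x_0||^2 when x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps, eps ~ N(0, I)."""
    signal_term = (1.0 - math.sqrt(alpha_bar)) ** 2 * sum(v * v for v in x0)
    noise_term = (1.0 - alpha_bar) * len(x0)  # the cross term vanishes in expectation
    return signal_term + noise_term

# Toy non-negative feature vector (hypothetical, stands in for NNMF features).
x0 = [0.1 * (i % 5) for i in range(20)]
alpha_bars = alpha_bar_schedule()
rms_distance = [math.sqrt(expected_sq_distance(x0, ab)) for ab in alpha_bars]

for t in (1, 10, 25, 50):
    print(f"t={t:2d}  expected RMS distance = {rms_distance[t - 1]:.3f}")
```

Because both terms increase as the cumulative product decreases, the curve is monotone in t for any schedule with positive betas, which matches the qualitative behavior the caption describes.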
Figure 19. Feature denoising example at diffusion step t = 41. The clean feature vector $x_0$, its diffused version $x_t$, and the denoiser output $\hat{x}_0$ are plotted to show how the network suppresses diffusion noise and moves the features closer to the clean representation.
Figure 20. Denoising error relative to the noisy input error. Each point compares the noisy reconstruction error, $\lVert x_t - x_0 \rVert_2$ (x-axis), with the denoised reconstruction error, $\lVert \hat{x}_0 - x_0 \rVert_2$ (y-axis). Points lying below the identity line indicate that the error decreased after denoising, demonstrating that the denoiser returns features close to the original clean representation.
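The scatter plot in Figure 20 can be summarized by a single number: the fraction of samples that fall below the identity line. The sketch below shows one way to compute it. The function names and the two toy samples are hypothetical and are not drawn from the paper's data.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fraction_below_identity(clean, noisy, denoised):
    """Share of samples whose denoised error is smaller than their noisy error."""
    improved = 0
    for x0, xt, x0_hat in zip(clean, noisy, denoised):
        noisy_error = l2_distance(xt, x0)         # x-axis value in the scatter plot
        denoised_error = l2_distance(x0_hat, x0)  # y-axis value in the scatter plot
        if denoised_error < noisy_error:
            improved += 1
    return improved / len(clean)

# Toy batch: the first sample is improved by denoising, the second is made worse.
clean = [[0.0, 0.0], [1.0, 1.0]]
noisy = [[1.0, 0.0], [1.0, 2.0]]
denoised = [[0.5, 0.0], [1.0, 3.0]]
share_improved = fraction_below_identity(clean, noisy, denoised)
```

A value close to 1 would correspond to nearly all points lying below the identity line.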
Figure 21. Denoiser reconstruction error versus diffusion time. The plot shows $\lVert \hat{x}_0 - x_0 \rVert_2$ as a function of timestep t, highlighting how denoising becomes harder as the diffusion strength increases.
Figure 22. Clean vs. diffusion-defended NNMF feature vector example on the test set, showing how the refinement step alters the feature profile after forward noising and denoising at the purification timestep $t_{\mathrm{pur}}$.
Figure 23. (a) Confusion matrix on the test set using clean (non-defended) NNMF features, displaying class-wise prediction outcomes for normal and tumor. (b) Confusion matrix on the test set after applying diffusion-based feature refinement (defended features), highlighting changes in misclassification patterns compared to the clean case.
Figure 24. Test accuracy comparison between clean features and diffusion-defended (refined) features, quantifying the net effect of the defense on standard classification accuracy.
Figure 25. Clean and robust accuracy under AutoAttack ($L_\infty$, $\epsilon = 0.10$) for the clean (non-defended) model and the proposed diffusion-based defense. The robust accuracy corresponds to the final AutoAttack score (minimum across the evaluated attacks).
Figure 26. Robust accuracy per AutoAttack component (APGD-CE and Square) for the baseline and defended models ($L_\infty$, $\epsilon = 0.10$). The final robustness is calculated as the minimum accuracy across attacks.
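The aggregation rule stated in the captions of Figures 25 and 26 (final robustness as the minimum accuracy across attacks) can be sketched as follows. The function name and the toy correctness flags are hypothetical. The per-sample worst case computed at the end is an added point of comparison and is not stated in the source: it counts a sample as robust only if it survives every attack, so it can never exceed the minimum over attacks.

```python
def accuracy(flags):
    """Fraction of True entries in a list of booleans."""
    return sum(flags) / len(flags)

def summarize_robustness(still_correct):
    """still_correct maps attack name -> list of bools (True = prediction stays correct)."""
    per_attack = {name: accuracy(flags) for name, flags in still_correct.items()}
    # Rule described in the captions: take the lowest per-attack accuracy.
    min_over_attacks = min(per_attack.values())
    # Stricter alternative (an assumption here): a sample must survive all attacks.
    survives_all = [all(column) for column in zip(*still_correct.values())]
    return per_attack, min_over_attacks, accuracy(survives_all)

# Toy flags for four test samples under the two attack components.
toy_flags = {
    "apgd-ce": [True, True, False, False],
    "square": [True, False, True, False],
}
per_attack, min_acc, worst_case_acc = summarize_robustness(toy_flags)
```

In this toy case each attack alone leaves half of the samples correct, while only one sample in four survives both.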
Figure 27. Accuracy decline under AutoAttack for baseline versus defended models ($L_\infty$, $\epsilon = 0.10$). The diffusion-based defense noticeably reduces the drop from clean to robust performance.
Figure 28. Comparison of classification and probabilistic metrics for baseline and diffusion-based defense under clean and adversarial settings. Higher values indicate better performance except for Brier Score and Log-Loss, where lower values are preferred.
Table 1. Comprehensive performance comparison under clean and adversarial settings.
| Model | Acc | Prec | Rec | F1 | MCC | BalAcc | ROC-AUC | Brier | LogLoss |
|---|---|---|---|---|---|---|---|---|---|
| Clean_Baseline | 0.8605 | 0.8548 | 0.8983 | 0.8760 | 0.7178 | 0.8564 | 0.9105 | 0.1461 | 0.4751 |
| Clean_Defended | 0.8512 | 0.8525 | 0.8814 | 0.8667 | 0.6988 | 0.8479 | 0.8967 | 0.1555 | 0.4963 |
| Robust_Baseline | 0.0047 | 0.0000 | 0.0000 | 0.0000 | −0.9906 | 0.0052 | 0.0075 | 0.4702 | 1.1629 |
| Robust_Defended | 0.5953 | 0.6115 | 0.7203 | 0.6615 | 0.1703 | 0.5818 | 0.7485 | 0.2150 | 0.6182 |
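As a quick consistency check on Table 1, the sketch below recomputes F1 from the reported precision and recall and derives the clean-to-robust accuracy drop for each model. The numbers are copied from the table; the dictionary layout and function name are illustrative choices.

```python
# Accuracy, precision, recall, and F1 as reported in Table 1.
reported = {
    "Clean_Baseline": {"acc": 0.8605, "prec": 0.8548, "rec": 0.8983, "f1": 0.8760},
    "Clean_Defended": {"acc": 0.8512, "prec": 0.8525, "rec": 0.8814, "f1": 0.8667},
    "Robust_Baseline": {"acc": 0.0047, "prec": 0.0000, "rec": 0.0000, "f1": 0.0000},
    "Robust_Defended": {"acc": 0.5953, "prec": 0.6115, "rec": 0.7203, "f1": 0.6615},
}

def f1_from_precision_recall(prec, rec):
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

# Gap between the recomputed and the reported F1 for every row.
f1_gap = {
    name: abs(f1_from_precision_recall(row["prec"], row["rec"]) - row["f1"])
    for name, row in reported.items()
}

# Accuracy lost when moving from clean inputs to AutoAttack inputs.
baseline_drop = reported["Clean_Baseline"]["acc"] - reported["Robust_Baseline"]["acc"]
defended_drop = reported["Clean_Defended"]["acc"] - reported["Robust_Defended"]["acc"]
```

The recomputed F1 values agree with the table to within rounding, and the accuracy drop is about 0.856 for the baseline versus about 0.256 for the defended model.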
Table 2. Execution time comparison (CPU vs. GPU).
| Stage | CPU (sec) | GPU (sec) |
|---|---|---|
| NNMF Feature Extraction | 17.03 | 9.11 |
| CNN Training (NNMF Features) | 12.60 | 17.22 |
| Diffusion Denoiser Training | 8.42 | 9.24 |
| AutoAttack Baseline (APGD-CE + Square) | 8.42 | 10.73 |
| AutoAttack Defended (APGD-CE + Square) | 155.00 | 70.30 |
| Total Runtime | 201.47 | 116.60 |
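The timings in Table 2 can be checked and turned into per-stage speedups with a few lines. The values are copied from the table; the variable names are illustrative.

```python
# (CPU seconds, GPU seconds) per stage, as reported in Table 2.
stage_times = {
    "NNMF Feature Extraction": (17.03, 9.11),
    "CNN Training (NNMF Features)": (12.60, 17.22),
    "Diffusion Denoiser Training": (8.42, 9.24),
    "AutoAttack Baseline (APGD-CE + Square)": (8.42, 10.73),
    "AutoAttack Defended (APGD-CE + Square)": (155.00, 70.30),
}

cpu_total = sum(cpu for cpu, _ in stage_times.values())
gpu_total = sum(gpu for _, gpu in stage_times.values())

# Speedup > 1 means the GPU run was faster for that stage.
speedup = {stage: cpu / gpu for stage, (cpu, gpu) in stage_times.items()}
overall_speedup = cpu_total / gpu_total

for stage, ratio in speedup.items():
    print(f"{stage}: {ratio:.2f}x")
print(f"Overall: {overall_speedup:.2f}x")
```

The stage sums match the reported totals, and the overall speedup is roughly 1.73. In the reported numbers only feature extraction and the defended AutoAttack run are faster on the GPU; the other three stages are slightly slower, and the defended attack dominates the total runtime on both devices.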

Share and Cite

MDPI and ACS Style

Al-kharsan, H.A.; Rajkó, R. Diffusion-Based Feature Denoising and Using NNMF for Robust Brain Tumor Classification. Mach. Learn. Knowl. Extr. 2026, 8, 105. https://doi.org/10.3390/make8040105