Article

An Interpretable Multimodal Machine-Learning Model for Non-Invasive Preoperative Glioma Grading

1 Department of Neurosurgery, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
2 Department of Pediatric Surgery, The Sixth Affiliated Hospital of Harbin Medical University, Harbin 150023, China
3 Department of Spleen and Stomach Diseases, Wangjing Hospital, China Academy of Chinese Medical Sciences, Beijing 100102, China
4 Department of Laboratory Diagnostics, The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Cancers 2026, 18(8), 1204; https://doi.org/10.3390/cancers18081204
Submission received: 4 March 2026 / Revised: 1 April 2026 / Accepted: 2 April 2026 / Published: 10 April 2026
(This article belongs to the Section Cancer Pathophysiology)

Simple Summary

Glioma grading before surgery is essential because treatment planning and expected outcomes differ greatly between low-grade and high-grade tumors. In routine practice, grading often depends on postoperative pathology, while preoperative imaging and symptoms alone may be uncertain. We developed and internally validated a machine-learning model that combines readily available preoperative information, including clinical symptoms, standard imaging features, and metabolic measurements from magnetic resonance spectroscopy. Using data from 400 patients, the model provided an individualized probability of high-grade glioma and showed strong performance in an independent validation group. To improve transparency, we used an explanation method to show how each factor contributed to the prediction. This approach may help clinicians better stratify risk before surgery, support discussions with patients, and guide planning of surgical strategies and postoperative management. Further validation in external cohorts is needed before broader clinical adoption.

Abstract

Background: Gliomas are the most common primary malignant tumors of the central nervous system. Accurate preoperative grading is essential for individualized surgical planning and treatment selection; however, reliable non-invasive prediction tools integrating multimodal preoperative data remain limited. This study aimed to develop and internally validate an interpretable machine-learning model for non-invasive glioma grading. Methods: Clinical and imaging data from 400 patients with pathologically confirmed gliomas were retrospectively collected. Twenty-four preoperative variables were analyzed. The dataset was randomly divided into training and validation cohorts (7:3). Feature selection was performed using a combination of the Boruta algorithm and logistic regression analyses, followed by correlation filtering. Seventeen machine-learning algorithms were benchmarked using five-fold cross-validation, and the optimal model was evaluated in the independent validation cohort using ROC analysis, calibration assessment, precision–recall curves, and decision curve analysis. Model interpretability was examined using SHAP. Results: Eight key predictors were identified, including age, focal neurological deficits, midline shift, tumor laterality, tumor lobar location, enhancing tumor volume, and MRS-derived Cho/NAA and Cho/Cr ratios. The Random Forest model achieved an area under the ROC curve of 0.946 (95% CI: 0.902–0.989) in the validation cohort. Calibration analysis demonstrated reasonable agreement between predicted and observed outcomes, and the precision–recall curve yielded an average precision of 0.98. Decision curve analysis indicated net clinical benefit across relevant probability thresholds. Conclusions: A multimodal machine-learning model integrating clinical, structural imaging, and MRS-derived metabolic features was developed and internally validated for non-invasive preoperative glioma grading. 
The model showed good discrimination and calibration and provided individualized probability estimates, suggesting potential value for preoperative risk stratification. However, clinical deployment remains premature, and further external validation is required.

1. Introduction

Glioma is the most common primary malignant intracranial tumor in adults, accounting for approximately 80% of central nervous system malignancies [1,2]. According to the World Health Organization (WHO) classification, gliomas are categorized into low-grade (LGG, grades I–II) and high-grade (HGG, grades III–IV) tumors [3,4]. Compared with LGG, HGG exhibits aggressive growth, treatment resistance, and markedly poorer survival outcomes [5,6,7,8,9,10]. Therefore, accurate preoperative grading is essential for determining surgical extent, intraoperative functional mapping strategies, and postoperative adjuvant therapy.
Currently, glioma grading relies on postoperative histopathology combined with molecular markers such as IDH and TP53 [3]. However, this gold standard is invasive and only available after surgery. Although stereotactic biopsy may assist preoperative evaluation, sampling errors due to intratumoral heterogeneity limit its reliability [11]. In clinical practice, grading is primarily based on contrast-enhanced MRI, yet enhancement patterns are not always specific: some HGGs exhibit minimal enhancement, while certain LGGs present atypical enhancement features, leading to potential misclassification [12]. These limitations highlight the need for more reliable non-invasive preoperative assessment tools.
To improve diagnostic performance, various approaches have been explored. Radiomics-based models derived from multiparametric MRI have shown promising results [13,14,15], and clinical predictors have also been investigated [16]. In addition, functional MRI techniques such as magnetic resonance spectroscopy (MRS), diffusion tensor imaging (DTI), and amide proton transfer (APT) imaging provide complementary metabolic and microstructural information [17,18,19]. However, most previous studies rely predominantly on imaging features alone, with limited integration of routinely available clinical symptoms and metabolic biomarkers. Moreover, the combined utilization of multimodal data in a clinically accessible and standardized manner remains insufficient, and the generalizability and interpretability of existing models are still limited.
Artificial intelligence and machine-learning approaches have rapidly advanced in medical imaging and clinical decision support, including applications in neuro-oncology [20]. Recent studies have increasingly focused on deep learning frameworks and data augmentation strategies. For example, generative adversarial networks (GANs), such as Pix2Pix-based models, have been applied to generate synthetic MRI data to address data scarcity and improve classification performance [21]. Similarly, conditional deep convolutional neural networks combined with GAN-generated datasets have demonstrated promising results in brain tumor image classification [22]. However, successful translation into neurosurgical practice requires not only predictive performance but also interpretability and biological plausibility.
To address these gaps, we developed an interpretable multimodal machine-learning model for preoperative glioma grading. The framework integrates three categories of routinely accessible data: clinical symptoms, structural imaging indicators (midline shift and enhanced tumor volume), and MRS-derived metabolic ratios. Through rigorous feature selection and systematic algorithm comparison, we aimed to construct a high-performance yet transparent model for exploratory evaluation in a single-center cohort. This study aims to assess the feasibility and potential value of integrating multimodal data and interpretable machine-learning methods for preoperative glioma grading.

2. Materials and Methods

2.1. Study Population

This study retrospectively collected cases of glioma that were pathologically confirmed after surgery at the hospital between January 2019 and December 2024.
The inclusion criteria were: (1) age ≥ 18 years; (2) availability of a definitive postoperative histopathological diagnosis; (3) availability of complete preoperative imaging datasets, including cranial CT, conventional MRI sequences (non-contrast and contrast-enhanced T1-weighted sequences), and MRS data. Exclusion criteria included: (1) absence of key clinical information; (2) incomplete imaging data or the presence of severe artifacts affecting analysis; (3) concomitant neurological disorders or other primary intracranial malignant tumors; (4) a history of prior surgical intervention for glioma before admission. All included patients had complete data for the 24 prespecified candidate predictors. Therefore, no imputation was required for model development and no missing values were present in the final analytic dataset (the default imputation component in the machine-learning pipeline was retained for workflow consistency but did not modify any values). All included patients were evaluated and managed by a neurosurgical team in accordance with relevant clinical guidelines. The study protocol was approved by the Ethics Committee, with informed consent waived.

2.2. Data Collection

Data were obtained through a systematic review of electronic medical records and the imaging archiving system. Collected variables included:
1. Demographics and baseline status: age, sex, history of smoking and alcohol consumption, and comorbidities (hypertension, diabetes).
2. Clinical presentation: Karnofsky Performance Status (KPS) score at admission, duration of symptoms, and the presence of specific neurological symptoms, categorized as follows:
a. Seizure: any type of preoperative seizure event;
b. Increased intracranial pressure (IIP): at least one of headache, nausea, vomiting, or visual blurring associated with papilledema;
c. Focal neurological deficits (FND): objective signs including motor impairment (monoparesis or hemiparesis with strength ≤ grade 4), sensory disturbances, language impairment (aphasia or dysarthria), or visual field defects;
d. Cognitive impairment (CI): for example, memory decline or slowed responsiveness;
e. Cerebellar symptoms (CS): including ataxia or gait instability.
3. Imaging features: Midline shift (MS) was assessed on CT at the axial slice demonstrating the maximal tumor diameter, with displacement ≥ 2 mm considered positive. Tumor location was recorded as laterality (left hemisphere, right hemisphere, or crossing the midline) and involved lobe(s) (multilobar, frontal, temporal, parietal, occipital lobe, deep structure). Enhancing tumor volume (ETV) was estimated on contrast-enhanced T1-weighted MRI using the ellipsoid approximation: V = (a × b × c)/2, where a, b, and c represent the three maximal perpendicular diameters. This method was chosen because it reflects routine clinical practice and does not require dedicated segmentation software; however, it is an approximation and may introduce measurement error compared with manual or automated volumetry.
For MRS, the voxel/region of interest was placed within the solid enhancing portion of the tumor while avoiding necrosis, hemorrhage, cystic areas, and surrounding normal tissue as much as possible; the choline/N-acetylaspartate (Cho/NAA), choline/creatine (Cho/Cr), and NAA/Cr ratios were obtained. MRI and MRS examinations were performed using a 1.5-T scanner (Siemens Aera, Siemens Healthineers, Erlangen, Germany) equipped with a 24-channel head and neck coil. Conventional MRI acquisition included axial T1-weighted imaging (T1WI), fat-suppressed T2-weighted imaging (T2WI), and contrast-enhanced T1-weighted imaging, with representative parameters as follows: repetition time (TR) 383–2700 ms, echo time (TE) 11–78 ms, slice thickness 5 mm, interslice gap 1 mm, field of view 180–240 mm, and matrix size 256 × 180 to 384 × 308, depending on the sequence protocol. MRS was acquired using a single-voxel technique with a point-resolved spectroscopy (PRESS) sequence. The principal acquisition parameters were: TR 1500–2000 ms, TE 135 ms, voxel size approximately 10 × 10 × 10 mm³ (1.0 cm³), and number of excitations 96–128. All therapeutic strategies were determined by neurosurgeons based on individual patient characteristics and in accordance with current clinical guidelines for glioma management. All imaging measurements (including MS and ETV) were performed by a radiologist with 10 years of neuroimaging experience who was blinded to the pathological grade.
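As a worked illustration of the ellipsoid approximation used for ETV, the following minimal Python sketch computes V = (a × b × c)/2. The function name and the example diameters are hypothetical and are not taken from the study data.

```python
def ellipsoid_volume(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Approximate volume (cm^3) from three maximal perpendicular diameters (cm).

    Implements V = (a * b * c) / 2, a simplification of the exact
    ellipsoid volume (pi / 6) * a * b * c, since pi / 6 is roughly 0.52.
    """
    if min(a_cm, b_cm, c_cm) < 0:
        raise ValueError("diameters must be non-negative")
    return (a_cm * b_cm * c_cm) / 2.0


# Hypothetical diameters, for illustration only.
example_volume = ellipsoid_volume(4.0, 3.0, 2.0)
print(f"Estimated ETV: {example_volume:.1f} cm^3")
```

Because the three diameters are multiplied, a small error in any one of them scales the volume estimate proportionally, which is one source of the measurement error noted above.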

2.3. Definition of HGG and LGG

The final diagnosis for all patients was established according to the integrated diagnostic framework of the 2021 WHO Classification of Tumors of the Central Nervous System (Fifth Edition). This diagnosis incorporates histological morphology with key biological indicators, including IDH mutation status, TP53 status, and the Ki-67 proliferation index. Tumor grading in this study was based on this integrated diagnosis. Accordingly, WHO grades I and II were classified as low-grade gliomas (LGG), while grades III and IV were classified as high-grade gliomas (HGG).

2.4. Sample Size and Statistical Power

A total of 400 patients were enrolled and divided into a training set (n = 280) and an internal validation set (n = 120) in a 7:3 ratio using stratified sampling to preserve the HGG/LGG proportion. For reproducibility of resampling, model training, and hyperparameter optimization, a fixed random seed (random_state = 123) was applied throughout the machine-learning pipeline (SMOTETomek, Random Forest training, and OptunaSearchCV). The training set size exceeded the minimum required sample size (n = 240), calculated based on an events-per-variable (EPV) criterion of ≥10 for 24 candidate predictors. The validation cohort (n = 120) accounted for 30% of the total sample, consistent with commonly recommended proportions (20–30%) for internal validation [23,24], supporting stable and reliable performance estimation.
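The split and the sample-size check described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline: the helper `stratified_split` is hypothetical, the class counts (300 HGG, 100 LGG) are taken from the Results, and the minimum sample size follows the stated rule of 10 × 24 candidate predictors.

```python
import random


def stratified_split(labels, n_validation, seed=123):
    """Return (train_idx, val_idx) with class proportions preserved.

    Hypothetical helper: each class contributes a share of the validation
    set proportional to its frequency. Per-class rounding means the total
    may differ slightly from n_validation for less convenient proportions.
    """
    rng = random.Random(seed)
    n_total = len(labels)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    val_idx = []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_val_class = round(n_validation * len(idx) / n_total)
        val_idx.extend(idx[:n_val_class])
    val_set = set(val_idx)
    train_idx = [i for i in range(n_total) if i not in val_set]
    return train_idx, sorted(val_idx)


# Class counts as reported in the Results (75% HGG, 25% LGG).
labels = ["HGG"] * 300 + ["LGG"] * 100
train_idx, val_idx = stratified_split(labels, n_validation=120)

# Sample-size rule as stated in the text: 10 per candidate predictor.
min_training_size = 10 * 24
print(len(train_idx), len(val_idx), len(train_idx) >= min_training_size)
```

In a scikit-learn workflow the same effect is usually obtained with a stratified train/test split and a fixed random seed; the version here avoids external dependencies so that the logic is visible.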

2.5. Statistical Analysis

Categorical variables were presented as frequencies and percentages, and comparisons between groups were performed using the χ² test or Fisher’s exact test. The normality and homogeneity of variance of continuous variables were assessed using the Shapiro–Wilk test and Levene’s test, respectively. Variables conforming to a normal distribution were expressed as mean ± standard deviation and compared using the independent-samples t test, whereas non-normally distributed variables were expressed as median (interquartile range) and compared using the Mann–Whitney U test [25]. All statistical analyses and machine-learning procedures were performed using Python (version 3.10.14) with the following packages: scikit-learn (version 1.3.2), SHAP (version 0.42.1), and Optuna (version 3.4.0). Feature selection was conducted within the training set only. The Boruta algorithm and univariate/multivariate logistic regression were applied independently, and only predictors identified by both approaches (i.e., the intersection set) were retained. Subsequently, zero-variance features were removed and highly correlated predictors (|r| > 0.8) were excluded. Seventeen machine-learning algorithms were evaluated to cover a broad range of common model families, including linear models, Bayesian classifiers, instance-based methods, kernel-based methods, tree-based ensembles, and boosting algorithms. This design aimed to reduce algorithmic selection bias and identify the most robust approach for multimodal glioma grading. Model training and benchmarking were performed in the training set using 5-fold stratified cross-validation. For the selected model, hyperparameter optimization was further performed within the training set using OptunaSearchCV with stratified 10-fold cross-validation, with the area under the receiver operating characteristic curve (AUC) used as the optimization objective.
The hyperparameter search space included key parameters such as the number of trees, maximum depth, and minimum samples split, which were optimized within predefined ranges. Evaluation metrics included Accuracy, AUC, Recall, Precision, F1-score, Cohen’s Kappa, Matthews correlation coefficient (MCC), Log Loss, Brier score, and training time. The final model was then validated in the internal validation set using receiver operating characteristic (ROC) curves, calibration curves, precision–recall (PR) curves, and decision-curve analysis (DCA) to assess its discriminative ability, calibration, precision–recall performance, and clinical utility. For classification-based metrics (accuracy, sensitivity, specificity, and confusion matrix), the optimal probability threshold for predicting HGG was determined based on the maximum F1-score during model evaluation, yielding a cutoff of P(HGG) = 0.38. To enhance interpretability, SHAP was used to visualize feature contributions via summary and dependence plots. For the final Random Forest model, SHAP values were computed using the tree-based explainer (shap.TreeExplainer; SHAP package version 0.42.1), which implements TreeSHAP for tree ensembles. For categorical variables, SHAP values reflect the contribution of specific one-hot encoded categories and were interpreted accordingly. The individual prediction example was selected from the validation set.
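The F1-based choice of probability threshold can be illustrated with a short sketch. This is a minimal pure-Python example under stated assumptions: the helper names and the toy labels and probabilities are hypothetical, and the authors' exact search grid is not described in the text, so the candidate thresholds here are simply the distinct predicted probabilities.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score when P(HGG) >= threshold is classified as HGG (label 1)."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob):
    """Return the candidate threshold with the highest F1-score.

    Ties are resolved in favor of the lowest threshold, an arbitrary
    choice made for this sketch.
    """
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data, for illustration only (1 = HGG, 0 = LGG).
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.35, 0.2, 0.3, 0.1]

threshold = best_f1_threshold(y_true, y_prob)
print(threshold, round(f1_at_threshold(y_true, y_prob, threshold), 3))
```

Because F1 ignores true negatives, a threshold chosen this way tends to favor sensitivity when the positive class is the majority, which is consistent with a cutoff below 0.5 in a cohort with 75% HGG.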

3. Results

3.1. Baseline Characteristics

A total of 461 patients with glioma were initially identified. After rigorous medical and imaging quality review, 61 cases were excluded due to incomplete clinical data, suboptimal imaging quality, or prior interventions, leaving 400 patients in the final cohort (Figure 1). The cohort was randomly divided into a training set (n = 280) for model development and an internal validation set (n = 120). Overall, 75% (n = 300) were HGG and 25% (n = 100) were LGG. Significant differences between the HGG and LGG groups were observed in multiple demographic, clinical, and imaging variables (Table 1). Specifically, HGG patients were older, had a lower incidence of seizures, presented with more frequent FND, and had a significantly higher proportion of MS. In terms of quantitative imaging characteristics, HGG patients exhibited significantly higher ETV, Cho/NAA, and Cho/Cr values compared to LGG patients.

3.2. Prediction Factor Selection

Prediction factor selection was performed exclusively within the training cohort (n = 280) after the initial data split to prevent potential information leakage. The independent validation set was not involved in any feature selection or model development procedures. Univariate analysis identified several candidate variables significantly associated with glioma grading (p < 0.05), including age, seizures, FND, tumor laterality, lobar location, KPS ≥ 70, MS, ETV, and MRS indices (Cho/NAA, Cho/Cr, NAA/Cr). No significant differences were observed between the two groups in terms of basic characteristics such as gender, hypertension, diabetes, smoking, and alcohol history (Table 2). Variables with a p < 0.05 in univariate analysis were included in a multivariate logistic regression model (Table 3). Independent predictors for high-grade glioma were identified as: older age (OR = 1.08, 95% CI: 1.06–1.11), presence of FND (OR = 4.94, 95% CI: 2.25–10.86), MS (OR = 5.44, 95% CI: 2.88–10.27), larger ETV (OR = 1.03, 95% CI: 1.02–1.04), higher Cho/NAA (OR = 1.04, 95% CI: 1.01–1.07), higher Cho/Cr (OR = 1.20, 95% CI: 1.09–1.33). Conversely, the presence of seizures (OR = 0.26, 95% CI: 0.14–0.50), high KPS score (OR = 0.45, 95% CI: 0.23–0.87), and certain lobar locations (OR = 0.81, 95% CI: 0.67–0.91) were negatively associated with HGG. In parallel, the Boruta algorithm was applied within the training cohort to assess feature importance based on a random forest classifier, identifying 9 important features (Figure 2A,B). By intersecting the independent predictors from multivariate regression with the features selected by Boruta, we finalized a set of eight core preoperative variables for model construction (Figure 2C): age, FND, tumor laterality, tumor lobar location, MS, ETV, Cho/NAA ratio, and Cho/Cr ratio. 
Further Spearman correlation analysis revealed only weak-to-moderate correlations between the variables (all |r| < 0.55), suggesting the absence of severe multicollinearity and supporting inclusion of these variables in subsequent model construction (Figure 2D).
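The correlation check can be sketched in a few lines. The example below is a minimal pure-Python illustration of Spearman-based filtering with the |r| > 0.8 exclusion rule stated in the Methods; the helper names, the toy columns, and the rule of keeping the first feature of a correlated pair are assumptions, since the text does not specify which member of a pair would be dropped.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(columns, threshold=0.8):
    """Keep a feature only if |r| <= threshold against every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(spearman(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns, for illustration only.
toy = {
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [2, 4, 6, 8, 11],   # same rank order as feature_a
    "feature_c": [3, 1, 4, 2, 5],
}
print(drop_correlated(toy))
```

Zero-variance features must be removed beforehand, as in the described pipeline, because the correlation is undefined for a constant column.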

3.3. Machine Learning Model Training and Selection

Our model was based on the 8 features identified in the previous section. In the training set, 17 machine learning algorithms were benchmarked using 5-fold cross-validation. Model performance was evaluated using accuracy, AUC, recall, precision, F1-score, Cohen’s Kappa, Matthews correlation coefficient (MCC), log loss, Brier score, and training time in seconds. Comparative analysis showed that the Random Forest algorithm demonstrated the highest overall performance among the evaluated models, achieving the best AUC and consistently strong performance across multiple evaluation metrics (Supplementary Table S1) (Figure 3). Therefore, we selected the Random Forest model as the prediction model for preoperative glioma grading in this study.

3.4. Model Performance Validation

The final Random Forest model was evaluated on the internal validation set (n = 120) to assess its discriminative ability, calibration, and clinical utility. The model performed stably and exhibited strong discriminative power on unseen data.
In terms of discriminative ability, the model achieved an area under the receiver operating characteristic curve (AUC) of 0.946 (95% CI: 0.902–0.989), indicating excellent discriminative ability in distinguishing HGG from LGG (Figure 4A). The precision-recall (PR) curve reached an average precision of 0.98, further illustrating the model’s high recognition accuracy despite the imbalanced class distribution in the dataset (Figure 4B). The calibration curve demonstrated reasonable agreement between predicted and observed probabilities (calibration slope = 1.29, intercept = 0.38), although slight overestimation of risk was observed (Figure 4C). Using the predefined optimal probability threshold (P(HGG) = 0.38), the confusion matrix showed an overall accuracy of 88.3%, sensitivity of 92.2%, and specificity of 76.7%. The balanced accuracy was 84.5%, indicating stable classification performance across both classes under imbalanced conditions (Figure 4D). Decision curve analysis demonstrated that across threshold probabilities of approximately 0.05–0.75, the Random Forest model yielded a higher net benefit than both the treat-all and treat-none strategies, indicating potential clinical utility within this range (Figure 4E).
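The reported classification metrics can be checked with simple arithmetic. The sketch below is an illustration only: the confusion-matrix counts are not given in the text and are back-calculated here under the assumption that the stratified validation set contains 90 HGG and 30 LGG cases (75% and 25% of 120).

```python
# Assumed counts, back-calculated from the reported percentages;
# they are not stated in the article.
tp, fn = 83, 7   # HGG classified as HGG / as LGG
tn, fp = 23, 7   # LGG classified as LGG / as HGG

n = tp + fn + tn + fp
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n
balanced_accuracy = (sensitivity + specificity) / 2

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("accuracy", accuracy),
    ("balanced accuracy", balanced_accuracy),
]:
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts, sensitivity, specificity, and accuracy reproduce the reported 92.2%, 76.7%, and 88.3%. The unrounded balanced accuracy is about 84.4%, whereas averaging the rounded sensitivity and specificity gives 84.45%, which is consistent with the reported 84.5%.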
Overall, the Random Forest model demonstrated strong performance in the internal validation set, supporting its potential utility for non-invasive preoperative glioma grading.
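For readers less familiar with decision curve analysis, the net-benefit calculation at a single threshold can be sketched as follows. This is a minimal illustration of the standard net-benefit formula, not a reproduction of Figure 4E: the true-positive and false-positive counts are assumed values back-calculated from the reported sensitivity and specificity at P(HGG) = 0.38 for a validation set of 90 HGG and 30 LGG cases, and they apply to that threshold only.

```python
def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    """Net benefit of a model at threshold probability pt.

    NB = TP/n - (FP/n) * pt / (1 - pt)
    """
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_treat_all(prevalence: float, pt: float) -> float:
    """Net benefit of classifying every patient as positive."""
    return prevalence - (1 - prevalence) * (pt / (1 - pt))


# Assumed counts at the 0.38 cutoff (not stated in the article).
tp, fp, n = 83, 7, 120
pt = 0.38

nb_model = net_benefit(tp, fp, n, pt)
nb_all = net_benefit_treat_all(prevalence=90 / n, pt=pt)
nb_none = 0.0  # treat-none never yields benefit or harm

print(f"model: {nb_model:.3f}, treat-all: {nb_all:.3f}, treat-none: {nb_none:.3f}")
```

A full decision curve repeats this calculation across a range of thresholds, reclassifying patients at each one, so the counts change from point to point along the curve.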

3.5. Model Interpretability Analysis Based on SHAP

To elucidate the decision-making process of the Random Forest model, Shapley Additive exPlanations (SHAP) was used for post hoc interpretability analysis. The SHAP summary plot (Figure 5A) illustrates the relative importance and directional contribution of each feature.
Overall, the contributions of different features to the model output varied in magnitude, and the direction of the major contributing features aligned closely with existing clinical knowledge. The Cho/NAA ratio was identified as the most influential feature, with higher values associated with positive SHAP values (red points), indicating increased probability of HGG. Similarly, elevated Cho/Cr ratio, increased patient age, larger ETV, and the presence of both MS and FND were associated with positive SHAP values. These findings were consistent with known clinical associations of high-grade glioma. Other features, including tumor laterality and lobar location, also contributed to the final model output to varying degrees.
To further illustrate the decision-making process at the individual level, we present a SHAP waterfall plot for a randomly selected patient (Figure 5B). The baseline prediction (the overall mean predicted probability) was 0.556. After incrementally adding the positive and negative contributions of each feature, the model adjusted the patient’s predicted probability of HGG to 0.89. In this case, advanced age (66 years) and a high Cho/Cr ratio (15.32) were the major positive contributors increasing the predicted risk, while the relatively low Cho/NAA ratio (5.29) acted as a negative contributor that partially offset this increase. This visualization provides an interpretable decomposition of the individual prediction into feature-level contributions.
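The additive structure behind a waterfall plot can be shown with a short sketch. Only the baseline (0.556), the final probability (0.89), and the direction of the three named contributions come from the text; the individual SHAP magnitudes below are hypothetical placeholders chosen to sum to the required total and are not the values shown in Figure 5B.

```python
baseline = 0.556  # mean predicted probability, as reported

# Hypothetical SHAP values for illustration; only the signs of the three
# named features follow the description in the text.
contributions = {
    "Age (66 years)": 0.140,
    "Cho/Cr (15.32)": 0.170,
    "Cho/NAA (5.29)": -0.050,
    "Remaining five features (combined)": 0.074,
}

# SHAP additivity: prediction = baseline + sum of feature contributions.
prediction = baseline + sum(contributions.values())

running = baseline
for feature, value in contributions.items():
    running += value
    print(f"{feature:<36} {value:+.3f} -> {running:.3f}")
print(f"Predicted P(HGG): {prediction:.2f}")
```

The point of the decomposition is that the contributions account for the entire gap between the baseline and the individual prediction, here 0.89 − 0.556 = 0.334.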

4. Discussion

Glioma represents the most prevalent primary malignant tumor of the central nervous system [1]. Despite advances in multimodal diagnosis and treatment, prognosis remains unsatisfactory due to aggressive biological behavior [2]. Accurate preoperative grading is therefore critical for surgical planning and adjuvant treatment selection under the Stupp protocol [26,27]. However, conventional grading relies on postoperative histopathology, which is invasive and may not adequately support preoperative decision-making [28]. In the present study, we developed and internally validated an interpretable machine-learning model integrating clinical variables, structural imaging features, and MRS-derived metabolic biomarkers for preoperative glioma grading. The results indicate that the model demonstrated strong discriminatory performance in the internal validation cohort. In addition, routinely available multimodal variables contributed meaningfully to prediction, and SHAP-based interpretation improved transparency of the decision process at both global and individual levels. These findings suggest that integrating accessible multimodal data with interpretable machine-learning may represent a feasible approach for preoperative glioma risk stratification in a single-center setting.
With an AUC of 0.946, our model achieved performance comparable to advanced radiomics-based approaches [29,30] and to studies reporting higher AUC values using complex image-processing pipelines [31]. This comparison is noteworthy because many previously published models depend on high-dimensional radiomic extraction, labor-intensive image preprocessing, or dedicated post-processing workflows, which may limit reproducibility and routine clinical implementation [30,31]. In addition, some prior investigations rely predominantly on single-modality features or focus on isolated clinical associations without integrated validation [16,32], highlighting the limited integration of multimodal information in existing approaches. In contrast, our model integrates clinically interpretable variables and routine imaging parameters within a unified framework, thereby offering a more pragmatic modeling strategy for exploratory preoperative assessment. The favorable performance of the Random Forest model may be attributed to its ability to capture complex nonlinear relationships and interactions among multimodal predictors. Given the heterogeneity of clinical symptoms, structural imaging features, and metabolic biomarkers, ensemble tree-based methods may offer advantages over simpler linear approaches in modeling such heterogeneous clinical-imaging data.
Among the selected predictors, MRS-derived metabolic ratios played a prominent role. Elevated Cho/NAA and Cho/Cr ratios were positively associated with high-grade classification, consistent with the biological interpretation that choline reflects membrane turnover, NAA represents neuronal integrity, and creatine serves as a relatively stable metabolic reference [33,34,35]. These metabolic alterations suggest increased tumor proliferation and neuronal disruption, aligning with prior findings [35]. However, variability in MRS acquisition parameters, voxel placement, and scanner differences across institutions may affect reproducibility. Broader validation under heterogeneous acquisition conditions is therefore warranted. Structural imaging features, including midline shift (MS) and enhancing tumor volume (ETV), also contributed meaningfully. HGG typically exhibits rapid growth, substantial mass effect, and blood–brain barrier disruption, leading to increased MS and ETV. Tumor volume in this study was estimated using the ellipsoid approximation formula, which reflects routine clinical practice but may introduce measurement bias compared with full volumetric segmentation. Future incorporation of precise volumetric techniques may further refine predictive performance. Clinical variables such as age and focal neurological deficits (FND) additionally strengthened model prediction. Older age is well recognized as being associated with higher-grade glioma and poorer prognosis [36], while FND likely reflects aggressive infiltrative growth and vasogenic edema [37,38,39]. Tumor laterality and lobar location also demonstrated spatial associations with grading risk [40,41,42], although the underlying biological mechanisms warrant further exploration.
Importantly, the model outputs an individualized probability of HGG (P(HGG)), allowing clinicians to interpret risk on a continuous scale rather than relying solely on binary classification. For the purpose of calculating classification-based performance metrics, a predefined cutoff (P(HGG) = 0.38) was applied, yielding a sensitivity of 92.2% and specificity of 76.7% in the validation cohort. From a clinical perspective, this probabilistic output may be more informative than a simple dichotomous label, as it can assist clinicians in integrating model estimates with anatomical location, symptom burden, and other perioperative considerations. This may provide additional support for preoperative risk assessment and clinical judgment. Nevertheless, the current model should be regarded as a decision-support tool for exploratory evaluation rather than a standalone basis for treatment selection. Decision curve analysis further demonstrated net benefit across threshold probabilities of approximately 0.05–0.75, supporting potential clinical utility within this range. Although discrimination was strong, calibration analysis indicated mild overestimation of predicted probabilities (slope = 1.29, intercept = 0.38), possibly related to the relatively high HGG prevalence in the cohort. External validation and recalibration may further improve probability accuracy.
Several aspects of this study may strengthen its practical relevance. First, the model integrates multimodal yet routinely obtainable preoperative data without requiring complex radiomics extraction or dedicated segmentation software. Second, the dual feature-selection strategy enhances robustness by retaining consistently supported predictors. Third, incorporation of SHAP-based interpretability allows transparent visualization of feature contributions, helping bridge the gap between algorithmic prediction and clinical reasoning. These features may improve methodological transparency and facilitate future external testing.
This study has several limitations. Its single-center retrospective design may introduce selection bias, and although the overall sample size met methodological requirements, the relatively smaller LGG subgroup may influence performance stability. The model has not yet undergone external validation, and generalizability across diverse populations, imaging platforms, and acquisition protocols remains to be established. Moreover, potential dataset shift arising from variations in imaging protocols, scanner types, and patient populations across institutions may affect model performance. Furthermore, the classification threshold was determined based on the F1-score during model evaluation rather than within a cross-validation or external validation framework, which may introduce a degree of optimistic bias. Additionally, reliance on MRS parameters and approximate volumetric estimation may limit reproducibility in settings where standardized protocols are unavailable. Prospective multicenter validation, recalibration, and evaluation of real-world clinical impact and cost-effectiveness are warranted to further define the model’s practical applicability.
This study has several limitations. Its single-center retrospective design may introduce selection bias, and although the overall sample size met methodological requirements, the relatively smaller LGG subgroup may influence performance stability. The model has not yet undergone external validation, and its generalizability across different populations, imaging platforms, and acquisition protocols remains to be established. Potential dataset shift related to variations in imaging protocols, scanner types, and patient characteristics across institutions may also affect model performance. The classification threshold was determined based on the F1-score during model evaluation rather than within a cross-validation or external validation framework, which may introduce a degree of optimistic bias. In addition, reliance on MRS parameters and approximate volumetric estimation may limit reproducibility in settings without standardized imaging protocols. Prospective multicenter validation, model recalibration, and evaluation of real-world clinical impact and cost-effectiveness are warranted to further define the model’s practical applicability.
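The concern about threshold selection can be made concrete with a small sketch in which the F1-optimal cutoff is chosen on one subset and then scored on data that played no part in the choice. This is not the authors' procedure; the helper names, the candidate grid, and all labels and probabilities below are hypothetical values used only for illustration.

```python
from typing import Sequence


def f1_at(y_true: Sequence[int], probs: Sequence[float], cutoff: float) -> float:
    """F1-score when patients with probability >= cutoff are labeled positive."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= cutoff and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= cutoff and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < cutoff and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_cutoff(y_true: Sequence[int], probs: Sequence[float],
                   grid: Sequence[float]) -> float:
    """Return the grid cutoff with the highest F1 (first one wins on ties)."""
    return max(grid, key=lambda c: f1_at(y_true, probs, c))


# Hypothetical tuning subset: used only to choose the cutoff.
y_tune = [0, 0, 0, 1, 1, 1, 1, 1]
p_tune = [0.10, 0.30, 0.45, 0.35, 0.50, 0.70, 0.80, 0.90]

# Hypothetical held-out subset: never seen while choosing the cutoff.
y_test = [0, 0, 1, 1, 1]
p_test = [0.20, 0.55, 0.45, 0.60, 0.95]

grid = [0.2, 0.3, 0.4, 0.5, 0.6]
cutoff = best_f1_cutoff(y_tune, p_tune, grid)

f1_tune = f1_at(y_tune, p_tune, cutoff)  # score on the data that picked the cutoff
f1_test = f1_at(y_test, p_test, cutoff)  # score on independent data

print(f"cutoff={cutoff} F1(tuning)={f1_tune:.3f} F1(held-out)={f1_test:.3f}")
```

In this toy example the F1-score on the tuning subset is higher than on the held-out subset, which is the kind of gap that selecting the cutoff inside cross-validation or on an external cohort is meant to expose.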

5. Conclusions

This study developed and internally validated a multimodal machine-learning model for non-invasive preoperative glioma grading by integrating clinical features, structural imaging, and MRS-derived metabolic biomarkers. The model showed good discrimination and acceptable calibration in the internal validation cohort and provided individualized probability estimates for glioma grading, supporting non-invasive preoperative assessment and risk stratification. However, given the single-center retrospective design and the lack of external or prospective validation, clinical application remains premature, and further validation in independent cohorts is required.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers18081204/s1, Table S1: Performance metrics for the evaluated models, including Accuracy, AUC, Recall, Precision, F1-score, Cohen’s Kappa, Matthews Correlation Coefficient (MCC), Log Loss, Brier Score, and Training Time (seconds).

Author Contributions

X.R. (Xianfeng Rao) and M.Y. (Min Yang) conceived and designed the study. N.Z. made substantial contributions to the design and implementation of the study’s methodological framework. H.C., G.L., L.W., L.G., M.Y. (Mingchun Yang), H.W., Y.D. and G.C. were responsible for data collection and interpretation. X.R. (Xianjun Rao), X.W. and L.T. performed data analysis and visualization. M.Y. (Min Yang) wrote the original manuscript. X.W. and L.T. critically revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by grants from the National Natural Science Foundation of China [Grant No. 82573744], the National Natural Science Foundation of China [Grant No. 82473135], the Heilongjiang Provincial Charity Federation [Grant No. 2021HX041], and the Natural Science Foundation of Heilongjiang Province [Grant No. PL2025H091].

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki. The research protocol was approved by the Ethics Committee of the First Affiliated Hospital of Harbin Medical University (YS069), and the requirement for informed consent was waived.

Informed Consent Statement

Patient consent was waived due to the retrospective nature of the study, which involved analysis of anonymized data. The requirement for informed consent was waived by the Ethics Committee.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

We would like to thank all of the patients who participated in the study and the staff of the participating hospital for their support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC: Area Under the Curve
Brier: Brier Score
CI: Cognitive and psychiatric symptoms
CS: Cerebellar symptoms
DCA: Decision-curve analysis
ETV: Enhanced tumor volume
FND: Focal neurological deficits
F1: F1-Score
HGG: High-grade glioma
IIP: Intracranial increased pressure
Kappa: Cohen’s Kappa
LGG: Low-grade glioma
MS: Midline shift
MCC: Matthews Correlation Coefficient
Prec.: Precision
ROC: Receiver operating characteristic
TT(Sec): Training Time (Seconds)

References

  1. Siegel, R.L.; Kratzer, T.B.; Giaquinto, A.N.; Sung, H.; Jemal, A. Cancer statistics, 2025. CA Cancer J. Clin. 2025, 75, 10–45.
  2. Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2024, 74, 229–263.
  3. Weller, M.; Wen, P.Y.; Chang, S.M.; Dirven, L.; Lim, M.; Monje, M.; Reifenberger, G. Glioma. Nat. Rev. Dis. Primers 2024, 10, 33.
  4. Lapointe, S.; Perry, A.; Butowski, N.A. Primary brain tumours in adults. Lancet 2018, 392, 432–446.
  5. Gritsch, S.; Batchelor, T.T.; Castro, L.N.G. Diagnostic, therapeutic, and prognostic implications of the 2021 World Health Organization classification of tumors of the central nervous system. Cancer 2022, 128, 47–58.
  6. Weller, M.; Wick, W.; Aldape, K.; Brada, M.; Berger, M.; Pfister, S.M.; Nishikawa, R.; Rosenthal, M.; Wen, P.Y.; Stupp, R.; et al. Glioma. Nat. Rev. Dis. Primers 2015, 1, 15017.
  7. Gusyatiner, O.; Hegi, M.E. Glioma epigenetics: From subclassification to novel treatment options. Semin. Cancer Biol. 2018, 51, 50–58.
  8. Bale, T.A.; Rosenblum, M.K. The 2021 WHO Classification of Tumors of the Central Nervous System: An update on pediatric low-grade gliomas and glioneuronal tumors. Brain Pathol. 2022, 32, e13060.
  9. van den Bent, M.J.; French, P.J.; Brat, D.; Tonn, J.C.; Touat, M.; Ellingson, B.M.; Young, R.J.; Pallud, J.; von Deimling, A.; Sahm, F.; et al. The biological significance of tumor grade, age, enhancement, and extent of resection in IDH-mutant gliomas: How should they inform treatment decisions in the era of IDH inhibitors? Neuro-Oncology 2024, 26, 1805–1822.
  10. Bhatia, A.; Moreno, R.; Reiner, A.S.; Nandakumar, S.; Walch, H.S.; Thomas, T.M.; Nicklin, P.J.; Choi, Y.; Skakodub, A.; Malani, R.; et al. Tumor Volume Growth Rates and Doubling Times during Active Surveillance of IDH-mutant Low-Grade Glioma. Clin. Cancer Res. 2024, 30, 106–115.
  11. Cordone, I.; Masi, S.; Carosi, M.; Vidiri, A.; Marchesi, F.; Marino, M.; Telera, S.; Pasquale, A.; Mengarelli, A.; Conti, L.; et al. Brain stereotactic biopsy flow cytometry for central nervous system lymphoma characterization: Advantages and pitfalls. J. Exp. Clin. Cancer Res. 2016, 35, 128.
  12. Azizova, A.; Prysiazhniuk, Y.; Wamelink, I.J.H.G.; Cakmak, M.; Kaya, E.; Wesseling, P.; de Witt Hamer, P.C.; Verburg, N.; Petr, J.; Barkhof, F.; et al. Preoperative prediction of diffuse glioma type and grade in adults: A gadolinium-free MRI-based decision tree. Eur. Radiol. 2025, 35, 1242–1254.
  13. Wang, Q.; Li, Q.; Mi, R.; Ye, H.; Zhang, H.; Chen, B.; Li, Y.; Huang, G.; Xia, J. Radiomics Nomogram Building from Multiparametric MRI to Predict Grade in Patients with Glioma: A Cohort Study. J. Magn. Reson. Imaging 2019, 49, 825–833.
  14. Kim, Y.; Cho, H.H.; Kim, S.T.; Park, H.; Nam, D.; Kong, D.S. Radiomics features to distinguish glioblastoma from primary central nervous system lymphoma on multi-parametric MRI. Neuroradiology 2018, 60, 1297–1305.
  15. Gao, M.; Cheng, J.; Qiu, A.; Zhao, D.; Wang, J.; Liu, J. Magnetic resonance imaging (MRI)-based intratumoral and peritumoral radiomics for prognosis prediction in glioma patients. Clin. Radiol. 2024, 79, e1383–e1393.
  16. Li, L.; Zhang, C.; Wang, Z.; Wang, Y.; Guo, Y.; Qi, C.; You, G.; Zhang, Z.; Fan, X.; Jiang, T. Development of an integrated predictive model for postoperative glioma-related epilepsy using gene-signature and clinical data. BMC Cancer 2023, 23, 42.
  17. Xu, P.; Liu, D.; Hu, H.; Liu, T.; Liang, C.; Wen, Z.; Yuan, K.; Yin, J.; Orlandini, L.C.; Lang, J. Magnetic resonance spectroscopy guided radiotherapy boost for patients with glioblastoma. Sci. Rep. 2025, 15, 13371.
  18. Louis, D.N.; Perry, A.; Wesseling, P.; Brat, D.J.; Cree, I.A.; Figarella-Branger, D.; Hawkins, C.; Ng, H.K.; Pfister, S.M.; Reifenberger, G.; et al. The 2021 WHO Classification of Tumors of the Central Nervous System: A summary. Neuro-Oncology 2021, 23, 1231–1251.
  19. Wu, M.; Jiang, T.; Guo, M.; Duan, Y.; Zhuo, Z.; Weng, J.; Xie, C.; Sun, J.; Li, J.; Cheng, D.; et al. Amide proton transfer-weighted imaging and derived radiomics in the classification of adult-type diffuse gliomas. Eur. Radiol. 2024, 34, 2986–2996.
  20. Guo, S.B.; Liu, D.Y.; Fang, X.J.; Meng, Y.; Zhou, Z.Z.; Li, J.; Li, M.; Luo, L.L.; Li, H.L.; Cai, X.Y.; et al. Current concerns and future directions of large language model ChatGPT in medicine: A machine-learning-driven global-scale bibliometric analysis. Int. J. Surg. 2026, 2, 2805–2822.
  21. Onakpojeruo, E.P.; Mustapha, M.T.; Ozsahin, D.U.; Ozsahin, I. Enhanced MRI-based brain tumour classification with a novel Pix2pix generative adversarial network augmentation framework. Brain Commun. 2024, 6, fcae372.
  22. Onakpojeruo, E.P.; Mustapha, M.T.; Ozsahin, D.U.; Ozsahin, I. A Comparative Analysis of the Novel Conditional Deep Convolutional Neural Network Model, Using Conditional Deep Convolutional Generative Adversarial Network-Generated Synthetic and Augmented Brain Tumor Datasets for Image Classification. Brain Sci. 2024, 14, 559.
  23. Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ 2015, 350, g7594.
  24. Vos-Kerkhof, E.; Gomez, B.; Milcent, K.; Steyerberg, E.W.; Nijman, R.G.; Smit, F.J.; Mintegi, S.; Moll, H.A.; Gajdos, V.; Oostenbrink, R. Clinical prediction models for young febrile infants at the emergency department: An international validation study. Arch. Dis. Child. 2018, 103, 1033–1041.
  25. Yang, M.; Xie, Y.; Zhuang, Y.; Chen, Y.; Lin, X.; Liu, Z.; Zhang, P.; Xiao, W.; Chen, Y.; Chen, C.; et al. Risk factors and predictive models for early recurrent intussusception in children: A retrospective cohort study. Transl. Pediatr. 2023, 12, 1800–1809.
  26. Stupp, R.; Mason, W.P.; van den Bent, M.J.; Weller, M.; Fisher, B.; Taphoorn, M.J.; Belanger, K.; Brandes, A.A.; Marosi, C.; Bogdahn, U.; et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N. Engl. J. Med. 2005, 352, 987–996.
  27. Obrador, E.; Moreno-Murciano, P.; Oriol-Caballo, M.; López-Blanch, R.; Pineda, B.; Gutiérrez-Arroyo, J.L.; Loras, A.; Gonzalez-Bonet, L.G.; Martinez-Cadenas, C.; Estrela, J.M.; et al. Glioblastoma Therapy: Past, Present and Future. Int. J. Mol. Sci. 2024, 25, 2529.
  28. Fels-Palesandro, H.; Heuer, S.; Boztepe, B.; Streibel, Y.; Ungermann, J.; Pan, C.; Scheck, J.G.; Fischer, M.; Sturm, V.J.; Azorín, D.D.; et al. Assessment of Tumor Cell Invasion and Radiotherapy Response in Experimental Glioma by Magnetic Resonance Elastography. J. Magn. Reson. Imaging 2025, 61, 1203–1218.
  29. Xu, J.; Ren, Y.; Zhao, X.; Wang, X.; Yu, X.; Yao, Z.; Zhou, Y.; Feng, X.; Zhou, X.J.; Wang, H. Incorporating multiple magnetic resonance diffusion models to differentiate low- and high-grade adult gliomas: A machine learning approach. Quant. Imaging Med. Surg. 2022, 12, 5171–5183.
  30. Du, P.; Liu, X.; Wu, X.; Chen, J.; Cao, A.; Geng, D. Predicting Histopathological Grading of Adult Gliomas Based on Preoperative Conventional Multimodal MRI Radiomics: A Machine Learning Model. Brain Sci. 2023, 13, 912.
  31. Lin, K.; Cidan, W.; Qi, Y.; Wang, X. Glioma grading prediction using multiparametric magnetic resonance imaging-based radiomics combined with proton magnetic resonance spectroscopy and diffusion tensor imaging. Med. Phys. 2022, 49, 4419–4429.
  32. Weng, W.; Chen, X.; Gong, S.; Guo, L.; Zhang, X. Preoperative neutrophil-lymphocyte ratio correlated with glioma grading and glioblastoma survival. Neurol. Res. 2018, 40, 917–922.
  33. Han, X.; Xiao, K.; Bai, J.; Li, F.; Cui, B.; Cheng, Y.; Liu, H.; Lu, J. Multimodal MRI and 1H-MRS for Preoperative Stratification of High-Risk Molecular Subtype in Adult-Type Diffuse Gliomas. Diagnostics 2024, 14, 2569.
  34. Muruganandham, M.; Clerkin, P.P.; Smith, B.J.; Anderson, C.M.; Morris, A.; Capizzano, A.A.; Magnotta, V.; McGuire, S.M.; Smith, M.C.; Bayouth, J.E.; et al. 3-Dimensional magnetic resonance spectroscopic imaging at 3 Tesla for early response assessment of glioblastoma patients during external beam radiation therapy. Int. J. Radiat. Oncol. Biol. Phys. 2014, 90, 181–189.
  35. Lu, W.; Feng, J.; Zou, Y.; Liu, Y.; Gao, P.; Zhao, Y.; Wu, X.; Ma, H. 1H-MRS parameters in non-enhancing peritumoral regions can predict the recurrence of glioblastoma. Sci. Rep. 2024, 14, 29258.
  36. Ge, Z.; Jin, G.; Wang, H.; Wang, S.; Fei, X.; Li, S.; Zhang, L.; Jiang, X.; Zhang, Y. Risk factors for postoperative malignant progression of lower-grade gliomas: A systematic review and meta-analysis. Surg. Oncol. 2025, 64, 102327.
  37. Ohmura, K.; Tomita, H.; Hara, A. Peritumoral Edema in Gliomas: A Review of Mechanisms and Management. Biomedicines 2023, 11, 2731.
  38. Lacroix, M.; Abi-Said, D.; Fourney, D.R.; Gokaslan, Z.L.; Shi, W.; DeMonte, F.; Lang, F.F.; McCutcheon, I.E.; Hassenbusch, S.J.; Holland, E.; et al. A multivariate analysis of 416 patients with glioblastoma multiforme: Prognosis, extent of resection, and survival. J. Neurosurg. 2001, 95, 190–198.
  39. Laws, E.R.; Parney, I.F.; Huang, W.; Anderson, F.; Morris, A.M.; Asher, A.; Lillehei, K.O.; Bernstein, M.; Brem, H.; Sloan, A.; et al. Survival following surgery and prognostic factors for recently diagnosed malignant glioma: Data from the Glioma Outcomes Project. J. Neurosurg. 2003, 99, 467–473.
  40. Houskamp, E.J.; Skorich, E.L.; Mackie, M.A.; Tate, M.C. Impact of High-Grade Glioma Lesion Location on Preoperative Neuropsychological Deficits. Cancers 2025, 17, 2775.
  41. Zhao, K.; Gu, J.; Zhao, B. Impact of gliomas in different brain hemispheres on anxiety and the prognosis of neurosurgical patients. Sci. Rep. 2025, 15, 39280.
  42. Zhao, C.; Liang, B.; Li, X.; Ma, P.; Li, Y.; Qian, Z.; Zhang, Z.; Jiang, T.; Zhang, W. Anatomical distribution and prognostic heterogeneity in glioma: Unique clinical features of occipital glioblastoma. J. Neuro-Oncol. 2025, 175, 525–537.
Figure 1. Flowchart of patient inclusion and exclusion. This flowchart illustrates the screening process used to derive the final study cohort from an initial pool of 461 patients with glioma. A total of 61 patients were excluded according to the prespecified criteria, and 400 patients with complete preoperative clinical information, imaging datasets, and postoperative pathological confirmation were included for analysis. This structured selection process provides a transparent overview of cohort construction.
Figure 2. Feature selection process and results based on multimodal preoperative data. (A) Boruta feature importance across classifier runs. Variables consistently ranked above shadow features (green) were confirmed as important, while those below (blue) were rejected, indicating clear separation between relevant and non-informative predictors. Red bars represent shadow features used as a reference for importance comparison. (B) Distribution of Boruta importance scores (Z-scores). Variables such as ETV, CHO/NAA, Age, and CHO/Cr show higher median importance and more stable distributions, suggesting stronger and more consistent contributions to glioma grading. (C) Overlap of selected features from Boruta and regression analyses. The intersection highlights robust predictors (Age, FND, Laterality, Lobe, MS, ETV, CHO/NAA, and CHO/Cr) that were consistently identified across different methods, improving the reliability of feature selection. (D) Correlation heatmap of selected variables. Most feature pairs show weak to moderate correlations, with no evident strong multicollinearity, supporting the inclusion of these variables in the subsequent model.
Figure 3. Performance comparison of multiple machine-learning classifiers in glioma grading prediction. This figure shows the normalized performance of 17 machine-learning models evaluated using multiple metrics, including Accuracy, AUC, Recall, Precision, F1-score, Cohen’s Kappa, MCC, Log Loss, Brier score, and training time. All metrics were normalized to enable cross-model comparison on a unified scale. Models are ordered according to their overall performance across the evaluated metrics. This comparative analysis facilitates the identification of models with balanced and robust performance across different evaluation criteria. The Random Forest classifier showed the highest overall performance among the evaluated models and was therefore selected as the final model for subsequent analysis.
Figure 4. Performance evaluation of the random forest model in the internal validation set. (A) ROC curve showing high discriminative ability (AUC = 0.946, 95% CI: 0.902–0.989), indicating effective differentiation between low-grade and high-grade gliomas. (B) Precision–recall curve demonstrating strong performance in identifying high-grade glioma, with a high average precision (AP = 0.98) across recall levels. (C) Calibration curve: Predicted probabilities are compared with observed event frequencies; the calibration analysis yielded a slope of 1.29 and an intercept of 0.38, indicating a deviation from perfect calibration. (D) Confusion matrix: using the predefined optimal threshold P(HGG) = 0.38, the model achieved an accuracy of 88.3%, sensitivity of 92.2%, and specificity of 76.7%. (E) Decision curve analysis: the model showed a higher net benefit than the treat-all and treat-none strategies across threshold probabilities of approximately 0.05–0.75.
Figure 5. Interpretation of the random forest model using SHAP. (A) SHAP summary plot: features are ranked by mean absolute SHAP values. Each point represents a patient in the validation cohort, with SHAP values indicating the direction and magnitude of each feature’s contribution to the predicted probability of HGG (positive values increase the probability of HGG). (B) SHAP waterfall plot for an illustrative patient from the validation set: the plot decomposes the predicted probability (0.89) into feature-level contributions relative to the baseline model output (E[f(X)]). Key contributing features, including age (66 years) and CHO/Cr ratio (15.32), are shown as major drivers of the prediction in this example.
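The waterfall decomposition in panel (B) of Figure 5 rests on the additivity property of SHAP values, which can be written out as a brief sketch. The symbols M (number of features in the model) and φ_i (contribution of feature i) are notation introduced here for illustration. The baseline value E[f(X)] is not reported in the caption and is therefore left symbolic.

```latex
f(x) = \mathbb{E}[f(X)] + \sum_{i=1}^{M} \phi_i(x),
\qquad
0.89 = \mathbb{E}[f(X)] + \sum_{i=1}^{M} \phi_i(x_{\text{patient}})
```

Under this reading, each bar in the waterfall plot is one term φ_i, and summing the bars onto the baseline recovers the predicted probability of 0.89 for the illustrative patient.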
Table 1. (A) Baseline clinical characteristics of all participants. (B) Baseline imaging structure characteristics of all participants. (C) Baseline MRS parameters of all participants.
| Variable | Total (N = 400) | LGG (N = 100) | HGG (N = 300) | p-Value | Chi-Squared |
| --- | --- | --- | --- | --- | --- |
| (A) Clinical characteristics | | | | | |
| Gender, n (%) | | | | 0.906 | 0.014 |
| Male | 242 (60.5) | 60 (60) | 182 (60.7) | | |
| Female | 158 (39.5) | 40 (40) | 118 (39.3) | | |
| Age, Median (IQR) | 53.0 (43.8, 62.0) | 42.5 (29.8, 51.2) | 55.0 (47.0, 64.0) | <0.001 | 61.759 |
| Hypertension, n (%) | | | | 0.102 | 2.675 |
| No | 326 (81.5) | 87 (87) | 239 (79.7) | | |
| Yes | 74 (18.5) | 13 (13) | 61 (20.3) | | |
| Diabetes, n (%) | | | | 0.063 | 3.46 |
| No | 377 (94.2) | 98 (98) | 279 (93) | | |
| Yes | 23 (5.8) | 2 (2) | 21 (7) | | |
| Smoking, n (%) | | | | 0.371 | 0.801 |
| No | 309 (77.2) | 74 (74) | 235 (78.3) | | |
| Yes | 91 (22.8) | 26 (26) | 65 (21.7) | | |
| Drinking alcohol, n (%) | | | | 0.876 | 0.024 |
| No | 334 (83.5) | 83 (83) | 251 (83.7) | | |
| Yes | 66 (16.5) | 17 (17) | 49 (16.3) | | |
| IIP, n (%) | | | | 0.686 | 0.163 |
| No | 197 (49.2) | 51 (51) | 146 (48.7) | | |
| Yes | 203 (50.7) | 49 (49) | 154 (51.3) | | |
| Seizure, n (%) | | | | <0.001 | 23.171 |
| No | 331 (82.8) | 67 (67) | 264 (88) | | |
| Yes | 69 (17.2) | 33 (33) | 36 (12) | | |
| FND, n (%) | | | | <0.001 | 20.802 |
| No | 270 (67.5) | 86 (86) | 184 (61.3) | | |
| Yes | 130 (32.5) | 14 (14) | 116 (38.7) | | |
| CI, n (%) | | | | 0.67 | 0.181 |
| No | 368 (92.0) | 93 (93) | 275 (91.7) | | |
| Yes | 32 (8.0) | 7 (7) | 25 (8.3) | | |
| CS, n (%) | | | | 0.263 | Fisher |
| No | 382 (95.5) | 98 (98) | 284 (94.7) | | |
| Yes | 18 (4.5) | 2 (2) | 16 (5.3) | | |
| Duration, Median (IQR) | 1.0 (0.3, 2.0) | 1.0 (0.3, 4.0) | 0.7 (0.3, 2.0) | 0.166 | 1.921 |
| BMI, Median (IQR) | 23.7 (21.5, 26.1) | 24.0 (21.3, 26.7) | 23.5 (21.7, 26.0) | 0.840 | 0.041 |
| KPS ≥ 70, n (%) | | | | 0.001 | 10.73 |
| No | 120 (30.0) | 17 (17) | 103 (34.3) | | |
| Yes | 280 (70.0) | 83 (83) | 197 (65.7) | | |
| (B) Imaging characteristics | | | | | |
| Laterality, n (%) | | | | <0.001 | 14.782 |
| Bilateral/Midline | 22 (5.5) | 13 (13) | 9 (3) | | |
| Left | 176 (44.0) | 43 (43) | 133 (44.3) | | |
| Right | 202 (50.5) | 44 (44) | 158 (52.7) | | |
| Lobe, n (%) | | | | 0.002 | Fisher |
| Multilobar | 72 (18.0) | 13 (13) | 59 (19.7) | | |
| Frontal Lobe | 162 (40.5) | 43 (43) | 119 (39.7) | | |
| Temporal Lobe | 84 (21.0) | 15 (15) | 69 (23) | | |
| Parietal Lobe | 40 (10.0) | 8 (8) | 32 (10.7) | | |
| Occipital Lobe | 7 (1.8) | 1 (1) | 6 (2) | | |
| Deep Structures | 30 (7.5) | 17 (17) | 13 (4.3) | | |
| Posterior Fossa | 5 (1.2) | 3 (3) | 2 (0.7) | | |
| MS, n (%) | | | | <0.001 | 55.505 |
| No | 195 (48.8) | 81 (81) | 114 (38) | | |
| Yes | 205 (51.2) | 19 (19) | 186 (62) | | |
| Unifocal, n (%) | | | | 0.092 | 2.831 |
| No | 21 (5.2) | 2 (2) | 19 (6.3) | | |
| Yes | 379 (94.8) | 98 (98) | 281 (93.7) | | |
| ETV, Median (IQR) | 57.8 (20.0, 83.0) | 17.7 (12.7, 27.4) | 73.8 (33.7, 88.3) | <0.001 | 87.521 |
| (C) MRS parameters | | | | | |
| CHO/NAA, Median (IQR) | 12.2 (5.1, 21.5) | 3.7 (2.2, 6.4) | 16.6 (8.1, 24.7) | <0.001 | 99.311 |
| CHO/Cr, Median (IQR) | 4.3 (2.6, 8.6) | 2.4 (1.8, 3.0) | 5.8 (3.4, 8.8) | <0.001 | 95.702 |
| NAA/Cr, Median (IQR) | 0.4 (0.2, 0.5) | 0.5 (0.4, 0.8) | 0.3 (0.2, 0.5) | <0.001 | 24.277 |
Table 2. (A) Univariate comparison of clinical candidate predictors between LGG and HGG in the training cohort. (B) Univariate comparison of structural imaging candidate predictors between LGG and HGG in the training cohort. (C) Univariate comparison of MRS candidate predictors between LGG and HGG in the training cohort.
| Variables | Total (n = 280) | LGG (n = 69) | HGG (n = 211) | p-Value | Chi-Squared |
| --- | --- | --- | --- | --- | --- |
| (A) Clinical characteristics | | | | | |
| Gender, n (%) | | | | 0.416 | 0.662 |
| Male | 171 (61.1) | 45 (65.2) | 126 (59.7) | | |
| Female | 109 (38.9) | 24 (34.8) | 85 (40.3) | | |
| Age, Median (IQR) | 52.0 (43.8, 63.0) | 43.0 (32.0, 51.0) | 56.0 (47.0, 64.0) | <0.001 | 43.226 |
| Hypertension, n (%) | | | | 0.137 | 2.212 |
| No | 231 (82.5) | 61 (88.4) | 170 (80.6) | | |
| Yes | 49 (17.5) | 8 (11.6) | 41 (19.4) | | |
| Diabetes, n (%) | | | | 0.009 | Fisher |
| No | 263 (93.9) | 69 (100) | 194 (91.9) | | |
| Yes | 17 (6.1) | 0 (0) | 17 (8.1) | | |
| Smoking, n (%) | | | | 0.214 | 1.545 |
| No | 218 (77.9) | 50 (72.5) | 168 (79.6) | | |
| Yes | 62 (22.1) | 19 (27.5) | 43 (20.4) | | |
| Drinking alcohol, n (%) | | | | 0.19 | 1.714 |
| No | 237 (84.6) | 55 (79.7) | 182 (86.3) | | |
| Yes | 43 (15.4) | 14 (20.3) | 29 (13.7) | | |
| IIP, n (%) | | | | 0.826 | 0.048 |
| No | 129 (46.1) | 31 (44.9) | 98 (46.4) | | |
| Yes | 151 (53.9) | 38 (55.1) | 113 (53.6) | | |
| Seizure, n (%) | | | | <0.001 | 17.882 |
| No | 230 (82.1) | 45 (65.2) | 185 (87.7) | | |
| Yes | 50 (17.9) | 24 (34.8) | 26 (12.3) | | |
| FND, n (%) | | | | <0.001 | 18.242 |
| No | 189 (67.5) | 61 (88.4) | 128 (60.7) | | |
| Yes | 91 (32.5) | 8 (11.6) | 83 (39.3) | | |
| CI, n (%) | | | | 0.464 | 0.537 |
| No | 258 (92.1) | 65 (94.2) | 193 (91.5) | | |
| Yes | 22 (7.9) | 4 (5.8) | 18 (8.5) | | |
| CS, n (%) | | | | 0.46 | Fisher |
| No | 270 (96.4) | 68 (98.6) | 202 (95.7) | | |
| Yes | 10 (3.6) | 1 (1.4) | 9 (4.3) | | |
| Duration, Median (IQR) | 1.0 (0.3, 2.0) | 1.0 (0.2, 4.0) | 1.0 (0.3, 2.0) | 0.32 | 0.989 |
| BMI, Median (IQR) | 23.4 (21.6, 26.1) | 24.2 (21.5, 26.9) | 23.3 (21.6, 25.5) | 0.251 | 1.319 |
| KPS ≥ 70, n (%) | | | | 0.017 | 5.744 |
| No | 85 (30.4) | 13 (18.8) | 72 (34.1) | | |
| Yes | 195 (69.6) | 56 (81.2) | 139 (65.9) | | |
| (B) Imaging characteristics | | | | | |
| Laterality, n (%) | | | | 0.006 | Fisher |
| Bilateral/Midline | 17 (6.1) | 10 (14.5) | 7 (3.3) | | |
| Left | 115 (41.1) | 25 (36.2) | 90 (42.7) | | |
| Right | 148 (52.9) | 34 (49.3) | 114 (54) | | |
| Lobe, n (%) | | | | <0.001 | Fisher |
| Multilobar | 52 (18.6) | 9 (13) | 43 (20.4) | | |
| Frontal Lobe | 111 (39.6) | 32 (46.4) | 79 (37.4) | | |
| Temporal Lobe | 64 (22.9) | 11 (15.9) | 53 (25.1) | | |
| Parietal Lobe | 26 (9.3) | 3 (4.3) | 23 (10.9) | | |
| Occipital Lobe | 4 (1.4) | 0 (0) | 4 (1.9) | | |
| Deep Structures | 21 (7.5) | 13 (18.8) | 8 (3.8) | | |
| Posterior Fossa | 2 (0.7) | 1 (1.4) | 1 (0.5) | | |
| MS, n (%) | | | | <0.001 | 30.756 |
| No | 138 (49.3) | 54 (78.3) | 84 (39.8) | | |
| Yes | 142 (50.7) | 15 (21.7) | 127 (60.2) | | |
| Unifocal, n (%) | | | | 0.742 | Fisher |
| No | 13 (4.6) | 2 (2.9) | 11 (5.2) | | |
| Yes | 267 (95.4) | 67 (97.1) | 200 (94.8) | | |
| ETV, Median (IQR) | 51.6 (18.1, 83.0) | 17.7 (12.7, 22.7) | 75.0 (32.2, 91.7) | <0.001 | 68.698 |
| (C) MRS parameters | | | | | |
| CHO/NAA, Median (IQR) | 11.6 (5.0, 22.1) | 3.6 (2.1, 6.4) | 16.1 (7.9, 24.7) | <0.001 | 60.334 |
| CHO/Cr, Median (IQR) | 4.3 (2.6, 8.7) | 2.3 (1.8, 3.0) | 5.8 (3.1, 8.8) | <0.001 | 58.042 |
| NAA/Cr, Median (IQR) | 0.4 (0.2, 0.6) | 0.5 (0.3, 0.8) | 0.4 (0.2, 0.5) | <0.001 | 14.285 |
Table 3. Multivariate regression analysis of the training set.
| Variable | OR (95% CI) | p-Value |
| --- | --- | --- |
| Age | 1.08 (1.06–1.11) | <0.001 |
| Seizure | | |
| No | 1 (Ref) | |
| Yes | 0.26 (0.14–0.5) | <0.001 |
| FND | | |
| No | 1 (Ref) | |
| Yes | 4.94 (2.25–10.86) | <0.001 |
| Laterality | | |
| Bilateral/Midline | 1 (Ref) | |
| Left | 5.14 (1.78–14.88) | 0.003 |
| Right | 4.79 (1.69–13.54) | 0.003 |
| Lobe | | |
| Multilobar | 1 (Ref) | |
| Frontal Lobe | 0.52 (0.23–1.18) | 0.118 |
| Temporal Lobe | 1.01 (0.38–2.66) | 0.986 |
| Parietal Lobe | 1.6 (0.4–6.52) | 0.508 |
| Occipital Lobe | Not estimable due to sparse data | |
| Deep Structures | 0.13 (0.04–0.4) | <0.001 |
| Posterior Fossa | 0.21 (0.01–3.67) | 0.284 |
| KPS ≥ 70 | | |
| No | 1 (Ref) | |
| Yes | 0.45 (0.23–0.87) | 0.018 |
| MS | | |
| No | 1 (Ref) | |
| Yes | 5.44 (2.88–10.27) | <0.001 |
| ETV | 1.03 (1.02–1.04) | <0.001 |
| CHO/NAA | 1.04 (1.01–1.07) | 0.004 |
| CHO/Cr | 1.2 (1.09–1.33) | <0.001 |
| NAA/Cr | 0.87 (0.67–1.13) | 0.3 |
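The odds ratios for continuous predictors in Table 3 can be easier to read once rescaled to a clinically meaningful increment. The sketch below assumes that each odds ratio is expressed per one-unit increase (for age, per year) and that the effect is linear on the log-odds scale; the table does not state the units, so both points are assumptions, and the function name is hypothetical.

```python
import math


def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the odds ratio per one unit.

    Assumes a logistic model, in which log-odds change linearly with the predictor.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.exp(units * math.log(or_per_unit))


# Age: OR 1.08 per unit in Table 3 (assumed here to mean per year of age).
or_age_10_years = scaled_odds_ratio(1.08, 10)

# CHO/Cr: OR 1.2 per unit in Table 3; rescaled to a 2-unit difference in the ratio.
or_cho_cr_2_units = scaled_odds_ratio(1.2, 2)

print(f"Age, +10 years: OR = {or_age_10_years:.2f}")
print(f"CHO/Cr, +2 units: OR = {or_cho_cr_2_units:.2f}")
```

Under these assumptions, a ten-year age difference corresponds to roughly a doubling of the odds of HGG, which is a larger effect than the per-unit value of 1.08 may suggest at first glance.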
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Rao, X., Yang, M., Chen, H., Li, G., Wu, L., Gong, L., Yang, M., Wang, H., Ding, Y., Chen, G., Rao, X., Zhang, N., Wang, X., & Teng, L. (2026). An Interpretable Multimodal Machine-Learning Model for Non-Invasive Preoperative Glioma Grading. Cancers, 18(8), 1204. https://doi.org/10.3390/cancers18081204