Article

IDH Mutation Assessment in Gliomas from Anatomical MRI Using Deep Learning: A Comparative Analysis of Centralized and Federated Learning Frameworks

1
Institute of Biomedical Engineering, Bogazici University, Istanbul 34684, Türkiye
2
Center for Targeted Therapy Technologies, Bogazici University, Istanbul 34684, Türkiye
*
Author to whom correspondence should be addressed.
Diagnostics 2026, 16(4), 623; https://doi.org/10.3390/diagnostics16040623
Submission received: 4 December 2025 / Revised: 10 February 2026 / Accepted: 11 February 2026 / Published: 20 February 2026
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Abstract

Background/Objectives: Isocitrate dehydrogenase (IDH) mutation is a key prognostic indicator in diffuse gliomas; however, it is clinically determined from invasive tissue sampling. Non-invasive preoperative identification of IDH mutation from routine anatomical MRI could support treatment decision making. This study evaluated deep learning models for IDH mutation detection using routine anatomical MRI (post-contrast T1-weighted (T1c), T2-weighted, and fluid attenuated inversion recovery (FLAIR) MRI) and quantified how tumor-focused image preprocessing and different training schemes, centralized learning (CL) versus federated learning (FL) with alternative aggregation strategies, affected model performance. Methods: Anatomical MRI from 501 diffuse glioma patients in the UCSF Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset was analyzed using a deep learning classifier built on a 2D U-Net encoder, with age and sex included as covariates. Two methods of tumor-focused image preprocessing, Naïve Soft Filtering (NSF) and Gradient-Based Soft Filtering (GBSF), were compared. Centralized learning (CL) was benchmarked against federated learning (FL) using Federated Averaging (FA) and Federated Trimmed Mean (FTM) aggregation strategies. Model performance was compared in terms of accuracy, precision, recall, F1 score, specificity, and the area under the receiver operating characteristic curve (ROC-AUC). Results: The CL model with NSF achieved the best test performance (accuracy = 0.949, F1 = 0.951, ROC-AUC = 0.971), with NSF consistently outperforming GBSF. FL’s performance decreased relative to CL’s, but the FA strategy outperformed FTM (FTM accuracy = 0.915 vs. FA accuracy = 0.949), which indicates that the FL aggregation strategy has an influence on model performance. Conclusions: Deep learning applied to routine anatomical MRI could classify IDH mutation status with high accuracy. 
Context-preserving image preprocessing with NSF substantially improved performance across training schemes. FL provides a privacy-preserving alternative to CL, but incurs a measurable performance degradation that is sensitive to the choice of aggregation strategy.

1. Introduction

Isocitrate dehydrogenase (IDH) mutation status is a key molecular determinant in gliomas, with implications for prognosis, therapeutic planning, and clinical trial stratification. The fifth edition of the World Health Organization’s (WHO) classification of central nervous system tumors (WHO CNS5) was the second guideline, after WHO 2016, that reinforced the significance of IDH mutations for classifying gliomas [1,2,3,4]. Besides IDH mutations, 1p/19q codeletion, H3F3A mutations, ATRX mutations, and MGMT promoter methylation were also highlighted for glioma classification in WHO CNS5.
IDH mutant (IDH-mut) gliomas generally exhibit a more favorable prognosis than IDH-wildtype (IDH-wt) gliomas. Although some studies have reported no significant prognostic differences between IDH-wt and IDH-mut gliomas [5], many studies have reported better prognosis for IDH-mut gliomas [6,7,8,9,10,11,12,13,14]. Moreover, one study reported a statistically significant relationship between IDH mutation status and mitotic index [15]. Recent progress in treating IDH-mut gliomas further underscores the importance of IDH status in patient management. A recent clinical trial, INDIGO, showed that vorasidenib, a small-molecule inhibitor targeting the IDH1 and IDH2 enzymes, significantly improved progression-free survival in IDH-mut gliomas [16], and the drug was approved by the FDA in 2024. Noninvasive estimation of IDH mutational status from anatomical MRI thus meets a significant clinical need, especially with the advent of targeted treatments for IDH-mut gliomas.
Because tissue sampling is limited by tumor heterogeneity and surgical risk, non-invasive preoperative prediction of IDH status using deep learning (DL) models based on MRI data is of great interest. Initial convolutional models showed that conventional anatomical images carry informative representations. A dual-network CNN approach, pairing a T2-weighted (T2w) network with MC-Net (post-contrast T1-weighted (T1c) and fluid-attenuated inversion recovery (FLAIR)), yielded cross-validated accuracies >90% and revealed that T2w MRI in particular provided important information for detecting IDH-associated variability [17]. In a cohort of 495 patients, Elyassirad et al. [18] compared 2D and 3D ResNet architectures and found a transfer-learning-based 2D ResNet-50 to be superior to the 3D models (AUC = 0.91), underscoring the practicality of 2D backbones for clinical MRI.
Transformer models and multi-modal feature integration have further advanced performance and interpretability. A Vision Transformer (ViT) with masked-autoencoder self-pretraining operating on whole-slice T1c, T2, and FLAIR attained 93% internal accuracy, and Grad-CAM maps localized predictions to T2-hyperintense and necrotic regions, offering biologically plausible attributions [19]. Incorporating microstructural diffusion features can add complementary signals, and using VGG16 as a feature extractor on anatomic MRI plus diffusion tensor imaging (DTI) maps increased sensitivity to 91.1% [20]. Beyond single-task prediction, a multi-task deep model jointly learned IDH mutational status, prognosis, and molecular subtypes, linking imaging phenotypes with pathways such as epithelial–mesenchymal transition via radio-multiomics (AUC = 0.89–0.90) [21]. Related groundwork in automated tumor segmentation with nnU-Net across 1685 patients (T1c, T2-FLAIR, apparent diffusion coefficient (ADC)) reported AUC = 0.948 for segmentation and identified age as an influential covariate for downstream modeling [22].
Synthesis of this literature via meta-analysis (52 studies) indicated pooled sensitivity/specificity of 0.84/0.87 (AUC = 0.89) for MRI-based IDH classification, while also emphasizing substantial heterogeneity in MRI protocols, segmentation strategies, and validation designs [23]. Methodologically, deep learning-based radiomics eliminates manual feature crafting by extracting large-scale deep features directly from images (e.g., 16,384 features in a modified CNN), streamlining pipelines and potentially improving generalizability [24].
Overall, the literature supports robust and interpretable MRI-based classification of IDH mutational status using CNNs and transformers, especially when multi-contrast inputs and automated segmentation are leveraged. Nonetheless, protocol heterogeneity, limited external validation, and variable image preprocessing continue to impede broad clinical adoption. Dataset variability remains a central obstacle, and deep models often lose accuracy when deployed across sites due to image domain shift. This shift arises from differences in MRI protocols (for example TR/TE, field strength, and coil configuration), vendor-specific reconstruction, preprocessing pipelines, and population demographics. As a result, external validation typically underperforms compared to internal validation. A meta-analysis of 11 studies (n = 1685) reported substantial performance heterogeneity, with AUCs ranging from 0.74 to 0.98 [25]. Even models with strong internal results can show external drops, as in Yu et al. [19], where a ViT achieved 93% internal accuracy but 87% on an external cohort. A key contributor to this variability is the scarcity of multi-institutional training, constrained by data-sharing and privacy regulations (for example, General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA)) that limit centralized pooling, reduce cohort diversity, and weaken cross-site generalization. Federated learning (FL) offers a practical alternative by training models locally on site-resident data and sharing only weight updates or gradients for server-side aggregation, thereby enabling learning from distributed, diverse cohorts while reducing scanner and region-specific bias.
In this study, we quantified the performance gap between centralized and federated learning for IDH mutation status identification in gliomas under controlled conditions, using pseudo-site partitions within a single data cohort. By holding acquisition and demographic factors effectively constant, the analysis isolated the intrinsic costs of federation from confounds related to cross-site domain shift. Centralized and federated models were benchmarked with harmonized code and matched image preprocessing, two federated aggregation rules were compared, and two tumor-focused filtering strategies were evaluated. This design provides actionable guidance on when federated learning can approach centralized performance and which design choices help preserve external validity in IDH classification pipelines, with direct relevance to routine clinical MRI.

2. Materials and Methods

2.1. Materials

This study analyzed preoperative MRI data from 501 diffuse glioma cases (age range 17–94, mean 56.87 ± 15.02; 298 males, 203 females, 103 IDH-mut, 398 IDH-wt), originally sourced from the University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset [26] (Table 1). IDH mutation status was confirmed by either conventional or next-generation sequencing of tumor tissue. MRI data were acquired on a 3T clinical MRI scanner (GE Healthcare, Waukesha, WI, USA) using a standard brain tumor imaging protocol including post-contrast T1-weighted inversion-recovery spoiled gradient echo (IR-SPGR, TR = 6 ms, TE = 2.3 ms, TI = 450 ms, flip angle = 12°), 3D T2-weighted fast spin echo (TR = 2200 ms, TE = 100 ms), and FLAIR (TR = 5700 ms, TE = 115 ms, TI = 1650 ms).
The MRI data of each patient were resampled to a 1 mm³ isotropic resolution and co-registered to 3D space defined by the T2/FLAIR using nonlinear registration via ANTs. Skull stripping was performed using a publicly available deep learning-based brain mask algorithm [27]. Skull-stripping and tumor segmentations were obtained from the data publishers and visually inspected by the authors prior to use. Multiple-compartment tumor segmentations (enhancing tumor, non-enhancing/necrotic core, peritumoral FLAIR abnormality) performed using an ensemble of BraTS-derived models, with manual correction by at least two expert reviewers with more than 15 years of experience, were also provided with the dataset [26].
The UCSF-PDGM dataset is dominated by glioblastomas (IDH-wt; n = 398) and contains only a small number of oligodendrogliomas (IDH-mut, 1p/19q co-deleted, n = 13). Given this extreme class imbalance, oligodendrogliomas were excluded to avoid introducing bias and instability into the reported results. In addition, the dataset includes 24 astrocytomas, labeled as IDH-wt, without any accompanying molecular markers. Due to incomplete molecular characterization and the resulting diagnostic ambiguity, IDH-wildtype astrocytomas were excluded to comply with the WHO 2021 classification requirements. The patient cohort distribution and the exclusion criteria are shown in Figure 1.

2.2. Methods

For each patient, a comprehensive tumor region-of-interest (ROI) mask was generated by integrating segmentations of enhancing tumor, FLAIR abnormality, and necrotic regions. To emphasize tumor regions while preserving the spatial context of the MR images, two soft filtering approaches were evaluated. In the first approach, we applied a fixed-weight soft filtering scheme (Naïve Soft Filtering; NSF) in which non-tumorous regions outside the tumor borders were assigned a heuristically selected weight of 0.3 during both training and testing. This weighting ensured that these regions contributed partially to the loss function (Figure 2). The factor of 0.3 was kept constant across all experiments to maintain consistency and comparability. In the second approach, we introduced Gradient-Based Soft Filtering (GBSF), which assigned spatially varying weights that gradually decreased from 0.3 near the tumor boundary to 0 at the farthest points of the image (Figure 3). This approach retained more contextual information near the tumor while progressively suppressing regions farther from it. For axial slices containing tumor tissue, three MRI sequences, FLAIR, T1c, and T2w, were mapped to the red, green, and blue channels of an RGB image, respectively. This color-space transformation leveraged the complementary contrast properties of each sequence, in which FLAIR (red) highlighted edema and non-enhancing tumor, T1c (green) delineated enhancing tumor regions, and T2w (blue) captured overall tumor morphology. Each channel underwent individual intensity normalization (zero-mean, unit variance) to mitigate scanner-specific variations.
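The two soft-filtering schemes and the RGB channel mapping can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names and the toy 5 × 5 mask are hypothetical, the linear decay with Euclidean distance used for GBSF is an assumption (the text only states that weights fall from 0.3 near the tumor boundary to 0 at the farthest points), and applying the weights after per-channel normalization is likewise an assumed ordering.

```python
import numpy as np

BACKGROUND_WEIGHT = 0.3  # heuristic weight for non-tumor regions, as stated in the text


def nsf_weights(mask):
    """Naive Soft Filtering: 1.0 inside the tumor ROI, a fixed 0.3 elsewhere."""
    return np.where(np.asarray(mask, dtype=bool), 1.0, BACKGROUND_WEIGHT)


def gbsf_weights(mask):
    """Gradient-Based Soft Filtering with an ASSUMED linear decay in distance."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask must contain at least one tumor pixel")
    tumor = np.argwhere(mask)                               # tumor pixel coordinates
    pixels = np.argwhere(np.ones(mask.shape, dtype=bool))   # all pixels, row-major
    # Brute-force distance to the nearest tumor pixel; fine for a toy slice.
    # A real pipeline would more likely use a distance transform.
    diffs = pixels[:, None, :] - tumor[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1).reshape(mask.shape)
    d_max = dist.max()
    if d_max == 0:  # the ROI covers the whole slice
        return np.ones(mask.shape)
    weights = BACKGROUND_WEIGHT * (1.0 - dist / d_max)  # 0 at the farthest pixel
    weights[mask] = 1.0                                 # tumor stays unattenuated
    return weights


def zscore(channel):
    """Zero-mean, unit-variance normalization of a single channel."""
    centered = channel - channel.mean()
    std = channel.std()
    return centered / std if std > 0 else centered


def to_rgb(flair, t1c, t2w, weights):
    """Map FLAIR, T1c, T2w to R, G, B and apply the soft-filter weights."""
    channels = [zscore(c) * weights for c in (flair, t1c, t2w)]
    return np.stack(channels, axis=-1)


# Toy example: a single tumor pixel in the center of a 5 x 5 slice.
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
nsf = nsf_weights(mask)
gbsf = gbsf_weights(mask)

rng = np.random.default_rng(0)
flair, t1c, t2w = (rng.normal(size=mask.shape) for _ in range(3))
rgb_nsf = to_rgb(flair, t1c, t2w, nsf)
```

In this sketch NSF leaves a uniform 0.3 everywhere outside the ROI, whereas GBSF drives the corners of the slice to zero, which is the difference in retained context that the comparison between the two schemes is designed to probe.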
The classifier network architecture employed a 2D U-Net encoder pretrained on tumor segmentation tasks to extract spatially resolved features, followed by task-specific convolutional layers for IDH classification (Figure 4). The U-Net encoder was pretrained on an independent in-house glioma dataset, with no patient overlap and no molecular labels used during pretraining. Additionally, both datasets underwent standard preprocessing (skull-stripping, registration, intensity normalization), and encoder weights were fine-tuned rather than frozen during classification training to allow adaptation to potential domain differences while leveraging pretrained anatomical feature representations. This design leveraged U-Net’s ability to capture multi-scale contextual features while minimizing computational overhead compared to 3D approaches.
The network consisted of multiple encoder layers following the standard U-Net [28] encoder design. Each encoder layer (ENC) consisted of two 2D convolutional layers, two batch normalization layers, and two leaky ReLU activation layers. The bottleneck block, which reduced the channel dimension from 1024 to 512, used the same layer types as the encoder but with different configurations. The Flatter, a simple 2D convolutional layer, reduced the channels from 512 to 8, decreasing the number of features after flattening. A feed-forward neural network was added to process the flattened features to obtain a scalar output for identifying the IDH mutational status. On the last linear layer, age and sex were also added to boost the performance of the model. Each model was trained with an AdamW optimizer (learning rate = 1 × 10⁻⁴) and a cosine annealing learning rate schedule (T_max = 10, η_min = 1 × 10⁻⁶). Centralized models were trained on aggregated data for 40 epochs. In federated learning experiments, a single local epoch was performed on each model per round, followed by server aggregation. The total number of rounds was chosen to ensure an equal optimization budget for federated and centralized experiments. Early stopping was not used.
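The learning rate schedule can be illustrated with the closed-form cosine annealing rule, η_t = η_min + ½(η_base − η_min)(1 + cos(π t / T_max)). The sketch below is an illustration under stated assumptions: the base rate is read as 1e-4 and the minimum as 1e-6, the schedule is assumed to advance once per epoch (the text does not say whether it is stepped per epoch or per batch), and the function name is hypothetical.

```python
import math

BASE_LR = 1e-4   # assumed reading of the stated learning rate
ETA_MIN = 1e-6   # assumed reading of the stated minimum rate
T_MAX = 10       # half-period of the cosine, as stated in the text
EPOCHS = 40      # centralized training length, as stated in the text


def cosine_annealing_lr(step, base_lr=BASE_LR, eta_min=ETA_MIN, t_max=T_MAX):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# One value per epoch, ASSUMING the scheduler is stepped once per epoch.
schedule = [cosine_annealing_lr(epoch) for epoch in range(EPOCHS)]

for epoch in (0, 5, 10, 20):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.3e}")
```

Because the cosine is periodic, this closed form returns to the base rate every 2 × T_max steps; with T_max = 10 and 40 epochs, the rate under these assumptions would complete two full cycles between the two bounds rather than decaying once.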
The federated learning (FL) framework was implemented using Flower [29], a scalable library for decentralized model training. The dataset, consisting of 464 patients, was partitioned into a training cohort (n = 361), a validation cohort (n = 44), and a holdout test set (n = 59). For centralized learning (CL), the entire training set was used. In the FL paradigm, the training data were evenly distributed between two clients, with careful balancing to ensure comparable IDH mutation ratios (IDH-mutant: around 15% in both splits; Table 2). To evaluate FL robustness, two aggregation strategies were tested: (1) Federated Averaging (FA), which computes a simple mean of client model weights, and (2) Federated Trimmed Mean (FTM), which discards outlier parameters to enhance resilience against non-independent and identically distributed (non-IID) data distributions. Identical test sets and model seeds (seed = 61) were used across all experiments to isolate the impact of FL versus centralized training. Figure 5 shows the FL scheme. Weights & Biases (W&B) was used for real-time monitoring of training dynamics and performance [30]. At each optimization step, training/validation loss and the full evaluation suite of accuracy, precision, recall, F1 score, specificity, and ROC-AUC were logged, enabling fine-grained inspection of learning behavior and supporting reproducibility through centralized experiment tracking.
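The difference between the two aggregation rules can be sketched on a flattened parameter vector. This is a generic illustration, not Flower's API and not the authors' configuration: the function names are hypothetical, the five toy clients are invented for clarity (the study used two), and the trim fraction beta is an assumption because the article does not report it.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      num_examples: Sequence[int]) -> List[float]:
    """Coordinate-wise mean of client parameters, weighted by local sample count.

    With equally sized clients this reduces to the simple mean described in the text.
    """
    total = sum(num_examples)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, num_examples)) / total
        for i in range(n_params)
    ]


def federated_trimmed_mean(client_weights: Sequence[Sequence[float]],
                           beta: float) -> List[float]:
    """Per coordinate: drop the k smallest and k largest client values, average the rest."""
    n_clients = len(client_weights)
    k = int(beta * n_clients)  # number of values trimmed from each end
    if 2 * k >= n_clients:
        raise ValueError("trim fraction removes every client")
    aggregated = []
    for values in zip(*client_weights):
        kept = sorted(values)[k:n_clients - k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Hypothetical toy updates: four similar clients and one outlier.
clients = [
    [0.10, 1.0],
    [0.12, 1.1],
    [0.11, 0.9],
    [0.09, 1.0],
    [5.00, -4.0],  # outlier update
]
sizes = [100] * 5

fa = federated_average(clients, sizes)
ftm = federated_trimmed_mean(clients, beta=0.2)  # beta is an assumed value

# With only two clients, k = int(beta * 2) is 0 for any beta < 0.5,
# so this sketch then returns the plain unweighted mean.
two_clients = federated_trimmed_mean(clients[:2], beta=0.2)
```

In the toy example the outlier pulls the averaged first coordinate far from the other clients, while the trimmed mean discards it. The same mechanism can also discard updates that are unusual but informative, which is the trade-off examined when the two aggregators are compared.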

3. Results

Table 3 reports the performance of the proposed CL and FL approaches for IDH-mutation classification in gliomas. Test-set class proportions were intentionally balanced to avoid majority-class bias in threshold-dependent metrics. As a result, precision-recall values reflect performance under balanced prevalence and may differ under natural class distributions, whereas prevalence-insensitive metrics such as ROC-AUC and specificity provide more generalizable assessments of discriminative ability. The CL model with NSF achieved the highest overall performance, with accuracy = 0.949, F1 = 0.951, and ROC-AUC = 0.971. This configuration provided a favorable precision-recall balance (precision = 0.935; recall = 0.966) and high specificity (0.931), indicating effective detection of IDH-mutant cases.
Under CL with GBSF, the model achieved an accuracy of 0.813 for IDH-mutation classification. Precision (0.952) was substantially higher than recall (0.667), in contrast to the CL model trained with NSF. The resulting F1 score of 0.784 reflected an imbalanced precision-recall trade-off, suggesting class-specific bias under the observed class distribution. The specificity was 0.966, indicating relatively few false positives. Figure 6 illustrates the performance differences in terms of metrics between the two configurations.
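For reference, the threshold-dependent metrics follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts, chosen here only so that a 59-case test set gives values close to those quoted for the CL model with NSF; the article does not report the confusion matrix, so these counts are an assumption and not the authors' data. The function name is also hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts (IDH-mut = positive)."""
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    recall = tp / (tp + fn)           # share of true positives that are found
    specificity = tn / (tn + fp)      # share of true negatives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# HYPOTHETICAL counts for a 59-case test set (30 positives, 29 negatives).
metrics = classification_metrics(tp=29, fp=2, fn=1, tn=27)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because precision depends on the ratio of positives to negatives in the test set while recall and specificity do not, the same classifier evaluated at the natural IDH-mut prevalence would be expected to show a different precision, which is the caveat raised about balanced test-set proportions.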
Given the superior performance of the CL models on data generated by the NSF approach, the FL models were trained only on NSF-generated data. Using the FTM aggregator, the model achieved accuracy = 0.915, precision = 0.963, recall = 0.867, F1 = 0.912, specificity = 0.966, and ROC-AUC = 0.957. The higher precision than recall, together with the specificity of 0.966, indicates fewer false positives than false negatives.
Using FA, the FL model achieved accuracy = 0.949, F1 = 0.952, precision = 0.909, recall = 1.000, specificity = 0.896, and ROC-AUC = 0.967. In contrast to FTM, recall exceeded precision, indicating fewer false negatives than false positives. This precision–recall pattern matched the CL model trained on NSF-generated data.
Figure 7 summarizes the performance metrics for the CL and FL strategies and their pairwise differences. The left-hand side reports results for the CL model trained on NSF data, alongside both FL variants (FTM and FA). The right-hand side plot shows the metric deltas relative to the CL baseline, highlighting how the choice of FL strategy affects outcomes. Notably, the magnitude of the differences between FTM and FA underscores that the federated aggregation rule is a first-order design choice rather than a secondary detail.
The radar plot on the left-hand side shows the performance of each model, while the plot on the right-hand side shows the performance differences between the models.
In Figure 8, the monitoring results obtained using the W&B monitoring tool are shown [30]. All the metrics that are displayed were observed on the validation set of the study. Figure 8 is intended to provide qualitative insight into the training dynamics rather than to serve as a direct quantitative comparison of metric performances. Consistently across experiments, CL exhibits more stable convergence and superior overall performance compared to FL, which shows higher volatility. Within federated methods, the FA strategy demonstrates greater stability than FTM. It is important to note that these plots have been smoothed to enhance the visibility of general convergence trends; consequently, they should be interpreted as identifying behavioral patterns rather than recording exact performance values at specific steps.

4. Discussion

This study evaluated MRI-based detection of IDH mutation status in gliomas using only preoperative, routine anatomical MRI sequences (T1c, T2, FLAIR). Multiple training paradigms using centralized learning versus federated learning and tumor-focused preprocessing strategies (NSF versus GBSF) were compared. The CL model trained on pooled data with NSF yielded the strongest overall performance, suggesting that context-preserving attenuation around the lesion may be beneficial for discrimination under the evaluated conditions, without sacrificing specificity or recall. The FL model with the FA aggregator showed a slightly less balanced precision-recall profile than CL but offered a data-sensitive approach that better accounts for site-specific variability.
Under centralized learning, the model that trained on GBSF data underperformed compared to its NSF counterpart. This degradation may be related to differences in how tumor subregions are emphasized by the spatial weighting scheme; however, direct attribution analyses would be required to confirm the contribution of the intratumoral signal to IDH mutation detection. In other words, prioritizing boundary voxels may have diluted informative core features and destabilized feature learning. Under FTM, performance decreased relative to CL, consistent with training on disjoint, site-partitioned data. Because each client model was exposed only to its local subset and aggregated updates (rather than pooled data), the global model likely suffered from non-IID effects and limited cross-site feature alignment. These findings underscore the benefit of training on pooled data, which affords broader feature diversity and, in this cohort, translated into superior discrimination compared with FL variants.
A key methodological observation was that the choice of FL aggregation strategy influenced performance, with differences between FL variants approaching the magnitude of the CL-to-FL transition itself. FA outperformed FTM in terms of accuracy. The largest divergence occurred in recall, indicating markedly more false negatives under FTM, which is an especially concerning error mode for clinical decision-making. Furthermore, although FTM is designed to be robust to outlier updates, in this setting it appears to suppress informative site-specific variability, leading to a marked increase in false negatives. This trade-off indicates that robustness-oriented aggregation can inadvertently reduce recall when inter-site differences reflect meaningful clinical heterogeneity rather than noise.
Ablation results highlighted the value of peri-tumoral context. Retaining non-lesional regions with NSF, which attenuates rather than excises background, preserved cues related to edema, mass effect, and tissue interfaces and corresponded to the strongest overall performance. In contrast, a gradient-based scheme that emphasized the lesion core increased precision at the expense of recall, yielding more false negatives. This precision–recall trade-off is clinically critical. Context-preserving attenuation, which achieved higher recall, may be preferable in settings where missing a positive case carries higher risk, such as pre-surgical triage. Core-emphasizing filters, which achieved higher precision, are advantageous when minimizing unnecessary follow-up is the primary goal.
In the literature, some studies have investigated tumor detection and biomarker analysis using T1 or multimodal MRI, demonstrating the potential of radiological imaging combined with machine learning. However, these works differ substantially from the present study in terms of input data, methodological design, and analytical focus. Although prior approaches typically rely on advanced radiomic feature extraction or explicit segmentation pipelines, our work uses routine anatomical MRI with tumor-focused preprocessing. Critically, rather than optimizing performance on a single diagnostic task, we evaluate model generalizability and aggregation effects across different training paradigms, which represents a distinct and complementary contribution to the field [31,32].
Furthermore, several prior studies have reported strong results for MRI-based IDH classification. Usuzaki et al. [33] combined radiomics features with MR images to train a ViT model and achieved 93.5% accuracy on external verification, noting that radiomics features contributed most strongly to discrimination. Other reports on different datasets illustrated that headline accuracy can mask class imbalance. For example, one study reported 96% accuracy with an F1 score of 75%, suggesting skewed class distributions and limited positive-class recall [34]. Another investigation using perfusion MRI with recurrent neural networks achieved 92.8% accuracy for IDH classification [35]. A hybrid approach that fused radiomics with T1 and T2w MRI attained 93.8% internal accuracy and 87.9% and 78.8% on two external cohorts, highlighting the challenge of cross-site generalization [36]. Bjørkeli and Esmaeili [37] combined T1, T1c, FLAIR, and T2 MRI, and reported 97.6% test accuracy for IDH and 1p/19q codeletion classification, but sensitivity was 56.9%, again indicating imbalance and a high false-negative rate. Finally, a two-center comparison of centralized learning and federated variants reported 83.13% accuracy under centralized training, 81.96% with vanilla federated learning, and up to 83.37% with an alternative federated strategy [38], underscoring the impact of optimization choices and data distribution on performance.
On the other hand, compared with prior studies that relied on radiomics features, diffusion or perfusion imaging, or specialized multi-omics pipelines, this study evaluated IDH classification using only routine preoperative anatomical MRI (T1c, T2w, FLAIR) within a single, unified deep learning framework. Additionally, age and sex were incorporated as covariates at the final classification layer by concatenating them with the imaging-derived features, motivated by prior evidence that these factors are associated with IDH mutation status [11,39]. Moreover, the study design isolated three practical factors that are rarely examined together: training paradigm, aggregation strategy in federated settings, and tumor-focused filtering. First, centralized learning was directly compared with federated learning using harmonized code, identical seeds, and matched image preprocessing to quantify the generalization cost of distributing data across sites. Second, two federated aggregation rules were contrasted on the same cohorts, showing that the choice of aggregation strategy produced performance shifts, a result that emphasizes optimization policy as a primary design variable rather than a minor implementation detail. Third, NSF was evaluated against a gradient-based scheme to characterize the precision–recall trade-offs introduced by preserving peri-tumoral context versus emphasizing the lesion core. Taken together, the contribution of this study is a clinically oriented and systematically controlled analysis that clarifies how deployment-relevant choices regarding training paradigm, aggregation, and filtering shape IDH classification performance when only standard clinical MRI is available.
This study has some limitations. Only routine anatomical MRI was used, and advanced MRI modalities such as perfusion, diffusion, and MR spectroscopy were not incorporated. Although this reflects a realistic clinical baseline and can be considered an advantage for deployability, it reduced the model’s ability to capture microstructural, hemodynamic, and metabolic signatures and limited the data content to a narrower anatomical view. Another limitation is the use of a single publicly available glioma MRI dataset (UCSF-PDGM). Although 1p/19q codeletion status is a critical molecular marker for integrated glioma classification, it was not included in this study due to data limitations. The UCSF-PDGM dataset contains only 13 1p/19q-codeleted oligodendrogliomas out of 501 total cases (2.6%), a severe class imbalance that would preclude reliable statistical analysis or model training for this molecular marker. Access to larger datasets with complete molecular annotation would enable broader application of federated learning for integrated glioma classification, including 1p/19q status and other biomarkers. Furthermore, limiting training and evaluation to one source may have reduced cross-site heterogeneity and, in turn, may have inflated the apparent performance of the federated methods by minimizing real-world domain variability. However, by constructing pseudo-client splits within this cohort and keeping acquisition and demographic factors constant, the analysis isolated the internal performance costs of FL, independent of cross-institutional image domain shift. This design clarified FL’s baseline trade-offs, although the single-center scope remains a limitation for external generalizability. Additionally, FL has important constraints: heterogeneous data distributions, label imbalance, and unequal client sizes can induce client drift and unstable convergence.
As a result, additional safeguards and harmonization steps are required for multi-institutional data, including secure aggregation and differential privacy to protect shared updates, as well as site-aware normalization such as intensity standardization, bias-field correction, or statistical harmonization. Personalized FL strategies, such as site-specific batch normalization or lightweight local adaptation and ensemble-style aggregation, can further stabilize training and improve robustness. Since our experiments used partitions created from a single-site dataset, future studies on this topic will need to integrate these optimizations to ensure robustness under real-world multi-center conditions.

5. Conclusions

In conclusion, deep learning models using routine anatomical MRI can classify IDH mutation status in gliomas with high accuracy. CL with NSF achieved the best overall performance, while federated learning remained a viable privacy-preserving alternative whose effectiveness depended strongly on the aggregation strategy. Data preprocessing choices shaped the error profile: GBSF favored precision, whereas NSF preserved peri-tumoral context and improved overall discrimination. Future work should validate these findings in multi-institutional cohorts and explore multimodal inputs and adaptive filtering strategies within federated training frameworks.

Author Contributions

Conceptualization, A.B. and E.O.-I.; methodology, A.B.; software, A.B.; validation, A.B. and E.O.-I.; formal analysis, A.B.; investigation, A.B.; resources, E.O.-I.; data curation, A.B.; writing—original draft preparation, A.B.; writing—review and editing, A.B. and E.O.-I.; visualization, A.B.; supervision, E.O.-I.; project administration, E.O.-I. All authors have read and agreed to the published version of the manuscript.

Funding

This study did not receive any external funding.

Institutional Review Board Statement

Not applicable. This study used only fully anonymized data from publicly available online datasets.

Informed Consent Statement

Not applicable. This study did not involve direct contact with human subjects and used only anonymized public data.

Data Availability Statement

Publicly available datasets were analyzed in this study. All data used in this work are online datasets that can be accessed from the sources and repositories cited in the manuscript. No new datasets were generated.

Acknowledgments

The authors thank all contributors to the publicly available UCSF-PDGM dataset that made this research possible.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

HIPAA: Health Insurance Portability and Accountability Act
GDPR: General Data Protection Regulation
GBSF: Gradient-Based Soft Filtering
MR: Magnetic Resonance
NSF: Naïve Soft Filtering
RGB: Red, Green, and Blue
ROI: Region-of-Interest
TI: Inversion Time
TE: Echo Time
TR: Repetition Time
IR-SPGR: Inversion-Recovery Spoiled Gradient Echo
UCSF-PDGM: University of California San Francisco Preoperative Diffuse Glioma MRI Dataset
FA: Federated Averaging
FTM: Federated Trimmed Mean
CL: Centralized Learning
FL: Federated Learning
DTI: Diffusion Tensor Imaging
ViT: Vision Transformer
FLAIR: Fluid-Attenuated Inversion Recovery
T1c/T1CE: T1-weighted Contrast-Enhanced
T1: T1-weighted
IDH: Isocitrate Dehydrogenase
WHO: World Health Organization
WHO CNS5: WHO Classification of Central Nervous System Tumors (5th Edition)
IDH-mut: IDH-Mutant
IDH-wt: IDH-Wildtype
MRI: Magnetic Resonance Imaging
DL: Deep Learning
CNN: Convolutional Neural Network

References

  1. Choate, K.A.; Pratt, E.P.S.; Jennings, M.J.; Winn, R.J.; Mann, P.B. IDH Mutations in Glioma: Molecular, Cellular, Diagnostic, and Clinical Implications. Biology 2024, 13, 885.
  2. Reuss, D. Updates on the WHO diagnosis of IDH-mutant glioma. J. Neuro-Oncol. 2023, 162, 461–469.
  3. Whitfield, B.T.; Huse, J.T. Classification of adult-type diffuse gliomas: Impact of the World Health Organization 2021 update. Brain Pathol. 2022, 32, e13062.
  4. Louis, D.N.; Perry, A.; Wesseling, P.; Brat, D.J.; Cree, I.A.; Figarella-Branger, D.; Hawkins, C.; Ng, H.K.; Pfister, S.M.; Reifenberger, G.; et al. The 2021 WHO Classification of Tumors of the Central Nervous System: A summary. Neuro-Oncology 2021, 23, 1231–1251.
  5. Goyal, L.; Govindan, A.; Sheth, R.A.; Nardi, V.; Blaszkowsky, L.S.; Faris, J.E.; Clark, J.W.; Ryan, D.P.; Kwak, E.L.; Allen, J.N.; et al. Prognosis and Clinicopathologic Features of Patients with Advanced Stage Isocitrate Dehydrogenase (IDH) Mutant and IDH Wild-Type Intrahepatic Cholangiocarcinoma. Oncologist 2015, 20, 1019–1027.
  6. Buz-Yalug, B.; Turhan, G.; Cetin, A.I.; Dindar, S.S.; Danyeli, A.E.; Yakicier, C.; Pamir, M.N.; Özduman, K.; Dincer, A.; Ozturk-Isik, E. Identification of IDH and TERTp mutations using dynamic susceptibility contrast MRI with deep learning in 162 gliomas. Eur. J. Radiol. 2024, 170, 111257.
  7. Jiang, H.; Yu, K.; Cui, Y.; Ren, X.; Li, M.; Zhang, G.; Yang, C.; Zhao, X.; Zhu, Q.; Lin, S. Differential Predictors and Clinical Implications Associated With Long-Term Survivors in IDH Wildtype and Mutant Glioblastoma. Front. Oncol. 2021, 11, 632663.
  8. Lewandowska, M.A.; Furtak, J.; Szylberg, T.; Roszkowski, K.; Windorbska, W.; Rytlewska, J.; Jóźwicki, W. An Analysis of the Prognostic Value of IDH1 (Isocitrate Dehydrogenase 1) Mutation in Polish Glioma Patients. Mol. Diagn. Ther. 2014, 18, 45–53.
  9. Ozturk-Isik, E.; Cengiz, S.; Ozcan, A.; Yakicier, C.; Ersen Danyeli, A.; Pamir, M.N.; Özduman, K.; Dincer, A. Identification of IDH and TERTp mutation status using 1H-MRS in 112 hemispheric diffuse gliomas. J. Magn. Reson. Imaging 2020, 51, 1799–1809.
  10. Sabha, N.; Knobbe, C.B.; Maganti, M.; Al Omar, S.; Bernstein, M.; Cairns, R.; Çako, B.; von Deimling, A.; Capper, D.; Mak, T.W.; et al. Analysis of IDH mutation, 1p/19q deletion, and PTEN loss delineates prognosis in clinical low-grade diffuse gliomas. Neuro-Oncology 2014, 16, 914–923.
  11. Sacli-Bilmez, B.; Bas, A.; Danyeli, A.E.; Yakicier, M.C.; Pamir, M.N.; Özduman, K.; Dinçer, A.; Ozturk-Isik, E. Detecting IDH and TERTp mutations in diffuse gliomas using 1H-MRS with attention deep-shallow networks. Comput. Biol. Med. 2025, 186, 109736.
  12. Sacli-Bilmez, B.; Danyeli, A.E.; Yakicier, M.C.; Aras, F.K.; Pamir, M.N.; Özduman, K.; Dinçer, A.; Ozturk-Isik, E. Magnetic resonance spectroscopic correlates of progression free and overall survival in “glioblastoma, IDH-wildtype, WHO grade-4”. Front. Neurosci. 2023, 17, 1149292.
  13. Qi, S.T.; Yu, L.; Gui, S.; Ding, Y.Q.; Han, H.X.; Zhang, X.L.; Wu, L.X.; Yao, F. IDH mutations predict longer survival and response to temozolomide in secondary glioblastoma. Cancer Sci. 2012, 103, 269–273.
  14. Weller, M.; Wick, W.; Aldape, K.; Brada, M.; Berger, M.; Pfister, S.M.; Nishikawa, R.; Rosenthal, M.; Wen, P.Y.; Stupp, R.; et al. Glioma. Nat. Rev. Dis. Prim. 2015, 1, 15017.
  15. Olar, A.; Wani, K.M.; Alfaro-Munoz, K.D.; Heathcock, L.E.; van Thuijl, H.F.; Gilbert, M.R.; Armstrong, T.S.; Sulman, E.P.; Cahill, D.P.; Vera-Bolanos, E.; et al. IDH mutation status and role of WHO grade and mitotic index in overall survival in grade II–III diffuse gliomas. Acta Neuropathol. 2015, 129, 585–596.
  16. Mellinghoff, I.K.; van den Bent, M.J.; Blumenthal, D.T.; Touat, M.; Peters, K.B.; Clarke, J.; Mendez, J.; Yust-Katz, S.; Welsh, L.; Mason, W.P.; et al. Vorasidenib in IDH1- or IDH2-Mutant Low-Grade Glioma. N. Engl. J. Med. 2023, 389, 589–601.
  17. Bangalore Yogananda, C.G.; Wagner, B.C.; Truong, N.C.D.; Holcomb, J.M.; Reddy, D.D.; Saadat, N.; Hatanpaa, K.J.; Patel, T.R.; Fei, B.; Lee, M.D.; et al. MRI-Based Deep Learning Method for Classification of IDH Mutation Status. Bioengineering 2023, 10, 1045.
  18. Elyassirad, D.; Gheiji, B.; Vatanparast, M.; Ahmadzadeh, A.M.; Kamandi, N.; Soleimanian, A.; Salehi, S.; Faghani, S. Comparative Analysis of 2D and 3D ResNet Architectures for IDH and MGMT Mutation Detection in Glioma Patients. arXiv 2024, arXiv:2412.21091.
  19. Yu, D.; Zhong, Q.; Xiao, Y.; Feng, Z.; Tang, F.; Feng, S.; Cai, Y.; Gao, Y.; Lan, T.; Li, M.; et al. Combination of MRI-based prediction and CRISPR/Cas12a-based detection for IDH genotyping in glioma. npj Precis. Oncol. 2024, 8, 140.
  20. Yuan, J.; Siakallis, L.; Li, H.B.; Brandner, S.; Zhang, J.; Li, C.; Mancini, L.; Bisdas, S. Structural- and DTI- MRI enable automated prediction of IDH Mutation Status in CNS WHO Grade 2–4 glioma patients: A deep Radiomics Approach. BMC Med. Imaging 2024, 24, 104.
  21. Wu, X.; Zhang, S.; Zhang, Z.; He, Z.; Xu, Z.; Wang, W.; Jin, Z.; You, J.; Guo, Y.; Zhang, L.; et al. Biologically interpretable multi-task deep learning pipeline predicts molecular alterations, grade, and prognosis in glioma patients. npj Precis. Oncol. 2024, 8, 181.
  22. Dorfner, F.J.; Patel, J.B.; Kalpathy-Cramer, J.; Gerstner, E.R.; Bridge, C.P. A review of deep learning for brain tumor analysis in MRI. npj Precis. Oncol. 2025, 9, 2.
  23. Farahani, S.; Hejazi, M.; Tabassum, M.; Ieva, A.D.; Mahdavifar, N.; Liu, S. Diagnostic performance of deep learning for predicting glioma isocitrate dehydrogenase and 1p/19q co-deletion in MRI: A systematic review and meta-analysis. Eur. Radiol. 2024.
  24. Li, Z.; Wang, Y.; Yu, J.; Guo, Y.; Cao, W. Deep Learning based Radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma. Sci. Rep. 2017, 7, 5467.
  25. Chung, C.Y.C.; Pigott, L.E. Predicting IDH and ATRX mutations in gliomas from radiomic features with machine learning: A systematic review and meta-analysis. Front. Radiol. 2024, 4, 1493824.
  26. Calabrese, E.; Villanueva-Meyer, J.E.; Rudie, J.D.; Rauschecker, A.M.; Baid, U.; Bakas, S.; Cha, S.; Mongan, J.T.; Hess, C.P. The University of California San Francisco Preoperative Diffuse Glioma MRI Dataset. Radiol. Artif. Intell. 2022, 4, e220058.
  27. Calabrese, E.; Rudie, J.D.; Rauschecker, A.M.; Villanueva-Meyer, J.E.; Cha, S. Feasibility of Simulated Postcontrast MRI of Glioblastomas and Lower-Grade Gliomas by Using Three-dimensional Fully Convolutional Neural Networks. Radiol. Artif. Intell. 2021, 3, e200276.
  28. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
  29. Beutel, D.J.; Topal, T.; Mathur, A.; Qiu, X.; Parcollet, T.; Lane, N.D. Flower: A Friendly Federated Learning Research Framework. arXiv 2020, arXiv:2007.14390.
  30. Biewald, L. Experiment Tracking with Weights and Biases. 2020. Available online: https://www.wandb.com/ (accessed on 1 February 2026).
  31. Abdusalomov, A.B.; Mukhiddinov, M.; Whangbo, T.K. Brain Tumor Detection Based on Deep Learning Approaches and Magnetic Resonance Imaging. Cancers 2022, 15, 4172.
  32. Qureshi, S.A.; Hussain, L.; Ibrar, U.; Alabdulkreem, E.; Nour, M.K.; Alqahtani, M.S.; Nafie, F.M.; Mohamed, A.; Mohammed, G.P.; Duong, T.Q. Radiogenomic classification for MGMT promoter methylation status using multi-omics fused feature space for least invasive diagnosis through mpMRI scans. Sci. Rep. 2023, 13, 3291.
  33. Usuzaki, T.; Inamori, R.; Shizukuishi, T.; Morishita, Y.; Takagi, H.; Ishikuro, M.; Obara, T.; Takase, K. Predicting isocitrate dehydrogenase status among adult patients with diffuse glioma using patient characteristics, radiomic features, and magnetic resonance imaging: Multi-modal analysis by variable vision transformer. Magn. Reson. Imaging 2024, 111, 266–276.
  34. Li, Z.C.; Bai, H.; Sun, Q.; Zhao, Y.; Lv, Y.; Zhou, J.; Liang, C.; Chen, Y.; Liang, D.; Zheng, H. Multiregional radiomics profiling from multiparametric MRI: Identifying an imaging predictor of IDH1 mutation status in glioblastoma. Cancer Med. 2018, 7, 5999–6009.
  35. Choi, K.S.; Choi, S.H.; Jeong, B. Prediction of IDH genotype in gliomas with dynamic susceptibility contrast perfusion MR imaging using an explainable recurrent neural network. Neuro-Oncology 2019, 21, 1197–1209.
  36. Choi, Y.S.; Bae, S.; Chang, J.H.; Kang, S.G.; Kim, S.H.; Kim, J.; Rim, T.H.; Choi, S.H.; Jain, R.; Lee, S.K. Fully automated hybrid approach to predict the IDH mutation status of gliomas via deep learning and radiomics. Neuro-Oncology 2021, 23, 304–313.
  37. Bjørkeli, E.B.; Esmaeili, M. Multi-task glioma segmentation and IDH mutation and 1p19q codeletion classification via a deep learning model on multimodal MRI. Meta-Radiology 2025, 3, 100152.
  38. Ali, M.B.; Gu, I.Y.H.; Berger, M.S.; Jakola, A.S. A novel federated deep learning scheme for glioma and its subtype classification. Front. Neurosci. 2023, 17, 1181703.
  39. Wanis, H.A.; Møller, H.; Ashkan, K.; Davies, E.A. Association of IDH1 Mutation and MGMT Promoter Methylation with Clinicopathological Parameters in an Ethnically Diverse Population of Adults with Gliomas in England. Biomedicines 2024, 12, 2732.
Figure 1. Patient cohort distribution in the UCSF-PDGM dataset and applied exclusion criteria.
Figure 2. Two example RGB image slices from an IDH-wildtype (left) and an IDH-mutant (right) glioma in the Naïve Soft Filtering test dataset. R: FLAIR, G: T1c, B: T2w.
Figure 3. Two example RGB image slices from an IDH-wildtype (left) and an IDH-mutant (right) glioma in the Gradient-Based Soft Filtering test dataset. R: FLAIR, G: T1c, B: T2w.
Figure 4. The classifier network architecture.
Figure 5. A schematic representation of the FL architecture.
Figure 6. Comparison of CL’s performance on different datasets. The radar plot on the left-hand side compares the CL models trained with the two data generation strategies. The bar plot on the right-hand side shows the performance differences between those two models.
Figure 7. Comparison of CL’s and FL’s performance on different datasets. The radar plot on the left-hand side compares the CL and FL models trained with different data generation strategies. The bar plot on the right-hand side shows the performance differences between the CL and FL models trained on the NSF dataset, and between the FL models with different aggregation strategies.
Figure 8. Performance trajectories of all models during validation. The monitoring graphs show the performance metrics of CL trained on the data generated with GBSF (top left) and NSF (top right), and of FL-FTM (bottom left) and FL-FA (bottom right) trained on the data generated with NSF.
Table 1. The patient characteristics of the UCSF-PDGM dataset.

| Features | Total (n = 501) | IDH-mut (n = 103) | IDH-wt (n = 398) |
|---|---|---|---|
| Age (years) | 56.87 ± 15.02 (17–94) | 38.80 ± 11.52 | 61.54 ± 11.98 |
| Sex: Male | 299 (59.7%) | 63 (12.6%) | 236 (47.1%) |
| Sex: Female | 202 (40.3%) | 40 (8.0%) | 162 (32.3%) |
| WHO Grade 2 | 56 (11%) | 46 (9%) | 10 (2%) |
| WHO Grade 3 | 43 (9%) | 29 (5.8%) | 14 (2.8%) |
| WHO Grade 4 | 402 (80%) | 28 (5.6%) | 374 (74.65%) |
| MRI Sequences | T1c (IR-SPGR), T2w (3D FSE), FLAIR | Same | Same |
| Segmentation | Enhancing, necrotic, FLAIR abnormality regions | Same | Same |
Table 2. The age, sex, and IDH mutation distributions in the training, validation, and test datasets.

| Feature | Centralized Train (n = 361) | Client 1 Train (n = 182) | Client 2 Train (n = 178) | Validation (n = 44) | Test (n = 59) |
|---|---|---|---|---|---|
| Age (years) | 48.94 ± 18.13 | 58.61 ± 13.86 | 59.15 ± 14.10 | 54.23 ± 16.01 | 48.94 ± 18.13 |
| Sex (% male) | 57.50% | 52.24% | 57.87% | 70.45% | 61.02% |
| IDH Mutation Rate | 15.55% | 14.83% | 16.29% | 22.73% | 50.84% |
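Client partitions with similar IDH mutation rates, as in Table 2, can be produced by a label-stratified split. The sketch below, in plain Python, is one possible way to build such pseudo-clients and is not necessarily the procedure used in this study. The function name, the cohort composition (56 mutant and 304 wildtype cases), and the seed are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_client_split(labels: Sequence[Hashable],
                            num_clients: int = 2,
                            seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients so each client gets a similar label mix."""
    by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_label, key=str):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin so per-class counts differ by at most one.
        for position, idx in enumerate(indices):
            clients[position % num_clients].append(idx)
    return clients


# Hypothetical training cohort: 1 = IDH-mutant, 0 = IDH-wildtype.
labels = [1] * 56 + [0] * 304
client_a, client_b = stratified_client_split(labels, num_clients=2, seed=42)

rate_a = sum(labels[i] for i in client_a) / len(client_a)
rate_b = sum(labels[i] for i in client_b) / len(client_b)
print(f"client A: n={len(client_a)}, IDH-mut rate={rate_a:.4f}")
print(f"client B: n={len(client_b)}, IDH-mut rate={rate_b:.4f}")
```

In a slice-based 2D pipeline, a split of this kind would need to operate on patient identifiers rather than on individual slices, so that slices from one patient never appear in more than one partition.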
Table 3. Performance comparison of the proposed CL and FL approaches for identifying IDH mutation in gliomas in the test set.

| Metric | Centralized (NSF *) | Centralized (GBSF **) | Federated Trimmed Mean (NSF *) | Federated Averaging Strategy (NSF *) |
|---|---|---|---|---|
| Accuracy | 0.949 | 0.813 | 0.915 | 0.949 |
| F1 Score | 0.951 | 0.784 | 0.912 | 0.952 |
| Precision | 0.935 | 0.952 | 0.963 | 0.909 |
| Recall | 0.966 | 0.667 | 0.867 | 1.000 |
| Specificity | 0.931 | 0.966 | 0.966 | 0.896 |
| ROC-AUC | 0.971 | 0.907 | 0.957 | 0.967 |

* NSF: Naïve Soft Filtering; ** GBSF: Gradient-Based Soft Filtering.
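The threshold-based metrics in Table 3 are functions of the confusion-matrix counts on the test set. As a minimal sketch, the Python snippet below computes them from true/false positive and negative counts. The counts themselves are not reported here; the values used are an assumption, chosen to be compatible with a 59-patient test set containing 30 IDH-mutant cases (Table 2), and the function name is illustrative.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


# Assumed counts (not taken from the article): 30 IDH-mutant and
# 29 IDH-wildtype test patients, with IDH-mutant as the positive class.
example = binary_metrics(tp=29, fp=2, tn=27, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is 56/59 ≈ 0.949 and precision is 29/31 ≈ 0.935, close to the centralized NSF column. ROC-AUC cannot be recovered from counts at a single decision threshold, since it depends on the continuous model scores.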
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bas, A.; Ozturk-Isik, E. IDH Mutation Assessment in Gliomas from Anatomical MRI Using Deep Learning: A Comparative Analysis of Centralized and Federated Learning Frameworks. Diagnostics 2026, 16, 623. https://doi.org/10.3390/diagnostics16040623
