MRI Radiomic Features to Predict IDH1 Mutation Status in Gliomas: A Machine Learning Approach using Gradient Tree Boosting

In patients with gliomas, isocitrate dehydrogenase 1 (IDH1) mutation status has been studied as a prognostic indicator. Recent advances in machine learning (ML) have demonstrated promise in utilizing radiomic features to study disease processes in the brain. We investigate whether ML analysis of multiparametric radiomic features from preoperative Magnetic Resonance Imaging (MRI) can predict IDH1 mutation status in patients with glioma. This retrospective study included patients with glioma with known IDH1 status and preoperative MRI. Radiomic features were extracted from Fluid-Attenuated Inversion Recovery (FLAIR) and Diffusion-Weighted-Imaging (DWI). The dataset was split into training, validation, and testing sets by stratified sampling. Synthetic Minority Oversampling Technique (SMOTE) was applied to the training sets. eXtreme Gradient Boosting (XGBoost) classifiers were trained, and the hyperparameters were tuned. Receiver operating characteristic curve (ROC), accuracy, and f1-scores were collected. A total of 100 patients (age: 55 ± 15, M/F 60/40); with IDH1 mutant (n = 22) and IDH1 wildtype (n = 78) were included. The best performance was seen with a DWI-trained XGBoost model, which achieved ROC with Area Under the Curve (AUC) of 0.97, accuracy of 0.90, and f1-score of 0.75 on the test set. The FLAIR-trained XGBoost model achieved ROC with AUC of 0.95, accuracy of 0.90, f1-score of 0.75 on the test set. A model that was trained on combined FLAIR-DWI radiomic features did not provide incremental accuracy. The results show that a XGBoost classifier using multiparametric radiomic features derived from preoperative MRI can predict IDH1 mutation status with > 90% accuracy.


Introduction
Gliomas are primary brain tumors that account for nearly 30% of all primary brain tumors and 80% of all malignant brain tumors, and they are accountable for majority of deaths from primary

Prediction of IDH1 Mutation Status Using XGBoost Models
Our FLAR-trained XGBoost model utilized 33 final radiomic features and achieved a Receiver operating characteristic Area Under the Curve (ROC AUC) of 95%, Accuracy of 90%, Precision/Recall/f1-score of 94%/94%/94% for IDH1 wildtype, and 75%/75%/75% for IDH1 mutants. Of the 20 cases in the test set, it correctly classified 15 wildtype cases, incorrectly classified one wildtype as mutant, correctly classified three mutants, and incorrectly classified one mutant as wildtype.
The AUC with 95% confidence interval, Accuracy, Precision, Recall, and f1-score for each model is aggregated in Table 3. Figure 1 shows the ROC curve with the AUC score for each model.

Discussion
In patients with glioma, IDH1 mutation has been shown to be an independent positive prognostic biomarker with improved progression-free survival and treatment outcome in comparison to the IDH1 wildtype [15]. Although the genetic biomarkers are determined by histopathological testing, the ability to predict biomarker status noninvasively is of clinical interest, as large tissue specimens are often needed for accurate histopathological diagnosis and potential inaccuracies that are related to tumoral tissue heterogeneity. Furthermore, pre-surgical identification of these biomarkers can help in surgical planning and the determination of the extent of surgical resection.
Qualitative MRI features have been shown to correlate with IDH1 genotypes in high grade gliomas [16][17][18]. More recently, the T2-FLAIR mismatch sign, which is defined as the presence of hyperintense signal on a T2-weighted image and a relatively hypointense signal on FLAIR (except for a hyperintense peripheral rim), has been described as a helpful imaging marker of IDH-mutant gliomas [19,20].
Expanding on above studies, we selected XGBoost as our classifier and trained models using FLAIR and DWI radiomic features, in a diverse cohort of patients, including grade II, III, and IV gliomas. Our DWI model achieved AUC of 0.97 (I: 0.898, 1.000) and 90% accuracy ( Figure 1A), and our FLAIR model achieved an AUC of 0.95 (CI: 0.864, 1.000) and 90% accuracy ( Figure 1B). XGBoost is a non-linear gradient boosted tree model with superior performance in comparison to conventional ML models [13]. RF and GB are both sets of decision trees. Whereas, RF builds each tree independently and combines the results at the end, GB builds each tree sequentially, and works to correct the error of the previous tree [14]. In GB, there are more hyperparameters than RF to optimize. Therefore, it may be more difficult to optimize a GB algorithm, but a better tuned GB algorithm may outperform a RF. In the training process, XGBoost calculates the importance score of each feature in each iteration, which provides a basis for establishing a new tree with gradient direction in the next iteration [13,34]. When two features are correlated, then, when deciding a split, the tree will only choose the one feature with greater importance, and this process is repeated. This automated feature selection structure is of particular use with high-dimensional data with potential multicollinearity, such as in radiomics. Another advantage of XGBoost is that it provides both L1 and L2 regularization, thus handling sparsity and reduces overfitting [13]. In this study, 92 radiomic features that were calculated by our postprocessing program were used as input to train our DWI and FLAIR classifiers, and a total of 184 radiomic features as input to train our DWI-FLAIR classifier. The DWI-FLAIR model utilized 88 out of the initial 184 features (Appendix A Table A3). The DWI model utilized 71 out of the 92 features (Appendix A Table A4). The FLAIR model utilized 33 out of the initial 92 features (Appendix A  Table A5). A comparison of features that were selected by XGBoost and analysis of features with statistically significant differences between IDH1 wildtype and mutants (Appendix A Table A1) and between glioblastomas and non-glioblastomas (Appendix A Table A2) shows the effectiveness of the automated feature selection. In review, all of the features with statistical significance between IDH1 wildtype and mutant were included in at least one of the XGBoost models, except for two features, DWI_Original Gray Level Dependence Matrix High Gray Level Emphasis and FLAIR_Original First Order Maximum. In comparison, there were seven features that had statistical significance between glioblastomas and non-glioblastomas, but they were not included in any of the XGBoost models. This is consistent with the fact that we trained the models based on IDH1 status as opposed to glioma grade. In review of features within the top 10 that were included in our XGBoost model by automatic feature selection, but did not have statistical significant difference between IDH1 wildtype and mutant, the mean p-values were 0.07 in the DWI model, 0.08 in the FLAIR model, and 0.23 in the DWI-FLAIR model. This may partially explain why we did not observe an incremental value in combining DWI and FLAIR in our XGBoost models ( Figure 1C).
Our results compared favorably with the results of a recent study conducted by Shboul et al., where XGBoost models were used to predict several glioma biomarkers, including IDH mutation [23]. In this study, a XGBoost model was trained while using conventional MRI sequences in patients with grade II and III gliomas and reported an AUC of 0.83 in predicting IDH status.
In our experience, the diagnostic performance of the combined trained model using combination of DWI and FLAIR (AUC: 91%, accuracy: 90%) was comparable to the isolated model (DWI only or FLAIR only) and with no significant incremental diagnostic value (Table 3). This is likely attributed to the fact that, in the combined model, the number of features was doubled without increasing the number of observations, which can result in overfitting during the training process.
Interestingly, we observed from feature importance assessment that six out of the top 10 important radiomic features for the combined model (DWI-FLAIR) were from the DWI dataset (Table 4). DWI has shown to corelate with physiological characterization of tumors, such as cellularity and proliferation index, as a function of water diffusivity [35]. Prior histopathological studies showed that IDH mutations can decrease glioma proliferation through the upregulation of miR-128a [36]. It is plausible that DWI may hold invaluable information regarding the IDH status, as shown in our results.
Our study has several limitations. Our XGBoost classifiers were trained with single-center data and, thus, the generalizability of our results may be impacted by differences in imaging acquisition protocols and the image postprocessing programs that were used for radiomic feature extraction.
The small total sample size of in combination with the skewed distribution of IDH1 wildtype and mutant is a notable limitation. We utilized stratified sampling when splitting our dataset into train, validation, and test sets in order to account for the small proportion of IDH1 mutants and the sampling error that could be introduced during randomization. Subsequently, as suggested by prior reports [10,32], we applied SMOTE on our train set in order to prevent biased training that favors the majority class. During our hyperparameter tuning step, the parameters were optimized for the highest ROC AUC score, as ROC curves are mathematically insensitive to class distribution unlike accuracy. Nonetheless, the effect of the skewed distribution is observed in our study ( Table 2). The test set had 16 wildtype and four mutants. Our XGBoost models correctly classified 15/16 as wildtype, and correctly classified 3/4 as mutants. As the denominator is four for the mutants, one error results in a numerically steep decrease ( Table 3). The skewed distribution of IDH1 genotypes have been acknowledged in literature; studies with low grade gliomas involved predominantly IDH1 mutants [33,37] and studies with glioblastomas involved predominantly IDH1 wildtypes [10,28]. This is reflected in our dataset, as 17 out of 19 (89%) low grade gliomas were IDH1 mutants and 76 out of 81 (94%) grade IV gliomas were wildtype ( Table 2).
Another limitation is that, within our IDH1 wildtype population, there were two cases with minor IDH2 (p.R172M, p.R172K) mutations. IDH2 mutations are much less common than IDH1 and they are mutually exclusive with IDH1 mutations [3]. To our knowledge, there is no study that has specifically compared the radiomic features between IDH1 and IDH2 mutants in gliomas.
Another limitation is that tumor segmentation to generate VOI was performed by one observer and under supervision of another board-certified neuroradiologist, who made necessary adjustments before the extraction of radiomic features. Therefore, inter-observer variability assessment was not performed.
Another limitation of our ML-approach is that it does not fully explain the physiological significance of radiomic features. The prolonged survival of patients with IDH1 mutations has previously been proposed to be associated with less aggressive biological behavior from the perspective of MRI tumor heterogeneity [16]. In this study, no additional categorical variables, such as list of comorbidities, were included or analyzed in the machine learning process. Because our models were trained purely with radiomic features, future work may focus on studying the relationship between these quantitative radiomic features and tumor heterogeneity across IDH1 genotypes.
In conclusion, training a XGBoost classifier while using multiparametric radiomic features derived from DWI and FLAIR images discriminated IDH1 mutation status with accuracy > 90% and AUC > 0.95, which may provide an approach for noninvasive assessment of IDH1 status in patients with gliomas. Further studies with larger and more diverse MRI datasets are required to validate and improve upon our findings.

Materials and Methods
The overall study design is summarized as a diagram in Figure 2.

Patient Population
An institutional review board approved this retrospective study and informed consent was waived. Patients with initial diagnosis of grade II, III, or IV glioma between January 2016 to September 2018 were reviewed (n = 151). Patients were included if they 1) had grade II, III, or IV glioma with known IDH1 status from surgical pathology and 2) had preoperative MRI, including FLAIR, T1c+, and diffusion within 30 days of biopsy or surgical resection. One patient was removed due to lack of IDH1 status. 20 patients were excluded due to lack of preoperative DWI. The patients were excluded if they had insufficient MR image quality (motion artifact, n = 8). Patients were excluded if they had prior surgeries involving the tumoral bed (n = 8). Patients were excluded if they had prior radiotherapy treatment (n = 4). In addition, 10 patients were excluded due to unavailable useable diffusion data. This yielded, in total, a cohort of 100 patients ( Figure 3). Assuming incidence of >15,000 new cases of malignant gliomas per year in the United States [38], 5% margin of error, 90% confidence interval, and estimated IDH mutation rate of 12% in literature [39], the recommended sample size is 114 [40]. With our sample size of 100, the margin of error is 5.33%.

Patient Population
An institutional review board approved this retrospective study and informed consent was waived. Patients with initial diagnosis of grade II, III, or IV glioma between January 2016 to September 2018 were reviewed (n = 151). Patients were included if they 1) had grade II, III, or IV glioma with known IDH1 status from surgical pathology and 2) had preoperative MRI, including FLAIR, T1c+, and diffusion within 30 days of biopsy or surgical resection. One patient was removed due to lack of IDH1 status. 20 patients were excluded due to lack of preoperative DWI. The patients were excluded if they had insufficient MR image quality (motion artifact, n = 8). Patients were excluded if they had prior surgeries involving the tumoral bed (n = 8). Patients were excluded if they had prior radiotherapy treatment (n = 4). In addition, 10 patients were excluded due to unavailable useable diffusion data. This yielded, in total, a cohort of 100 patients ( Figure 3). Assuming incidence of >15,000 new cases of malignant gliomas per year in the United States [38], 5% margin of error, 90% confidence interval, and estimated IDH mutation rate of 12% in literature [39], the recommended sample size is 114 [40]. With our sample size of 100, the margin of error is 5.33%.

Histopathological Data
Tissue samples were obtained from patients undergoing MRI-guided tissue biopsy or tumor resection, as part of routine clinical care and diagnostic neuropathology and molecular evaluation. Hematoxylin and eosin (H&E) sections and immunohistochemistry (IHC) slides were re-reviewed by pathologists (N.T) and (F.K). IHC was performed on 5 micron thick sections from paraffin embedded tumor sections of all the evaluated patients. Cut sections were backed in 60 °C for one hour to deparaffinize and enhance tissue adhesion; following deparaffinization, the sections were stained with pre-diluted monoclonal anti-mIDH1 antibody, purified from culture supernatant in PBS (2% BSA, 0.05% NaN3, pH 7.4, DIA-H09L; Dianova, Hamburg, Germany), which has specificity for human IDH1 R132H point mutation. The high frequency and distribution of the IDH1 R132H mutation allow the highly sensitive and specific discrimination of higher-grade gliomas by immunohistochemistry. The staining was performed using Ventana Benchmark XT stainer, (pretreated with CC1 mild, detection DAB ultraview kit; Ventana Medical Systems, Tucson, Arizona). The majority of IDH1 mutations in diffuse gliomas occur at a specific sites and they are characterized by a base exchange of guanine to adenine within codon 132, resulting in an amino acid change from arginine to histidine (R132H). Therefore, a monoclonal antibody has been developed in order to detect the consistent mutant iteration site of IDH mutant protein, allowing for its use in paraffinembedded specimens (mIDH1R132H). The ability of the antibody to detect a small number of cells as mutant makes IHC more sensitive than sequencing for identifying R132H mutant gliomas. However, mutations in IDH2 and other mutations in IDH1 will not be detected using IHC. Next generation sequencing (NGS) was performed to confirm an IDH1 R132H negative IHC results, and/or if the patient is less than 55years old, as IDH mutations in general are extremely rare in patients over 55 years, as per College of American Pathologists (CAP) recommendations.

NGS Analysis
Clinical samples after the histological diagnoses of primary CNS glioma were tested for prognostic molecular biomarkers, as outlined in the NCCN guidelines, depending on the clinical and pathological context, NGS analysis was performed according to a licensed protocol in the molecular pathology lab where histological evidence of tumor cellularity of >20% was considered to be

Histopathological Data
Tissue samples were obtained from patients undergoing MRI-guided tissue biopsy or tumor resection, as part of routine clinical care and diagnostic neuropathology and molecular evaluation. Hematoxylin and eosin (H&E) sections and immunohistochemistry (IHC) slides were re-reviewed by pathologists (N.T) and (F.K). IHC was performed on 5 micron thick sections from paraffin embedded tumor sections of all the evaluated patients. Cut sections were backed in 60 • C for one hour to deparaffinize and enhance tissue adhesion; following deparaffinization, the sections were stained with pre-diluted monoclonal anti-mIDH1 antibody, purified from culture supernatant in PBS (2% BSA, 0.05% NaN3, pH 7.4, DIA-H09L; Dianova, Hamburg, Germany), which has specificity for human IDH1 R132H point mutation. The high frequency and distribution of the IDH1 R132H mutation allow the highly sensitive and specific discrimination of higher-grade gliomas by immunohistochemistry. The staining was performed using Ventana Benchmark XT stainer, (pretreated with CC1 mild, detection DAB ultraview kit; Ventana Medical Systems, Tucson, Arizona). The majority of IDH1 mutations in diffuse gliomas occur at a specific sites and they are characterized by a base exchange of guanine to adenine within codon 132, resulting in an amino acid change from arginine to histidine (R132H). Therefore, a monoclonal antibody has been developed in order to detect the consistent mutant iteration site of IDH mutant protein, allowing for its use in paraffin-embedded specimens (mIDH1R132H). The ability of the antibody to detect a small number of cells as mutant makes IHC more sensitive than sequencing for identifying R132H mutant gliomas. However, mutations in IDH2 and other mutations in IDH1 will not be detected using IHC. Next generation sequencing (NGS) was performed to confirm an IDH1 R132H negative IHC results, and/or if the patient is less than 55 years old, as IDH mutations in general are extremely rare in patients over 55 years, as per College of American Pathologists (CAP) recommendations.

NGS Analysis
Clinical samples after the histological diagnoses of primary CNS glioma were tested for prognostic molecular biomarkers, as outlined in the NCCN guidelines, depending on the clinical and pathological context, NGS analysis was performed according to a licensed protocol in the molecular pathology lab where histological evidence of tumor cellularity of >20% was considered to be acceptable. Extracted DNA was submitted to amplicon-based library preparation and sequencing according to manufacturer's procedures for the 50-gene Hotspot panel on a (Ion Personal Genome Machine™ (PGM™) System, Thermo Fisher Scientific Inc. Waltham, MA). In each run, a low level VAF (variant allelic frequency) control was included for both SNVs and indels. For each specimen, an average minimum coverage of 500× was considered to be acceptable to proceed. Based on the variants identified by the Torrent Variant Caller, a pipeline was applied to permit annotation, filtering of variants with a VAF of >5% (unless clinically relevant and with a VAF of >10%), and the exclusion of those in Genome Aggregation Database (gnomAD https://gnomad.broadinstitute.org/) with a population frequency >0.05%. Read depths of IDH1, and IDH2 targets were confirmed in all cases to be adequate to rule out false negatives. Molecular pathology workup of Gliomas also included NGS based testing (IHC) of Isocitrate Dehydrogenase. If, through immunohistochemistry, the IDH result was negative then, for those patients with >55 years, NGS was not performed, unless clinically indicated. For patients <55 years, NGS was performed in order to identify other IDH1 or IDH2 mutations.

Volume Acquisition and Texture Analysis
The tumors were manually segmented with volume-of-interest (VOI) analysis on commercially available FDA-approved software (Olea Sphere software, Olea Medical SAS, La Ciotat, France). T1c+, FLAIR and diffusion images (ADC/b1000) were coregistered on each examination using a 6-df transformation and a mutual information cost function. Subsequently, a VOI was generated while using a voxel-based signal intensity threshold method subsuming the entire region of FLAIR hyperintensity. This VOI was then overlaid onto coregistered T1c+ and diffusion datasets. The segmentation was conducted by 1 radiology resident (S.K) under supervision of an experienced board certified neuroradiologist (K.N.), who made necessary adjustments before radiomic features were extracted. One VOI was segmented from each patient.
Radiomic features were calculated from the VOIs using Olea Sphere software. A total of 92 texture features were collected: 19 first-order metrics, including the mean, standard deviation, skewness, and kurtosis, and 73 second-order metrics consisting of 23 gray level run length matrix [41], 16 gray level run length matrix [42], 15 gray level size zone matrix [43], five neighboring gray tone difference matrix [44], and 14 gray level dependence matrix [45]. Definitions and calculations of these features are explained elsewhere [46].

Statistical Analysis and ML
All of the statistical analysis was performed using Python. ( Sampling: the patients were divided into train, validation, and test sets with a 60:20:20 ratio, thus resulting in 60 cases for the train set, 20 cases for the validation set, and 20 cases for the test set. Stratified random sampling was employed in this process, thus approximately maintaining the ratio of IDH1 wildtype to mutant across the subsets to be equal to the ratio in the original dataset. Oversampling: the Synthetic Minority Oversampling Technique (SMOTE) was applied to the train sets [47].
Feature importance by XGBoost: the importance of each radiomic feature was assessed and collected, ordered by the average Gain across all splits, the feature was used in. The Gain was calculated by taking each feature's contribution for each tree in the XGBoost models and, thus, represents the relative contribution of the feature to the model.
Testing: each of the final models were tested using the respective test sets: DWI test set, FLAIR test set, and the DWI-FLAIR test set. For each model, a classification report containing the Accuracy, Precision, Recall, and f1-score was collected. For each model, the confusion matrix depicting the number of true positives, false positives, true negatives, and false negatives was collected. For each model, a ROC curve was drawn and the AUC was calculated. For each AUC, the 95% confidence interval was also calculated. Funding: This research received no external funding.

Conflicts of Interest:
Kambiz Nael is a consultant to Olea, none for others.