Raman Spectroscopy and Machine Learning for IDH Genotyping of Unprocessed Glioma Biopsies

Simple Summary: Isocitrate dehydrogenase (IDH) mutation is one of the most important prognostic markers in gliomas. Raman spectroscopy (RS) is an optical technique with great potential for intraoperative molecular diagnosis and surgical guidance. We analyzed the ability of RS to detect the IDH mutation in unprocessed glioma biopsies. A total of 2073 Raman spectra were extracted from 38 tumor specimens. From the 103 Raman shifts screened, we identified 52 shifts (related to lipids, collagen, DNA and cholesterol/phospholipids) with the highest performance in distinguishing the two groups. We described 18 shifts never used before for IDH detection with RS in fresh or frozen samples. We were able to distinguish between IDH-mutated and IDH-wild-type tumors with an accuracy and precision of 87%. RS showed high accuracy and precision in discriminating IDH-mutated gliomas from IDH-wild-type tumors ex vivo on fresh surgical specimens.

Abstract: Isocitrate dehydrogenase (IDH) mutational status is pivotal in the management of gliomas. Patients with IDH-mutated (IDH-MUT) tumors have a better prognosis and benefit more from extended surgical resection than patients with IDH wild-type (IDH-WT) tumors. Raman spectroscopy (RS) is a minimally invasive optical technique with great potential for intraoperative diagnosis. We evaluated the ability of RS to characterize the IDH mutational status of unprocessed glioma biopsies. We extracted 2073 Raman spectra from thirty-eight unprocessed samples. Classification performance was assessed using eXtreme Gradient Boosted trees (XGB) and a Support Vector Machine with Radial Basis Function kernel (RBF-SVM). Measured Raman spectra displayed differences between IDH-MUT and IDH-WT tumor tissue. From the 103 Raman shifts screened as input features, the cross-validation loop identified 52 shifts with the highest performance in distinguishing the two groups. Raman analysis showed differences in spectral features of lipids, collagen, DNA and cholesterol/phospholipids. We were able to distinguish between IDH-MUT and IDH-WT tumors with an accuracy and precision of 87%. RS is a valuable and accurate tool for characterizing IDH mutational status in unprocessed glioma samples. This study improves RS knowledge for future personalized surgical strategies or in situ targeted therapies for gliomas.


Introduction
Molecular classification of gliomas, the most common primary malignant brain tumors in adults, allows better prognostic and therapeutic stratification and is currently the standard evaluation [1]. Treatment involves, when feasible, surgical resection [2], possibly followed by radiotherapy and chemotherapy. Both molecular and immunohistochemical data are routinely determined several days after sample collection, precluding a personalized intraoperative surgical and oncological strategy.
The isocitrate dehydrogenase (IDH) gene mutation is one of the most critical molecular markers influencing oncological outcomes and tumor response to adjuvant treatments in low- and high-grade gliomas. Patients with IDH-mutated (IDH-MUT) tumors have better overall and progression-free survival and benefit from more extended surgical resection than IDH wild-type (IDH-WT) glioma patients. IDH mutation dramatically modifies the metabolic activity and catabolic production of tumor cells. In IDH-mutant cells, the conversion of isocitrate to α-ketoglutarate (α-KG) is abolished, whereas the production of D-2-hydroxyglutarate (D-2-HG) is enhanced. D-2-HG, an oncometabolite, has a primary role in glioma oncogenesis, altering different processes involved in DNA and histone methylation and gene expression that drive the cell toward a stem-like phenotype.
The extent of surgical resection is one of the strongest prognostic elements in IDH-MUT glioma management [2][3][4]. Surgical excision has to be pursued up to functional boundaries (i.e., eloquent cortical and subcortical sites) to achieve a gross-total or a supra-total resection, possibly resulting in a transient postoperative neurological impairment that can delay or, in the worst scenarios, exclude adjuvant therapies. Therefore, the balance between the patient's immediate postoperative functional integrity and maximal tumor resection should consider the tumor's biological behaviour. In this light, non-invasive devices supplying fast molecular analysis are advocated.
Raman spectroscopy (RS) [5,6] and similar optical technologies (e.g., Fourier Transform Infrared Spectroscopy (FTIR) [7,8]) have proven effective tools to discriminate between cancer and normal tissue and, more recently, to investigate IDH mutational status [9,10]. RS studies using fresh tissue samples are of primary importance to improve Raman measurement in vivo, avoiding the well-known sample artefacts due to the processing and storage of histological blocks [11]. Only a few studies provide RS data from fresh glioma tissue [10].
In this study, we investigated the capacity of RS to distinguish IDH-MUT glioma biopsies from IDH-WT glioma biopsies ex vivo on fresh tissue samples.

Study Population and Experimental Design
Thirty-eight (38) tumor samples were collected from adult subjects undergoing surgery for presumptive glioma. All procedures were performed with imaging and neurophysiological guidance [12,13] in order to achieve, when feasible, a safe supra-marginal resection [3]. We collected 21 samples from IDH-MUT tumors and 17 samples from IDH-WT tumors. All patients signed informed consent for the procedure. The study was conducted in line with the 1964 Declaration of Helsinki and later amendments and authorized by the ethics committee. Demographic, clinical, and spectroscopic features were recorded and analyzed, and are reported in Table 1.

Samples Collection and Analysis
All patients underwent surgery for presumptive low- or high-grade glioma. Magnetic resonance imaging was performed 24 h before surgery and employed as input for the neuronavigation system (Curve, Brainlab AG, Munich, Germany). During the surgical procedure, a small part of the bulk tumor tissue was collected, with accurate imaging verification of the sampling area. The samples were immediately provided to the Raman analyst for processing, cleaned with NaCl solution, and placed under a CaF₂ window before Raman investigation.
After Raman measurements, the specimens were fixed and provided to the pathologist, blinded to the RS findings, for pathological and molecular analysis (Figure 1) and diagnosis according to the 2016 WHO classification of CNS tumors [1]. As part of the standard pathological evaluation in our institute, IDH mutational status was assessed using immunohistochemistry and confirmed through direct DNA sequencing. Based on the IDH mutational status, each sample was marked as "IDH-mutated" (IDH-MUT) or "IDH wild-type" (IDH-WT).

Raman Analysis
A benchtop spectrometer system (model RA800 series, Renishaw plc, Wotton-under-Edge, Gloucestershire, UK) was used to collect the Raman spectra from tissue samples. The system ran Renishaw's WiRE 4.0 software and was equipped with a near-infrared (NIR) laser (785 nm) with a maximum power of 500 mW. The initial spectral region of interest was 90-1800 cm⁻¹. We used an exposure time ranging from 0.5 to 2.5 s per spectrum, with 1 to 4 spectral accumulations for each acquisition. The laser power was kept at 100% for all measurements. Spectra were acquired from randomly located points across the sample. The line-focus laser minimized potential photodamage or fluorescence induction. An automatic performance quality check was performed on silicon and polystyrene internal standards before each measurement to reduce sample-to-sample variation. Temperature and humidity conditions, exposure time, laser power, and number of accumulations were iteratively optimized for sample acquisition. A total of 1157 points (mean 54.55; SD 6.28; range 35-66) were used for spectral acquisition across all samples (a single spectrum for each point). A total of 504 points (mean 55.29; SD 5.37; range 46-66) were analyzed for the IDH-WT group and 653 points (mean 53.95; SD 7.01; range 35-64) for the IDH-MUT group.
Using a point acquisition mode for each sample, we obtained the maximum number of spectra with a high signal-to-noise ratio. We completed the assessment of each sample within 60 min of tissue removal to reduce biological changes and to best simulate an in vivo analysis [14].
The resulting spectra numbered 653 for IDH-MUT and 504 for IDH-WT. Figure 2 reports the normalized median spectra with the first and third quartile bounds for each group. The frequencies of all notable peaks and slopes were outlined on the median Raman chart and added to those reported in the existing literature [9,10]. We detected 103 Raman shifts and used the intensity at each frequency as an input feature for the classification algorithms.

Figure 2. Normalized median spectra with IQR for IDH-WT specimens (blue) and IDH-MUT specimens (red). Arrows mark the most discriminant peaks with a known biological assignment.
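The median-and-quartile summary plotted in Figure 2 can be sketched as follows. This is only an illustration using the Python standard library: the function name and the list-of-lists input layout are assumptions, the spectra are assumed to be already normalized (the normalization method is not specified here), and the quartiles use linear interpolation, which may differ from the convention used for the figure.

```python
from statistics import median, quantiles


def median_and_quartiles(spectra):
    """Summarize a group of spectra shift by shift.

    spectra: list of equal-length intensity lists, one list per spectrum
             (hypothetical layout, assumed already normalized).
    Returns one (q1, median, q3) tuple per Raman shift.
    """
    summary = []
    # zip(*spectra) iterates over columns, i.e. over Raman shifts
    for intensities in zip(*spectra):
        q1, _, q3 = quantiles(intensities, n=4, method="inclusive")
        summary.append((q1, median(intensities), q3))
    return summary


# Toy example: three spectra measured at two shifts (made-up values)
toy_group = [[1, 10], [3, 20], [5, 30]]
for q1, med, q3 in median_and_quartiles(toy_group):
    print(q1, med, q3)
```

Calling the function once per group (IDH-MUT and IDH-WT) gives the two median curves and their interquartile bands.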

Classification Models
Studies focusing on interpretable machine learning models are of particular interest to the field, as they could yield physical insights from automatically highlighted patterns in the data. The Support Vector Machine with Radial Basis Function kernel (RBF-SVM) and eXtreme Gradient Boosted trees (XGB) are two of the most well-known learning methods owing to their theoretical performance guarantees and strong experimental results [18]; we therefore decided to investigate these models [19-21]. We trained each classifier with leave-one-patient-out (LOPO) cross-validation to achieve a balanced trade-off between performance and robustness. A further nested 5-fold cross-validation was performed for hyper-parameter optimization. Feature selection was achieved using a statistical algorithm (ANOVA F-test, scikit-learn f_classif), providing the top 52 features with the highest discrimination ability between the two groups.
The hyper-parameter grid search was configured as follows:
- XGB: parameters were left at their defaults except for the step size shrinkage (eta): (0.01, 0.05, 0.1, 0.2, 0.3); the maximum depth of a tree: (5, 8, 10, 12, 15); and the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma). The tree method was set to 'hist', the learning objective to 'binary:logistic', and the evaluation metric to negative log-likelihood (logloss).
- SVM: parameters were left at their defaults except for the kernel, set to Radial Basis Function (RBF), and the regularization parameter (C): (0.01, 0.1, 1, 10, 100, 1000).
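A minimal scikit-learn sketch of how the RBF-SVM branch of such a scheme could be assembled is given below. It is not the authors' code. The function and variable names are hypothetical, and the standardization step, the stratified inner folds (which ignore patient grouping), the accuracy scoring, and the placement of feature selection inside the pipeline (so that it is refitted on each training split) are all assumptions made for illustration. Only the LOPO outer loop, the nested 5-fold search, the ANOVA f_classif selection of 52 features and the C grid come from the text.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def lopo_rbf_svm(X, y, patient_ids, k_features=52, seed=0):
    """Leave-one-patient-out predictions with a nested 5-fold search over C.

    X: (n_spectra, n_shifts) intensities; y: 0/1 labels;
    patient_ids: one identifier per spectrum (hypothetical inputs).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    patient_ids = np.asarray(patient_ids)

    pipeline = Pipeline([
        ("scale", StandardScaler()),                        # assumed preprocessing
        ("select", SelectKBest(f_classif, k=k_features)),   # ANOVA F-test ranking
        ("svm", SVC(kernel="rbf")),
    ])
    grid = {"svm__C": [0.01, 0.1, 1, 10, 100, 1000]}        # C grid from the text
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    y_pred = np.empty_like(y)
    outer_cv = LeaveOneGroupOut()                           # one patient held out per fold
    for train_idx, test_idx in outer_cv.split(X, y, groups=patient_ids):
        search = GridSearchCV(pipeline, grid, cv=inner_cv, scoring="accuracy")
        search.fit(X[train_idx], y[train_idx])              # tuning sees training patients only
        y_pred[test_idx] = search.predict(X[test_idx])
    return y_pred
```

Because all spectra from the held-out patient are excluded from both feature selection and tuning, the pooled predictions give a per-spectrum estimate that is not inflated by within-patient similarity.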

Statistical Analysis
Statistical analysis was conducted on the top 52 Raman shifts supplied by the algorithm to confirm the statistical difference between the two groups at each shift. A Mann-Whitney test (two-tailed, α = 0.05) was performed after checking normality using the Shapiro-Wilk test (software SPSS statistics 25.0; IBM SPSS Inc., Chicago, IL, USA). Mann-Whitney U values and p-values were calculated for each top Raman shift and reported. The sample sizes were the same for all tests: n₁ = 504; n₂ = 653; N = n₁ + n₂ = 1157.
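For readers unfamiliar with the statistic, the following standard-library sketch shows how a Mann-Whitney U value is obtained from the pooled ranks of two samples. The function name is hypothetical and the toy intensities are made up; the analysis itself was run in SPSS, and p-values are left to a statistics package.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), using average ranks for tied values."""
    # Pool both samples, remembering which group each value came from
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Find the run of equal values pooled[i:j]
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        members_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * members_a
        i = j

    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a                     # the two statistics sum to n_a * n_b
    return u_a, u_b


# Toy intensities at one shift (made-up values, not study data)
print(mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))

# With the group sizes used here, U_a + U_b = 504 * 653
print(504 * 653)
```

Since the two statistics always sum to n₁·n₂, a U value far from half that product indicates that one group tends to have higher intensities than the other at that shift.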

Classification Performances
We obtained 2073 spectra from 38 untreated specimens: 1133 marked as IDH-MUT and 940 marked as IDH-WT (Table 1). After the processing phases, 653 IDH-MUT and 504 IDH-WT spectra remained. From the 103 Raman shifts analyzed as input features by XGB and SVM, the cross-validation loop identified the 52 shifts that best distinguished the two groups. Table 2 shows the most representative Raman shifts with assigned biological significance. The RBF-SVM algorithm presented the best average performance in distinguishing IDH-MUT from IDH-WT tumors, with an accuracy of 87%, precision of 87%, recall of 87% and F1-score of 87%. These metrics were calculated from the detection outcomes: True Negative (TN), True Positive (TP), False Negative (FN) and False Positive (FP).
RBF-SVM correctly classified IDH-MUT spectra with 90% precision, 87% accuracy, 87% recall and 87% F1-score. RBF-SVM had slightly lower performance in IDH-WT spectral classification, with 84% precision, 87% accuracy, 87% recall, and 85% F1-score. XGB showed an average accuracy and precision of 85% (recall and F1-score: 85%). Average performance metrics from the two classification models are reported in Table 3. Figure 3 shows that RBF-SVM performed best in distinguishing the two classes, with an AUC of 0.87 compared with 0.85 for XGB.
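The relationship between the four detection outcomes and the reported metrics can be made explicit with a short sketch. The function name is hypothetical, and the counts in the example are invented for illustration; they are not the study's confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)                 # fraction of positive calls that are correct
    recall = tp / (tp + fn)                    # fraction of true positives that are found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only, treating IDH-MUT as the positive class
example = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Per-class figures such as those quoted for IDH-WT follow from the same function by swapping the roles of the classes, that is, passing the TN count as `tp`, the FN count as `fp`, the TP count as `tn` and the FP count as `fn`.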

Spectral Analysis
A total of 2073 spectra were acquired from 38 different samples: 1133 spectra from 21 samples labelled as IDH-MUT and 940 spectra from 17 specimens labelled as IDH-WT (Table 1). The median spectra of the two groups were plotted and analyzed to identify Raman shifts with possible biological importance. The two algorithms (XGB and SVM) used the intensity of each shift as an input feature and tested its discriminative ability. Among the 103 peaks adopted as input features, the algorithms identified 52 Raman peaks with the highest ability to differentiate the two molecular groups. The resulting 52 shifts were examined to identify those with known biological significance from the published literature [11,19-28].
The analysis of the Raman bands assigned to nucleic acids, proteins, and lipids allowed us to identify the different biochemical signatures of the IDH-MUT and IDH-WT biopsies (Figure 2). Both groups were characterized by very intense bands around 1300 and 1440 cm⁻¹. These shifts are linked to proteins, lipids, and phospholipids. Further notable peaks are evident in the band from 600 to 720 cm⁻¹, such as peaks at 640 cm⁻¹ (cysteine and tyrosine, related to proteins), 700 cm⁻¹ (cholesterol) and 719/720 cm⁻¹ (choline; DNA, related to the nucleic acids group). The group of peaks from 419 to 430 cm⁻¹ showed contributions from cholesterol (419/421 cm⁻¹) and cholesterol ester (430 cm⁻¹).
All 52 top peaks identified by the algorithms exhibited statistically significant differences in intensity between the two molecular groups. The prominent regions near 1300 cm⁻¹ (p < 0.001; U = 74,008), 1439 cm⁻¹ (p < 0.001; U = 82,982), 1440 cm⁻¹ (p < 0.001; U = 83,904) and 1441 cm⁻¹ (p < 0.001; U = 85,905) were reduced in IDH-WT specimens and are linked to CH₂/CH₃ deformation of lipid side chains, amino acids, proteins and cholesterol/cholesterol ester [21]. The group of peaks between 1330 cm⁻¹ and 1441 cm⁻¹ showed the best graphical separation between the two plotted groups, containing 13 out of 52 (25%) of the top peaks analyzed. These shifts are more intense in the IDH-MUT specimens and are related to triglycerides, fatty acids at 1305 cm

Discussion
In this study, we demonstrate that Raman spectroscopy can determine the IDH mutational status of fresh glioma biopsies with good precision and accuracy. This can be readily achieved next to the operating theatre and in a short period of time without any additional tissue processing. Our findings add further evidence to the few available studies on molecular characterization of untreated [10] and treated [9] glioma tissue with standard Raman spectroscopy. Adult patients harbouring IDH-mutated low-grade gliomas have a better prognosis than those with IDH-WT tumors after extensive tumor resection [2] and can therefore devote, if required, a longer time to rehabilitation after surgery before any adjuvant therapy. To date, the presence of IDH mutation is one of the strongest predictors of progression-free survival, overall survival, and response to chemotherapy and radiotherapy. Conversely, IDH-WT tumors are often aggressive diseases that must be treated with adjuvant therapy soon after removal. For these reasons, it is crucial to properly balance the extent of resection with the patient's postoperative functional integrity, considering the infiltrative nature of gliomas and the importance of pursuing resection beyond imaging-defined limits. Accordingly, a supratotal resection can be a good strategy in IDH-MUT tumors but would not be the best oncological strategy in IDH-WT tumors, owing to the risk of transient neurological impairment that can hamper adjuvant treatments. A rapid, non-invasive, intraoperative technique that provides an accurate molecular diagnosis is a powerful tool that would allow the surgeon to adapt the surgical strategy to a personalized approach.
In the molecular era, the extent of tumor resection has to be critically balanced against the prognostic impact of surgery in the different glioma molecular sub-groups and against the risk of new-onset postoperative neurological morbidity due to the proximity of the tumor boundaries to functional structures.

Raman Spectroscopy of Fresh Biopsies Next to the Operative Room
We performed the spectral acquisition within 60 min of surgical sample extraction [14] to avoid any biological or chemical modification of the tissue. All the analyses were completed adjacent to the operating theatre, and the spectra were obtained from several points of the same specimen.
Future in vivo detection of the IDH mutation needs data from fresh samples to replicate online Raman use and to build a well-structured database that can improve in vivo Raman analysis. To reach this goal, we decided to use fresh glioma tissue washed only with a saline solution without further treatment. To date, only a few studies with data from Raman analysis of fresh and untreated tissue are available [10]. An online tool has been used to characterize molecular tissue changes even in the presence of tissue contaminants (i.e., in untreated specimens) and with a low number of spectra. Data from cryosections or treated specimens can contain, as illustrated before [11,29], different types of artefacts (protein denaturation, cross-linking or loss of lipids) that can interfere with the discrimination process and undermine accurate tissue analysis. After the Raman analysis, the tissue samples were sent immediately for permanent immunohistochemical staining and molecular and pathological analysis, which is, to date, the gold standard for diagnosis.
In line with previous results [9,10], this study shows that Raman spectroscopy can identify chemical differences between IDH-MUT and IDH-WT fresh samples. The chemical characteristics of the specimens and some distinctive shifts matched previous authors' findings, indicating that this technique is reliable and exploitable in different surgical scenarios. Two different algorithms allowed us to evaluate the 52 most representative Raman shifts among the 103 investigated. RBF-SVM and XGB showed excellent performance and allowed us to distinguish IDH-MUT tumors from IDH-WT tumors with an average accuracy of 87% and 85%, respectively. These results are consistent with the performance reported in previous works on frozen samples [9] (ranging from 88 to 89%) and on fresh tissues (sensitivity and specificity ranging from 91 to 95%). RBF-SVM correctly classified Raman spectra with a precision of 90% in IDH-MUT tumors and 84% in IDH-WT tumors.

Raman Spectroscopy and IDH Mutation
In CNS surgery, IDH mutation is specific for glioma and can be pivotal in cases where morphological, epidemiological, and radiological factors are unclear. In IDH-1 and IDH-2 mutated tumors, the abnormal production of D-2-HG is related to a profound alteration in cell metabolism, causing modifications in energetic status, an altered response to oxidative stress, and changes in DNA and histone methylation status. Furthermore, Koivunen et al. demonstrated that D-2-HG could decrease the production and activity of HIF-1α, which has a critical role in the cellular response to hypoxia and in angiogenesis. This alteration can increase the risk of DNA damage and mutation due to a rise in cellular oxidative stress [30].
Our work demonstrated that the biochemical changes induced by IDH mutation could be detected by means of RS and exploited in fresh glioma biopsies to distinguish between IDH-MUT and IDH-WT tumors. This tool, measuring the biological consequence of both IDH1 and IDH2 mutation, can overcome some limitations of immunohistochemistry, which is sensitive to the common IDH1 mutation. RS could therefore be an essential tool in the molecular diagnosis of these tumors and minimize the error rate if the surgeon approaches a lesion that harbors a rare IDH mutation.
Furthermore, reliable and accurate information regarding IDH mutational status before or during surgery can help target glioma in surgical scenarios with genotype-specific local treatment against IDH, avoiding systemic toxicity [31].

Study Limitation
The small number of specimens analyzed and the time required for each analysis, which was longer than in some previously reported studies [5,10], were the primary limitations. The limited study population prevents a stronger and more powerful data analysis. However, this study is one of the few clinical series available and represents a valuable contribution to this methodology. The decrease in the number of spectra between the raw data and the end of the pre-processing and processing steps indicates the need for particular attention to the raw data acquisition process to avoid excessive noise.
Although developed in an actual surgical scenario, this study was not prospective: the data obtained require validation in a prospective cohort to assess the real impact of this methodology on the surgical decision workflow and on patients' oncological and functional outcomes.
We provided preliminary technical and clinical advances and a varied spectral database on untreated tissue to further develop this safe and reliable technology.

Conclusions
In conclusion, this study demonstrates the ability of Raman spectroscopy to detect changes in the biochemical composition of gliomas induced by IDH mutation, ex vivo on untreated specimens. The intraoperative detection of IDH mutational status can be of primary importance, allowing the surgeon to tailor the surgical workflow intraoperatively and possibly deliver future in situ therapies. This study adds an important contribution to the available knowledge in this field and is a critical step toward accurate in vivo intraoperative IDH genotyping.

Institutional Review Board Statement:
The institutional ethics committee (Università degli studi di Milano/Humanitas Research Hospital Ethics Committee) approved the study (authorization no. 595), which was performed in accordance with the 1964 Declaration of Helsinki and later amendments.

Informed Consent Statement:
Written informed consent to participate in this study was signed before all surgical procedures.

Data Availability Statement:
Data acquired for this study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.