Glioblastoma Molecular Classification Tool Based on mRNA Analysis: From Wet-Lab to Subtype

Most glioblastoma studies incorporate the layer of tumor molecular subtype based on the four-subtype classification system proposed in 2010. Nevertheless, there is no universally recognized and convenient tool for glioblastoma molecular subtyping, and each study applies a different set of markers and/or approaches that cause inconsistencies in data comparability and reproducibility between studies. Thus, this study aimed to create an applicable user-friendly tool for glioblastoma classification, with high accuracy, while using a significantly smaller number of variables. The study incorporated a TCGA microarray, sequencing datasets, and an independent cohort of 56 glioblastomas (LUHS cohort). The models were constructed by applying the Agilent G4502 dataset, and they were tested using the Affymetrix HG-U133a and Illumina Hiseq cohorts, as well as the LUHS cases. Two classification models were constructed by applying a logistic regression classification algorithm, based on the mRNA levels of twenty selected genes. The classifiers were translated to a RT-qPCR assay and validated in an independent cohort of 56 glioblastomas. The classification accuracy of the 20-gene and 5-gene classifiers varied between 90.7–91% and 85.9–87.7%, respectively. With this work, we propose a cost-efficient three-class (classical, mesenchymal, and proneural) tool for glioblastoma molecular classification based on the mRNA analysis of only 5–20 genes, and we provide the basic information for classification performance starting from the wet-lab stage. We hope that the proposed classification tool will enable data comparability between different research groups.


Introduction
The molecular classification of glioblastoma has received increasing scientific attention over the past 15 years [1,2]. Since glioblastoma is one of the most lethal (with a median survival of 15 months) and incurable human diseases [3,4], different types of diagnostic, prognostic, and therapeutic approaches have to be applied to comprehend these issues. The inter-and intra-tumor heterogeneity of the molecular landscape became one of the most interesting features of GBM that was analyzed by examining different types of molecules. The main purpose was to classify tumors into more homogenous groups with similar behavior with regard to similar response to therapy, comparable course of the disease, probability of tumor relapse, comparable patient survival, etc. One of the first GBM classification studies published in 2006 by Phillips et al. suggested the three-group (proneural-PN, proliferative-Prolif, and mesenchymal-Mes) subtyping model, based on the signature of 35 genes mRNA expression [5]. In 2010, a four-subtype classification model, referred to as proneural-PN, classical-CL, mesenchymal-ME, and neural-NE, which is tightly associated with genomic abnormalities, was proposed by Verhaak et al. based on the mRNA expression of 840 genes [6]. During the past 10 years, dozens of

Classifier Development
Data from the Agilent G4502 platform were applied for classification model building. The construction of the classifier incorporated only mesenchymal, classical and proneural cases, since recently it was shown that TCGA samples of the neural subtype are samples with a high content of normal tissue, rather than a separate subtype of tumor [12]. After the case filtering described in the methods section and neural subtype elimination, a total of 419 cases were used for model building (MES n = 151; CL n = 140; and PN n = 128). We tested two classification algorithms: logistic regression with LASSO (L1) and support vector machine (SVM) for classifier development to select the most suitable one. Both algorithms revealed highly comparable results applying a 10-fold cross-validation design. We reduced the number of genes to suggest the minimal number of genes that allows tumor classification with an acceptable accuracy. We found that a minimum of five genes is required to receive an overlap with the original classification of at least 85%. First, we selected the first top five ranked genes according to the ANOVA scoring method (Anova was revealed to be the most representative of all methods used, see Figure 1A). Next, we

Classifier Development
Data from the Agilent G4502 platform were applied for classification model building. The construction of the classifier incorporated only mesenchymal, classical and proneural cases, since recently it was shown that TCGA samples of the neural subtype are samples with a high content of normal tissue, rather than a separate subtype of tumor [12]. After the case filtering described in the methods section and neural subtype elimination, a total of 419 cases were used for model building (MES n = 151; CL n = 140; and PN n = 128). We tested two classification algorithms: logistic regression with LASSO (L1) and support vector machine (SVM) for classifier development to select the most suitable one. Both algorithms revealed highly comparable results applying a 10-fold cross-validation design. We reduced the number of genes to suggest the minimal number of genes that allows tumor classification with an acceptable accuracy. We found that a minimum of five genes is required to receive an overlap with the original classification of at least 85%. First, we selected the first top five ranked genes according to the ANOVA scoring method (Anova was revealed to be the most representative of all methods used, see Figure 1A). Next, we tested the combination of five features to select the most accurate, and we found that the panel of KLRC3, VAV3, EGFR, CSPG5, and FCGR2B ( Figure 1A) revealed the highest classification accuracy (>93%) that was comparable with the 20-gene classifier accuracy (>95%) (see Table 1). The classification accuracy applying the five-gene model revealed even slightly higher accuracy when recognizing the mesenchymal subtype (see Figure 1B).
Both algorithms revealed similar results. Nevertheless, the final model is built on the logistic regression classification algorithm, since it showed a more accurate classification during the validation stage (Table 1).

Validation and Testing of Classification Models
The classification models were tested on two public datasets generated by a gene expression array (Affymetrix HT) and by RNA sequencing (Illumina HiSeq). The gene expression microarray dataset consisted of 419 (MES n = 151; CL n = 140; PN n = 128) samples, while the RNA sequencing dataset consisted of 122 (MES n = 47; CL n = 41; PN n = 34) cases (neural subtype cases were removed from both datasets). We applied both classification models (20-gene and 5-gene) for testing. As was suspected, the classification accuracy of data generated by different array platforms or even methods was slightly lower, but it was, nevertheless, sufficiently high to be used as a tool. The accuracy of the 20-gene classifier of the Affymetrix HT dataset reached 90.7%, while the Illumina HiSeq dataset revealed 91% of overall accuracy. Five five-gene classifiers revealed 85.9% classification accuracy when testing the Affymetrix HT dataset and 87.7% accuracy when testing the Illumina HiSeq cases (see Figure 2).
Receiver operating characteristic (ROC) curve analysis revealed perfect test results and the ability to discriminate between subtypes (testing subtype versus other two subtypes). Comparable results between both classifiers using an expression array and sequencing data were obtained (area under the curve-AUC > 0.9, Figure 2). The discrimination of the PN subtype showed the best test parameters compared with MES and CL subtypes. To visualize the distance between the cases of different subtypes, we applied multidimensional scaling (MDS) analysis to the data (including the set of all 20 selected genes) and calculated the Euclidean distance between centroids of the clusters. Analysis revealed PN subtype specimens to be more distally located in relation to the MES and CL subtypes in both datasets (see Figure 3).  Receiver operating characteristic (ROC) curve analysis revealed perfect test results and the ability to discriminate between subtypes (testing subtype versus other two subtypes). Comparable results between both classifiers using an expression array and sequencing data were obtained (area under the curve-AUC > 0.9, Figure 2). The discrimination of the PN subtype showed the best test parameters compared with MES and CL subtypes. To visualize the distance between the cases of different subtypes, we applied multidimensional scaling (MDS) analysis to the data (including the set of all 20 selected revealed PN subtype specimens to be more distally located in relation to the MES and CL subtypes in both datasets (see Figure 3). The match of the classified cases to the original specimens' subtype assignment (according to Verhaak) is shown in the heatmap (see Figure 4).  The match of the classified cases to the original specimens' subtype assignment (according to Verhaak) is shown in the heatmap (see Figure 4). revealed PN subtype specimens to be more distally located in relation to the MES and CL subtypes in both datasets (see Figure 3). The match of the classified cases to the original specimens' subtype assignment (according to Verhaak) is shown in the heatmap (see Figure 4).  The gene selection approach allows one to select genes with significant expression differences between molecular subtypes (p < 0.001, ANOVA). BCAS1, GPR17, ERBB3, PDGFRA, SNAP91, DNM3, KLRC3, and PFN2 gene expressions were significantly increased in PN compared to MES and CL cases. MET, FCGR2B, DAB2, and PTPRC levels were significantly increased in MES cases compared with the other two subtypes, while EGFR, SPRY2, VAV3, CDH4, NR2E1, and NES expressions were significantly increased in CL cases. CHI3L1 and CSPG5 expression was significantly decreased in PN and MES subtypes, respectively (see Figure 4). The pattern of the heat map was repeated between datasets, and, for instance, the genes that were upregulated in the PN subtype in the Affymetrix HT expression array cohort were similarly upregulated in the Illumina HiSeq dataset.

The Construction of the Gene Expression System Applying Ordinary qPCR for Glioblastoma Subtyping
To deliver a functional glioblastoma molecular classification tool, we also developed technical conditions for the analysis of selected genes' expression by applying qPCR. Since different target sequences selected for amplification of the same gene may result in different gene expression measurements, we designed an expression analysis system based on microarray probe sequences. We used microarray probe sequences to construct qPCR primers that fully or partly covered the probe sequences. All the primers were designed to anneal at 60 • C to enable expression examination of all genes in a single plate. Primer sequences and qPCR conditions are provided in Table 2. Three endogenous controls previously reported to be suitable for glioma expression research were applied for this study. It should be noted that both the five-gene and 20-gene classification models are designed by applying the indicated housekeeping genes.

Subtype Analysis in LUHS Cohort
Gene expression data were obtained as CT values and were normalized to the geometric mean of three endogenous control genes' (ACTB, GAPDH, and YWHAZ) CT values (∆CT). Data of each analyzed gene were normalized (centered) based on the cohort average and x-x − (where x = data value; x − = mean of a dataset) was included. Then LUHS cohort was tested by applying both 20-gene and five-gene classification models. The calculated molecular subtype matched in 52 specimens out of 56 (92.86%) when comparing the 20-gene and five-gene models' outputs. The highest match was in the PN group, where 18 out of 19 (94.7%) specimens were identically assigned. MES and CL groups demonstrated a slightly lower match-17 out of 20 (85%) were identically assigned in both cases ( Figure 5A). We visualized the LUHS cohort gene expression profile in a heat map to elucidate if the same genes that were up-or down-regulated in the TCGA dataset subtypes were similarly aberrant in the LUHS dataset ( Figure 6A). The analysis demonstrated that the same genes that were significantly upregulated, for instance in the PN subtype in the TCGA cohorts (BCAS1, GPR17, ERBB3, PDGFRA, SNAP91, DNM3, KLRC3, and PFN2), were also Next, we performed a subtype proportion comparison after classifying the LUHS cohort by both models. A Z-score test was used to compare subtype proportions between cohorts. Analysis revealed no difference between LUHS cohort subtypes, calculated applying 20-gene or five-gene models, compared with the Affymetrix HG and Illumina HiSeq datasets classified according to the Verhaak system (p > 0.05). The proportions of each cohort are shown in Figure 5B.
We visualized the LUHS cohort gene expression profile in a heat map to elucidate if the same genes that were up-or down-regulated in the TCGA dataset subtypes were similarly aberrant in the LUHS dataset ( Figure 6A). The analysis demonstrated that the same genes that were significantly upregulated, for instance in the PN subtype in the TCGA cohorts (BCAS1, GPR17, ERBB3, PDGFRA, SNAP91, DNM3, KLRC3, and PFN2), were also upregulated in PN samples of the LUHS cohort that was classified by applying the 20-gene classifier (see Figure 6B). Similar data were obtained by comparing MES and CL-specific genes (Figures 4 and 6). It is worth mentioning that the expression levels of all screened-out genes (applying the design described above) were significantly deregulated between subtypes ( Figure 6B). Survival analysis did not reveal a significant difference between subtypes in any analyzed cohort (p > 0.05) (Figure 7). The log-rank test of the TCGA dataset (Affymetrics HT) showed a tendency towards better survival prognosis for PN patients (χ 2 = 3.601, df = 2, p = 0.165), as well as PN, MES and CL survival means of 669, 484, and 503 days, respectively ( Figure 7A). Neither the Illumina HiSeq cohort classified according to the original Ver- Survival analysis did not reveal a significant difference between subtypes in any analyzed cohort (p > 0.05) (Figure 7). The log-rank test of the TCGA dataset (Affymetrics HT) showed a tendency towards better survival prognosis for PN patients (χ 2 = 3.601, df = 2, p = 0.165), as well as PN, MES and CL survival means of 669, 484, and 503 days, respectively ( Figure 7A). Neither the Illumina HiSeq cohort classified according to the original Verhaak system, nor the LUHS cohort classified using the 20-gene classifier, revealed such a tendency (Illumina HiSeq: χ 2 = 0.404, df = 2, p = 0.817; LUHS cohort 20-gene classified: χ 2 = 0.84; df = 2, p = 0.657) ( Figure 7B,C). The shortest mean survival estimates were in the MES subtype patients in all cohorts.

Discussion
Most glioblastoma studies incorporate a layer of tumor molecular subtyping based on the classification system instituted by Verhaak et al., 2010 [6]. Originally, Verhaak et al. classified glioblastomas into four subtypes based on the expression of 840 genes [6]. However, the large number of genes used in the original classifier encourages scientists to optimize the number of features to save expenses and time and to make the molecular classification more applicable. Even though the Verhaak subtyping system has become the most acceptable model, researchers are still applying different methods and markers to assign subtypes to the newly analyzed glioblastoma cohort. Several glioblastoma molecular classification models have been suggested since the Verhaak four-group subtype system was proposed [7,8,[10][11][12][20][21][22][23][24]. Nevertheless, most of them provide lists of informative subtyping molecules rather than the classification tool or algorithm itself [7,8,[10][11][12]23,24]. Some of the provided glioblastoma classification schemes reclassify tumors in novel subtypes/groups/clusters. However, such a reclassification is incomparable among the studies that were previously published [7,8,24]. During the past decade, two GBM classification tools, based on the Verhaak four-group classification, were proposed in 2014 [20] and 2016 [21], which use 121 and 48 molecules for GBM classification, respectively. Relatively large numbers of genes do not make these tools highly attractive, since the analysis of at least 48 targets still needs high throughput approach. Moreover, recently, few studies have demonstrated that specimens of the neural subtype are samples with high content of normal tissue rather than separate glioblastoma subtypes [11,12,25], indicating the need for a standardized three-group glioblastoma molecular classification tool. Thus, with this work, we suggest a simple and cost-efficient three-class glioblastoma molecular classification tool based on mRNA analysis of five to 20 genes. We hope that this model will encourage researchers and physicians to use the suggested glioblastoma subtyping model more frequently in the future.
The overall accuracy of the proposed glioblastoma subtyping models (five-gene and 20-gene) varied between ~86% and 91%. The original Verhaak classifier was developed by applying silhouette width filtering when selecting only the "core samples", which then were used for model construction [6]. Thus, achieving absolute classification accuracy is not a feasible task, even when incorporating all 840 features for the classification. Current

Discussion
Most glioblastoma studies incorporate a layer of tumor molecular subtyping based on the classification system instituted by Verhaak et al., 2010 [6]. Originally, Verhaak et al. classified glioblastomas into four subtypes based on the expression of 840 genes [6]. However, the large number of genes used in the original classifier encourages scientists to optimize the number of features to save expenses and time and to make the molecular classification more applicable. Even though the Verhaak subtyping system has become the most acceptable model, researchers are still applying different methods and markers to assign subtypes to the newly analyzed glioblastoma cohort. Several glioblastoma molecular classification models have been suggested since the Verhaak four-group subtype system was proposed [7,8,[10][11][12][20][21][22][23][24]. Nevertheless, most of them provide lists of informative subtyping molecules rather than the classification tool or algorithm itself [7,8,[10][11][12]23,24]. Some of the provided glioblastoma classification schemes reclassify tumors in novel subtypes/groups/clusters. However, such a reclassification is incomparable among the studies that were previously published [7,8,24]. During the past decade, two GBM classification tools, based on the Verhaak four-group classification, were proposed in 2014 [20] and 2016 [21], which use 121 and 48 molecules for GBM classification, respectively. Relatively large numbers of genes do not make these tools highly attractive, since the analysis of at least 48 targets still needs high throughput approach. Moreover, recently, few studies have demonstrated that specimens of the neural subtype are samples with high content of normal tissue rather than separate glioblastoma subtypes [11,12,25], indicating the need for a standardized three-group glioblastoma molecular classification tool. Thus, with this work, we suggest a simple and cost-efficient three-class glioblastoma molecular classification tool based on mRNA analysis of five to 20 genes. We hope that this model will encourage researchers and physicians to use the suggested glioblastoma subtyping model more frequently in the future.
The overall accuracy of the proposed glioblastoma subtyping models (five-gene and 20-gene) varied between~86% and 91%. The original Verhaak classifier was developed by applying silhouette width filtering when selecting only the "core samples", which then were used for model construction [6]. Thus, achieving absolute classification accuracy is not a feasible task, even when incorporating all 840 features for the classification. Current classification models are comparable with Crisman et al.'s published four-subtype 48-gene model, where accuracy varied between 81.48% and 91.88%, depending on the dataset used [21]. Madurga et al. reduced the classification genes list to 20 genes that showed 89-90% classification accuracy [12]. Pal et al. defined a four-group glioblastoma molecular classifier based on 121 isoform-level gene signatures and received 92% accuracy [20]. Taking this all together, the proposed classification models show similar classification accuracy to those published previously. Nevertheless, the numbers of the features (genes) are considerably smaller. As might be expected, the five-gene model revealed slightly lower classification accuracy values (85.9-87.7%), as compared to the 20-gene model (90.7-91%). Nevertheless, considering that the five-gene model requires four times fewer resources and time, the five-gene model is an excellent choice to classify glioblastoma cohorts in a cost-effective way. Differently from the previously proposed models, current tools provide not only qualitative information about the calculated molecular subtype, but they also provide qualitative information-the probability of the predicted subtype.
The current study, in addition to a list of the genes and the classification model, also provides a user-friendly tool with all instructions for glioblastoma cohort subtype identification. Moreover, the current study provides technical information on optimized and validated qPCR conditions that enable data reproducibility and comparability between the researchers if they follow the described protocol. Different target sequences selected for amplification of the same gene may result in different expression levels. Thus, here we are providing designed and tested primers and optimized qPCR conditions to maximize the reproducibility of the data. Thus, we hope that the proposed glioblastoma classification model with basic information for target analysis in a wet-lab stage will encourage researchers and physicians to use the suggested system more frequently in the future.

TCGA Gene Expression Data Processing
The Cancer Genome Atlas (TCGA) coordination center data [27] were used for developing the classifier. Gene expression data of GBM patients with known IDH status, survival data, and Verhaak subtype were collected from the UCSC Xena [28] and the GlioVis [29] data portals. We used level 3 interpreted data of gene expression estimates given in log space, which were mapped onto the human genome coordinates using the UCSC Xena HUGO probeMap [28]. The analysis encompassed three public datasets generated by three gene expression array platforms: the Affymetrix HT Human Genome U133a microarray platform (n = 539), the Agilent 244K custom gene expression G4502A microarray platform (n = 585), and the RNA sequencing platform Illumina HiSeq 2000 RNA Sequencing (n = 172). The cases for the analysis were selected on the basis of four criteria: the subtype of a sample according to Verhaak was specified; the sample type was a primary tumor (recurrent or secondary tumors were eliminated); the case had information about all targets selected for the analysis; and the tumor was not treated prior to resection. After data filtering, based on the above-listed criteria, 505 cases from the Affymetrix HG-U133a, 505 cases from the Agilent G4502, and 162 cases from the Illumina HiSeq datasets were selected for further steps, respectively. Cases that were assigned to the neural subtype were removed from the datasets. The final gene expression microarray dataset consisted of 419 cases, and the RNA sequencing dataset consisted of 122 cases. Because the development of models incorporated different datasets, the data were normalized by applying a standard score: z = (x − u)/σ.

Patient Samples
A total of 56 human glioblastoma specimens diagnosed according to the guidelines of WHO, classification 2016, 4th edition (update 3), were used for gene expression analysis. The surgical resections of GBM tumors were performed at the Department of Neurosurgery, Hospital of Lithuanian University of Health Sciences, from September 2017 to June 2021. Tumorous tissues of glioblastoma patients were collected, stored, and analyzed following written informed consent after approval by the Kaunas region Ethics Committee for Biomedical Research (permission code: P2-9/2003). The study was performed following the Lithuanian regulations, alongside the principles of the Helsinki and Taipei Declarations. All tissue samples after surgical resection were snap-frozen and stored in liquid nitrogen. Clinical data (gender and age at the time of resection,) as well as survival data, were collected for each patient. The overall patient survival rates were calculated from the date of tumor resection to the date of patient death or database closure (15 May 2022). None of the patients had received chemotherapy or radiotherapy prior to surgery. The demographic and molecular characteristics of patients are shown in Table 3.

RNA Isolation and qRT-PCR
GBM tumor specimens were homogenized by applying cryogenic grinding with liquid nitrogen, and total RNA from pulverized tissue was extracted using the TRIzol reagent (Invitrogen, Vilnius, Lithuania, cat. #: 15596026). In total, 2 µg of total RNA was used for cDNA synthesis by applying the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Bleiswijk, The Netherlands, cat. #: 4374966). Selected targets' mRNA expression was investigated by applying quantitative RT-PCR SYBR Green I assay, in three replicates, on a 7500 Fast Real-time PCR detection system (Applied Biosystems, Foster City, CA, USA). The PCR reaction consisted of 3 µL (15 ng when calculating from RNA used for cDNA synthesis) of cDNA, 6 µL of 2x Power SYBR™ Green PCR Master Mix (Applied Biosystems, Bleiswijk, The Netherlands, cat. #: 4368702), 0.29-0.67 µM of primer, and nuclease-free water. For detailed information on primer sequences, amplicons, primer amounts per reaction, as well as PCR cycling conditions, see Table 2. Relative quantification method-∆CT (when normalized to reference genes) was used for data normalization. Data were normalized according to the geometric mean of the CT estimate of three reference genes (ACTB, GAPDH, and YWHAZ). The final values were used as 2 −∆CT (fold change) calculations.

Data Analysis
Differences across two independent groups were analyzed by applying a t-test. A chi-square test was used for categorical data analysis. Survival analysis was performed by applying the Kaplan-Meier curve method, and a log-rank test was used to compare the difference in survival curves across groups. To show the reliability of the survival estimate, the confidence interval (CI), with a 95% confidence level, was presented. The level of significance was p < 0.05. Data visualization, target selection, model construction, and testing were performed using the machine learning and data visualization toolkit "Orange"  Informed Consent Statement: Tumorous tissues of glioblastoma patients were collected, stored, and analyzed following written informed consent, as well as after approval by the Kaunas region Ethics Committee for Biomedical Research (permission code: P2-9/2003). The study was performed following the Lithuanian regulations, alongside the principles of the Helsinki and Taipei Declarations.
Data Availability Statement: Data is available from the corresponding author upon reasonable request via email.

Conflicts of Interest:
The authors declare no conflict of interest.