A SEMA3 Signaling Pathway-Based Multi-Biomarker for Prediction of Glioma Patient Survival

Glioma is a lethal central nervous system tumor with poor patient survival prognosis. Because of the molecular heterogeneity, it is a challenge to precisely determine the type of the tumor and to choose the most effective treatment. Therefore, novel biomarkers are essential to improve the diagnosis and prognosis of glioma tumors. Class 3 semaphorin proteins (SEMA3) play an important role in tumor biology. SEMA3 transduce their signals by using neuropilin and plexin receptors, which functionally interact with the vascular endothelial growth factor-mediated signaling pathways. Therefore, the aim of this study was to explore the potential of SEMA3 signaling molecules for prognosis of glioma patient survival. The quantitative real-time PCR method was used to evaluate mRNA expression of SEMA3(A-G), neuropilins (NRP1 and NRP2), plexins (PLXNA2 and PLXND1), cadherins (CDH1 and CDH2), integrins (ITGB1, ITGB3, ITGA5, and ITGAV), VEGFA and KDR genes in 59 II-IV grade glioma tissues. Seven genes significantly associated with patient overall survival were used for multi-biomarker construction, which showed 64%, 75%, and 68% of accuracy of predicting the survival of 1-, 2-, and 3-year glioma patients, respectively. The results suggest that the seven-gene signature could serve as a novel multi-biomarker for more accurate prognosis of a glioma patient’s outcome.


Introduction
Gliomas are the most common and lethal central nervous system (CNS) tumors with unfavorable patient survival prognosis. The incidence rate of glioma patients is about 7.3 per 100,000 persons a year, which is higher in men than in women (8.7 cases and 5.9 cases, respectively). The peak of incidence is reached at the age of ≥65 years [1]. Among glioma tumors, diffuse glioma is the most common type, according to WHO classification 2016, graded as WHO grade II (diffuse), WHO grade III (anaplastic) and WHO grade IV (glioblastoma) [2,3]. Glioblastoma (GBM) is a very aggressive and malignant tumor, which exhibits differentiated and undifferentiated cells with multiple genetic alterations and variations [4,5]. Because of the molecular heterogeneity, it is difficult to determine the type and behavior of the tumor since tumor cells spread quickly and infiltrate into the surrounding tissues. It is noteworthy that some of the tumor cells may be resistant to therapy, and drugs used for treatments can lead to side effects [5]. Despite the complex treatment (surgery, chemotherapy, and radiation therapy), the survival time of patients is generally short (approximately 14 months after initial diagnosis) [6].
Initially, the expression of SEMA3(A-G), NRP1, NRP2, PLXNA2, PLXND1, CDH1, CDH2, ITGB1, ITGB3, ITGA5, ITGAV, KDR, and VEGFA genes at mRNA level were compared between astrocytoma and healthy human brain (control) samples. As shown in Figure 1, the mRNA levels of all analyzed genes were altered in tumor samples, compared to control (zero point), indicating their importance in gliomagenesis process. The highest fold change was observed in expression of SEMA3E and ITGB3 (the medians of mRNA levels of SEMA3E and ITGB3: -2.78 and 2.41, respectively). In contrast, the expression of PLXND1 and CDH2 genes was slightly dispersed around the zero point (the median of mRNA levels of PLXND1 and CDH2: −0.4 and 0.01, respectively). The Mann-Whitney U-Test was used to analyze whether expression of analyzed genes were associated with patient gender, age at the time of surgery, malignancy grade of astrocytoma, IDH mutation and MGMT gene methylation status (Table 1).

Construction of the Multi-Biomarker for Better Patient Survival Prediction
To evaluate the relationship between patient clinicopathological factors (gender, age until operation, glioma malignancy grade, methylation of MGMT, and mutation status of IDH), and patient overall survival (OS), as well as to determine survival-related genes, an univariate Cox proportional hazard regression analysis was performed on expression data of 19 SEMA3 signaling genes. As shown in Table 2, patient age, tumor grade, status of IDH mutation and expression of seven genes (SEMA3A, SEMA3D, SEMA3F, SEMA3G, ITGB3, ITGA5, and VEGFA) significantly correlated with patient OS (p < 0.05). Younger patient group and lower tumor malignancy grade was associated with longer OS (p < 0.01). Better OS prognosis was also related to mutation of IDH, lower expression levels of SEMA3A, SEMA3F, ITGB3, ITGA5, and VEGFA or higher expression of SEMA3D and SEMA3G genes (p < 0.05).
Based on the results of univariate Cox regression analysis, seven genes, significantly associated with patient OS, were selected to construct the risk signature. The risk-score formula was constructed as follows (described in "Statistical analysis" section): risk score . Among them, high levels of SEMA3A, SEMA3F, ITGB3, ITGA5, and VEGFA genes were defined as risky factors (HR > 1), while high levels of SEMA3D and SEMA3G genes-were defined as protective factors (HR < 1).
According to median value of the risk score (2.555), all 59 patients were divided into low-risk (≤ 2.555, n = 29) and high-risk (> 2.555, n = 30) groups. The Kaplan-Meier analysis with log-rank test was used to compare overall survival between these two patient groups. As shown in Figure 4A, patients in high-risk group had poor prognosis compared to low-risk group (χ 2 = 24.7, p < 0.001). Among these two groups, the higher expression levels of SEMA3G and SEMA3D genes were noticed in the low-risk group, while SEMA3A, SEMA3F, ITGB3, ITGA5, and VEGFA-were noticed in the high-risk group ( Figure 4B). The dot-plot graph was used to represent the distribution of patient status (alive or dead) with the diagnosis of different malignancy astrocytoma in relationship with patient survival time and risk score ( Figure 4C). The correlation analysis showed that lower risk-score was significantly associated with better patient prognosis-longer survival time and low tumor malignancy (r = −0.66, p < 0.001). In the high-risk group 13.56% (8/59) of patients died within 1-year and 37.29% (22/59)-within 2-year period, while in the low-risk group 8,47% 5/59 of patients died within 2-years. Furthermore, Mann-Whitney and Kruskal-Wallis tests revealed significant multi-biomarker risk-score associations with astrocytoma pathological grade (p < 0.001), patient disease appearance age (≤ 50 and >50 years, p < 0.001), IDH status (wild-type and mutation, p < 0.001), and MGMT status (methylated and unmethylated, p = 0.027), but not gender (p = 0.067) ( Table 1). The diagnostic values of the multi-biomarker of selected seven genes for 1-, 2-, and 3-year survival prediction were confirmed with the receiver operating characteristic (ROC) curve analysis ( Figure 4D). The calculated areas under the ROC curves (AUC) for 1-, 2-, and 3-year survival were 0.81 (95% CI: 0.685−0.937), 0.85 (95% CI: 0.74−0.95), and 0.86 (95% CI: 0.76−0.952), respectively (p < 0.001). The accuracy of 1-, 2-, and 3-year survival prediction were: 64%, 75%, and 68%, respectively. In addition, to analyze patient survival with respect to several factors simultaneously, the clinical characteristics and genes from univariate Cox regression with significance p < 0.05 were sorted out and multivariate Cox regression analysis performed. The analysis revealed that the expression of 3 genes (Sema3A, Sema3D, and ITHB3) and IDH status are independent indicators increasing the risk of patient death. Wild-type IDH type increased event risk by 5.54-fold (95% CI: 1.46-21.023; p = 0.012), while higher mRNA expression of SEMA3A and ITGB3, but lower mRNA expression of SEMA3E increased event risk by 1.36-, 1.27-, and 0.88-fold, respectively; p < 0.05) ( Table 2). However, after the multi-biomarker construction using significant variables from multivariate Cox regression test, the patient survival analysis between high-and low-risk patient groups showed no significance (χ 2 = 0.76, p = 0.383) and the accuracy of the model for 1-, 2-and 3-year survival was low (51%, 47% and 47%, respectively, data not-shown). Therefore, the first combination of the multi-biomarker with greater accuracy was used for further analysis. After demonstrating the survival predicting power of the created signature, the multivariate Cox regression analysis with backwards conditional was performed to clarify the prognostic value in accordance to other clinical features that were significantly associated with patient survival in univariate Cox regression ( Table 2). The analysis revealed that the signature remained to be significantly associated with OS when adjusting other clinical factor including age, tumor grade and IDH status (HR = 1.27, 95% CI: 1.06-1.54, p = 0.012). Importantly, although patient age also showed significant predictive power, the effect was lower compared to the effect of the multi-biomarker (HR = 1.04, 95% CI: 1.00-1.08, p = 0.031; Table 2, Multivariate*).

Validation of the Survival Prediction Model
The seven-gene signature was further validated in The Cancer Genome Atlas GBM-LGG dataset via the GlioVis portal [25], including 276 different malignancy grade glioma patients. In this cohort, the risk-score was calculated as described in the multi-biomarker construction section. Mann-Whitney U-test revealed that patients with lower risk-score had methylated MGMT (p < 0.001), mutant IDH status (p < 0.001) and were in a younger patient group (≤ 50-year, p < 0.001). However, gender showed no significant associations (p = 0.714). According to Kruskal Wallis test, higher tumor grade was significantly associated with higher risk-score (p < 0.001) (data not-shown). Furthermore, the Cox regression analysis showed that the multi-biomarker was significantly associated with glioma patient OS (HR 4.76, 95% CI: 3.199-7.087, p < 0.001). ROC analysis confirmed that the multi-biomarker had a good accuracy of predicting 1-

Discussion
Gliomas are rare primary brain tumors (7.3/100,000 cases); nevertheless, very unfavorable functional prognoses and high mortality rates indicate that substantial efforts are necessary to mitigate adverse outcomes of the disease [1]. The 5-year survival rate for the primary CNS cancers is very low (17% for males and 19% for females) [26]. Moreover, glioblastoma, and lower malignancy grade glioma tumor cells have multiple genetic alterations and variations which makes it difficult to precisely determine the type and behavior of the glioma tumor, as well as to choose the most benefiting treatment. Therefore, the newest update of the WHO classification system included several molecular biomarkers for better tumor characterization [2]. In terms of astrocytoma, the updated classification allowed to classify astrocytoma tumors more accurately, as well as, to predict the outcome of patients, and to select treatment strategy, according to the MGMT methylation or IDH status. However, since astrocytomas exhibit very high inter-and intra-heterogeneity and therefore, the disease course is poorly predictable, the search for novel molecules specifically associated with gliomagenesis still remains an important task for more accurate disease prognosis. Whereas a single marker cannot completely identify the type of glial tumor, as it is in the case of the prostate (PSA gene) [27] or breast cancer (BRCA gene) [28,29], it is crucial to attempt creation of a multi-biomarker with higher predictive value than a single-biomarker, which provides composite biological and therapeutic information about the tumor.
Recently, several studies have shown that the analysis of gene expression profile could detect gene signatures to predict the outcome of patients with glioma tumors. For example, six protein-coding genes (PCG) and five lncRNAs were screened out by a risk score model and a PCG-lncRNA signature was formed that which predicted survival and TMZ-chemoradiation response in GBM patients [30]. Other research groups identified novel biomarkers that have a potential in the outcome prediction of GBM [31][32][33] and which enable to classify patients into high-and low-risk groups based on expression level analysis of the survival relevant genes [34]. Moreover, the novel signature composed of five genes (DES, RANBP17, CLEC5A, HOXC11, and POSTN) was significantly associated with the 1-, 3-, and 5-year survival of GBM patients, IDH status, MGMT methylation status, and radio-chemotherapy [35]. However, the clinical applications of these genomic profiles and their molecular mechanisms have not been fully revealed. Considering the poor prognosis of glioma tumors, novel molecular biomarkers are still needed to reveal the mechanisms of gliomagenesis process and to improve patient survival.
The studies of SEMA3 family protein function in glioma tumors are not new in our laboratory. In previous studies, we found that accumulated high SEMA3C protein levels, which did not correlate with mRNA levels, were related to the clinicopathological characteristics of astrocytoma patients [36]. By continuing the previous theme, in our present study, we used different grade astrocytic origin glioma tumor samples to identify SEMA3 signaling pathway genes, significantly associated with gliomagenesis process and patient OS. The prognostic value of SEMA3 signaling pathway members was shown in several studies. Patients with high-VEGF/low-SEMA3 signature had poor prognoses in breast [37] and prostate cancer [38]. Moreover, in the study by Karyian-Tapon and co-authors, higher mRNA expression of SEMA3B, SEMA3G and NRP2 were related to prolonged survival of patients with the diagnosis of glial tumors [39]. In contrast, high VEGF expression correlated with higher grade of tumors and poor survival prognosis. However, according to the multivariate Cox analysis, SEMA3G and patient age were the only significant prognostic markers associated with a better OS prognosis [39]. In agreement with this study, we also noticed similar tendencies of SEMA3B, SEMA3G, SEMA3D and VEGFA mRNA level associations with patient survival and tumor malignancy grade. Importantly, our study revealed that expression of other SEMA3 pathway members (SEMA3F, NRP1, PLXNA2, ITGB3, and ITGA5) were significantly related to unfavorable patient outcomes and may have promising diagnostic value as well ( Table 2). The differences between our study and the study by Karyian-Tapon might be because they used different type of tumor specimens (GBM, oligoastrocytoma, and oligodendroglioma) as one study cohort [39], while our study consisted only of astrocytic origin tumors.
To discover a more accurate model for astrocytoma patient survival prognosis, the multi-biomarker was constructed. Since the multivariate Cox regression analysis revealed that the expression of three genes (Sema3A, Sema3D, and ITHB3) and IDH status are independent factors and after the multi-biomarker creation using these variables, the survival analysis between high-and low-risk patient groups showed no significance (χ 2 = 0.76, p = 0.383) and the accuracy of the signature for 1-, 2-and 3-year survival was low (51%, 47%, and 47%, respectively); it was chosen not to follow conventional data analysis steps. The non-standard path for target selection led to the design of the combination of SEMA3 signaling pathways targets for relatively high patient survival accuracy prediction. Therefore, seven genes significantly associated with patient OS of univariate Cox regression analysis were screened out, namely SEMA3A, SEMA3D, SEMA3F, SEMA3G, ITGB3, ITGA5, and VEGFA. These seven genes were used to construct an independent prognostic multi-biomarker, predicting 1-, 2-and 3-year survival rates (64%, 75% and 68% of accuracy, respectively) of patients with the diagnosis of various malignancies, not for a specific one. To assess the reliability of the model, the novel multi-biomarker was validated in an independent cohort from The Cancer Genome Atlas (TCGA) database via the GlioVis portal (https://gliovis.bioinfo.cnio.es/, accessed August 2019) [25] and similar results were obtained.
By analyzing patient survival in relation to clinical and pathological variables, the results revealed that age, GBM status and IDH mutation are good predictors of patient survival [12]. However, they have some limitations. As shown in Figure 4C, in some cases, short survival is observed at a young age (39-year patient survived only 14 months) and GBM patients with poor prognosis survive more than 2-or 3-years. IDH status is also useful in predicting the OS of patients diagnosed with GBM, and in the selection of appropriate therapeutic strategies [40,41], however, the majority of grade II and III gliomas have a mutant-type of IDH (Figure 2), while survival times vary between these patients ( Figure 4C). These exceptions appear due to the heterogeneity of gliomas, which hinders the accurate identification of tumor malignancies. More importantly, the predicting power of the newly created signature was the strongest compared to other clinical characteristics ( Table 2, Multivariate*), indicating that the multi-biomarker has a great prognostic value in accordance to other clinical factors and is important for the current clinical management of glioma.
The seven genes of the signature play an important role in molecular pathogenesis of glioma tumors by regulating tumor cell migration, proliferation, invasion, as well as tumor angiogenesis, and immune response [42]. It was demonstrated that expression of SEMA3A, SEMA3D, and SEMA3F proteins significantly inhibited angiogenesis process and the formation of subcutaneous tumors derived from glioblastoma cells U87MG [43]. The overexpression of SEMA3G in U251MG cells inhibited tumor cell migration and invasion in vitro [44], and suppressed angiogenesis process, when SEMA3G overexpressing U87MG cells were implanted in the brain cortex of mice [43]. Integrins ITGA5 and ITGB3 participate in angiogenesis process by regulating endothelial cell migration and survival [45]. It was showed that ITGA5 and ITGB3 are expressed in glioma vessels and tumor cells and the gene expression is significantly associated with the malignancy grade of the tumor and poor patient prognosis [34,46]. Glioblastoma tumors are highly vascularized, therefore the expression of pro-angiogenic factor VEGFA is increased [39]. Importantly, glioma stem cells also produce VEGFA protein which is carried in extracellular vesicles and induce the formation of blood vessels by targeting brain endothelial cells [47]. In summary, all seven genes of the multi-biomarker participate in the regulation of angiogenesis process, which is a key mediator of the tumor microenvironment development, especially of highly vascularized glioma tumors. Therefore, these molecules are of great importance of controlling the process of gliomagenesis.
The findings of this study should be seen in the light of some limitations. One of the limitations is that the increase in the experiment-wise error rate across the reported statistical analyses was not controlled. The correction for multiple testing was not applied since gliomas are known as extremely heterogeneous tumors, consequently, the significant statistical differences could be reduced. Therefore, markers with even a "small" impact are very important to indicate a specific subclass of tumors in an overall combination of signature. Second, the low amount of astrocytoma specimens of training data may result in overfitting that could be resolved by increasing the sample size in biomarker training cohort. Third, due to the relatively small number of tumor samples, it was difficult to investigate the multi-biomarker in the diverse subgroups of gliomas. Therefore, the cohort of tumor samples should be expanded, and follow-up studies performed, to validate the applicability of the biomarker to the prognosis of glioma patients. Overall, we consider this research relatively preliminary and should be critically evaluated.

Data of Patients
The study consisted of 59 patients diagnosed with astrocytic origin glioma tumors of grade II-IV. The malignancy grade was determined according to WHO classification 2016 [2,3]. All patients were operated on at the Department of Neurosurgery, Hospital of Lithuanian University of Health Sciences from April 2015 to April 2018. The written patient consent and permission from Kaunas Regional Biomedical Research Ethics Committee was taken. The clinical data (gender, age at the time of operation, tumor grade) were collected for each patient. The overall survival of the patient was calculated from the date of tumor resection to the date of patient death or database closure (15 March 2019). After surgical resection, tumor samples were frozen in liquid nitrogen. IDH mutation (the R132H mutation in the IDH1 gene) and MGMT methylation status of the tumor samples were confirmed by the pathologists. After mRNA expression analysis of SEMA3 pathway members in tumor tissue specimens, genes, significantly associated with patient survival were used to construct a multi-biomarker, which was validated in an independent cohort, consisting of 276 different malignancy grade glioma tumors. The data (expression of SEMA3 pathway genes and housekeeping (ACTB and GAPDH) genes) were obtained from TCGA GBM-LGG dataset via the GlioVis portal [25].
The clinical characteristics of patients in the training and validation cohorts are shown in Table 3.  Postoperative astrocytic origin glioma tumors were used for training set and mRNA expression data from The Cancer Genome Atlas GBM-LGG dataset was used for validation set. GBM-glioblastoma, Wt-wild-type, Mut-mutation, M-methylated, U-unmethylated.

Total RNA Extraction and cDNA Synthesis
Total RNA was extracted from 59 frozen tumor tissue specimens using mirVana miRNA Isolation Kit (Life Technologies, USA), according to the manufacturer's instructions. RNA yield and purity were evaluated with spectrophotometer Nanodrop 2000 (Eppendorf, USA). High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, USA) was used for cDNA synthesis. All samples were stored at −80 • C until gene mRNA expression analysis.

Quantitative Real-Time PCR
Quantitative real-time PCR (qRT-PCR) with SYBR green fluorescent dye was performed to analyze mRNA expression of semaphorins SEMA3(A-G), neuropilins (NRP1 and NRP2), plexins (PLXNA2 and PLXND1), cadherins (CDH1 (E-cadherin) and CDH2 (N-cadherin)), integrins (ITGB1, ITGB3, ITGA5 and ITGAV), VEGFA and VEGF receptor (KDR) genes in 59 II-IV malignancy grade gliomas and healthy human brain RNA samples "FirstChoice Human Brain Reference RNA" (Ambion, Life Technologies, USA). Housekeeping genes ACTB and GAPDH were used as an endogenous control. All primer sequences of analyzed genes, primer melting temperatures and amplicon lengths are shown in Supplementary Table S2. The qRT-PCR was carried out in a total reaction volume of 12 µL: 6 µL of Power SYBR Green PCR Master Mix dye (ThermoFisher Scientific, USA), 15 ng of cDNA sample, and 0.2 µM of each primer and nuclease-free water. Amplification was executed with a Real-Time PCR System "Applied Biosystems 7500 Fast" (Applied Biosystems, USA). The cycling of qRT-PCR was as follows: 10 min at 95 • C for polymerase activation; cycling of 40 cycles: denaturation at 95 • C for 15 s, annealing 58-61 • C (depending on the gene analyzed) for 30 s and elongation at 72 • C. Melting curve was generated for each run. Comparative 2 -∆∆Ct method was used to evaluate analyzed gene expression in tumor samples compared to healthy brain tissue.

Statistical Analysis
GraphPad Prism (version 7.0, Graph-Pad Software, Inc., San Diego, CA, USA) and SPSS software (version 25.0, IBM SPSS, Armonk, NY, USA) were used to evaluate the experimental results. The Mann-Whitney U-test was applied to evaluate the statistical significance of gene mRNA expression changes in groups of patient gender, age, IDH mutation, or MGMT gene methylation status. Differences in gene mRNA expression between malignancy grade of the tumor were analyzed by Kruskal-Wallis Test. The survival time distribution between gene expression groups was assessed with Kaplan-Meier analysis and Log-rank test. Pearson's correlation coefficient (r) was used to evaluate correlations between gene expression levels. Univariate and multivariate (with backward conditional method) Cox proportional hazards regression analyses were applied to assess the relationships between clinical and molecular variables, and patient's overall survival. According to the output of univariate Cox regression analysis, the risk score evaluation formula was composed to create a multi-biomarker for patient survival prognosis: Risk score = (coef. β gene 1 × expr. gene 1) + (coef. β gene 2 × expr. gene 2) + (coef. β gene n × expr. gene n) [35], where the regression coefficient (β) and the expression level (expr) of the gene was derived from the univariate Cox regression model. Time-dependent receiver operating characteristic (ROC) curve analysis was applied to assess the prognostic value of the multi-biomarker for 1-, 2-and 3-year patient survival. Areas under the curves (AUC) were calculated along with 95% confidence interval (CI) and higher AUC was associated with a better model of the risk prediction [48]. The accuracy was calculated as follows: Accuracy = (TN + TP)/(TN + TP + FN + FP) = (Number of correct assessments)/Number of all assessments), where TP-true positive, TN-true negative, FN-false negative, and FP-false positive [49]. The level of significance was p < 0.05.

Conclusions
In conclusion, the analysis of mRNA expression of SEMA3 signaling pathway members provided new insights into the pathogenesis and prognosis of glioma tumors. The constructed biomarker of seven genes may have a promising practical value to predict the 1-, 2-, and 3-year patient survival. However, more detailed experiments are needed to confirm the prognostic value of multi-biomarker in glioma patients and to examine how this multi-biomarker is associated with the clinical treatment responses.
Author Contributions: A.K., and I.V., generated an idea and design of the research; A.T., gathered postoperative astrocytoma tissue samples and patient clinical data; I.V., prepared samples for gene expression analysis, designed primers for qRT-PCR; I.V., and D.K., performed gene expression analysis by qRT-PCR in tumor specimens; G.S., analyzed data from TCGA database and performed statistical analysis; I.V., G.S., and A.K., were major contributors in writing the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Study Fund of the Lithuanian University of Health Sciences.

Conflicts of Interest:
The authors declare no conflict of interest.