Predictive Analyses of Prognostic-Related Immune Genes and Immune Infiltrates for Glioblastoma

Glioblastoma (GBM), the most common and aggressive brain tumor, has a very poor outcome and high tumor recurrence rate. The immune system has positive interactions with the central nervous system. Despite many studies investigating immune prognostic factors, there is no effective model to identify predictive biomarkers for GBM. Genomic data and clinical characteristic information of patients with GBM were evaluated by Kaplan–Meier analysis and proportional hazard modeling. Deseq2 software was used for differential expression analysis. Immune-related genes from ImmPort Shared Data and the Cistrome Project were evaluated. The model performance was determined based on the area under the receiver operating characteristic (ROC) curve. CIBERSORT was used to assess the infiltration of immune cells. The results of differential expression analyses showed a significant difference in the expression levels of 2942 genes, comprising 1338 upregulated genes and 1604 downregulated genes (p < 0.05). A population of 24 immune-related genes that predicted GBM patient survival was identified. A risk score model established on the basis of the expressions of the 24 immune-related genes was used to evaluate a favorable outcome of GBM. Further validation using the ROC curve confirmed the model was an independent predictor of GBM (AUC = 0.869). In the GBM microenvironment, eosinophils, macrophages, activated NK cells, and follicular helper T cells were associated with prognostic risk. Our study confirmed the importance of immune-related genes and immune infiltrates in predicting GBM patient prognosis.


Introduction
Glioma is the most common cancer in the central nervous system (CNS), and about half of patients present with a highly aggressive form of glioblastoma (GBM). GBM has a poor prognosis despite maximal surgical resection and subsequent chemoradiation. To the best of our knowledge, most GBM patients will relapse after first-line treatment; the 1-year survival rate is approximately 25% and the mean 5-year survival rate is less than 3% [1,2]. Currently, pathological studies are inadequate in predicting the prognosis of patients. With the development of tumor molecular genetics, classification at the gene level can more accurately reflect disease outcome compared with histological classification [3]. For example, IDH (Isocitrate Dehydrogenase) mutation has been included in the clinical diagnostic criteria of gliomas, which contributes to the prediction of GBM prognosis. Nevertheless, there is still an urgent demand for the development of new strategies to effectively assess the GBM prognosis [4].
Intrinsic genes of tumor cells, especially master transcription factors, dominate the initiation and progression of GBM [5,6]. Additionally, the tumor microenvironment contains infiltrating immune cells that also have a significant influence on tumor gene expression, which affect clinical outcomes [7,8]. The GBM microenvironment is generally immunosuppressive and contains infiltrating immune

Raw Data Preparation
The clinical data and transcriptome files of GBM HTSeq-Counts and HTSeq-FPKM were downloaded from the TCGA official website (available online: https://portal.gdc.cancer.gov/, accessed on 19 March 2020). The mRNA data from GENECODE (available online: https://www.gencodegenes. org/human/, accessed on 19 March 2020) was extracted for differential analysis through R language, while logFC (log fold change) >2 and FDR (false discover rate) <0.05 were used as screening conditions. The normalization value of fragments per kilobase million (FPKM) was presented as a transcripts per million (TPM) value.

Screening of Prognostic-Related Immune Genes
The immune-related gene list was downloaded from the ImmPort website (available online: https://www.immport.org/shared/home, accessed on 19 March 2020). With the integration with clinical data, prognostic immune genes were screened with single-factor Cox analysis. Here, p values <0.05 were considered as the prognostic-related immune gene, hazard ratio >1 was considered as the prognostic pathogenic gene, and 0< hazard ratio <1 was considered as the prognostic protection gene. The results were shown in a forest plot.

Construction of Prognostic Immune Genes and Immune Gene Transcription Factor Regulatory Networks
The immune gene transcription factors were downloaded from the Cistrome website (available online: http://www.cistrome.org/, accessed on 19 March 2020). The correlation test between prognostic immune genes and immune gene transcription factors was performed. Correlation coefficient >0.4 was regarded as positively correlated, while < −0.4 was regarded as negatively correlated. Cytoscape software was used to construct the regulatory network, and p values <0.05 were considered statistically different.

Establishment of Immune Gene Model
In our work, the immune gene model was built using the TCGA dataset (test set n = 132), while the CGGA (Chinese Glioma Genome Atlas) dataset was used for validation (n = 220). Then, univariate Cox analysis was used to filtrate genes affecting overall survival of patients (p < 0.05), followed by multivariate Cox analysis to identify genes as independent prognostic indicators. Subsequently, based on the expression level of each gene and the coefficient obtained from multivariate Cox analysis were used to conduct a risk score. The algorithm is shown below.

Survival Analysis
Survival analysis was performed on patients in the high-risk score and low-risk score groups, and p values < 0.05 were considered statistically different.

Receiver Operating Characteristic (ROC) Curve
To evaluate the sensitivity and specificity of the model, the ROC curve was drawn, while the area under curve (AUC) value was calculated to assess the model, with values of 0.5-0.7 indicating moderate, 0.7-0.9 indicating better, and >0.9 indicating superior values.

Risk Curve
To exhibit the results of the model construction, patients were ranked according to the risk score and survival time, and the heat map was drawn to show the trend of gene expression required for the model.

Independent Prognostic Analysis
To assess whether the model can be used as an independent factor to predict the prognosis of patients, univariate and multivariate independent prognostic analyses were performed. Age and gender were comparative factors, and age comparison was separated by 50 years.

Clinical Correlation Analysis
To evaluate differences of prognostic immune genes and risk scores in clinical characteristics, p values >0.05 were considered to be related to corresponding clinical features.

Correlation Analysis of Immune Cell Infiltration and Risk Score
The limma package was used to revise the original expression matrix, then CIBERSORT was used to predict the immune cell composition. Here, p values <0.05 were considered to be highly accurate and were used for subsequent analysis, including the correlation between the content of immune cells and the risk score, percentage of immune cells, and correlation between immune cells.

Identification of Prognostic-Related Immune Genes
We collected a total of 161 samples from the TCGA official website, including 5 normal samples and 156 GBM samples. The vst function was used to normalize the read counts and a difference analysis was performed, then logFC was corrected by IfcShrink (Figure 1a,b). Next, we performed principal component analysis (PCA) on the samples (Figure 1c). Consequently, we obtained a total of 2942 differential expressed genes, of which 1338 genes were up-regulated and 1604 genes were down-regulated (Figure 1d,g).
Then, we downloaded 2499 immune-related genes from the ImmPort database. After the intersection with 2942 differential expressed genes, a total of 291 genes were obtained, of which 183 genes were up-regulated and 108 genes were down-regulated (Figure 1e,h). After screening these 291 immune genes, 24 immune genes related to prognosis were collected (Figure 2a).

Analysis of Regulatory Network of Immune Transcription Factors and Prognostic-Related Immune Genes
We also downloaded 318 immune gene transcription factors from Cistrome. After intersection with differentially expressed genes, a total of 41 immune-related transcription factor genes were obtained, of which 32 were up-regulated genes and 9 were down-regulated genes (Figure 1f, 1i). Then, by calculating its correlation with 24 prognostic immune genes, we constructed a regulatory network, with BATF (Basic Leucine Zipper ATF-Like Transcription Factor), SNAI2 (Snail Family Transcriptional Repressor 2), GATA4 (GATA Binding Protein 4), HOXB13 (Homeobox B13), RUNX1 (Family Transcription Factor 1), RUNX1T1 (RUNX1 Partner Transcriptional Co-Repressor 1), and WWTR1 (WW Domain Containing Transcription Regulator 1)found to have certain regulatory effects on related genes ( Figure 2b, Table 1).

Analysis of Regulatory Network of Immune Transcription Factors and Prognostic-Related Immune Genes
We also downloaded 318 immune gene transcription factors from Cistrome. After intersection with differentially expressed genes, a total of 41 immune-related transcription factor genes were obtained, of which 32 were up-regulated genes and 9 were down-regulated genes (Figure 1f,i). Then, by calculating its correlation with 24 prognostic immune genes, we constructed a regulatory network, with BATF (Basic Leucine Zipper ATF-Like Transcription Factor), SNAI2 (Snail Family Transcriptional Repressor 2), GATA4 (GATA Binding Protein 4), HOXB13 (Homeobox B13), RUNX1 (Family Transcription Factor 1), RUNX1T1 (RUNX1 Partner Transcriptional Co-Repressor 1), and WWTR1 (WW Domain Containing Transcription Regulator 1)found to have certain regulatory effects on related genes ( Figure 2b, Table 1).

Validation of Prognosis Prediction Model and Survival Analysis
We modeled 24 prognostic-related immune genes, from which 9 genes were chosen for modeling ( Table 2). To evaluate the sensitivity and specificity of the model, we drew an ROC curve, for which the area under the curve (AUC) was 0.869, which showed the relative authenticity of the model (Figure 3a). The Kaplan-Meier survival curve showed statistically significant differences between high-and low-risk patients (p < 0.001). Notably, patients in the low-risk group had significantly longer survival time (Figure 3b). The survival rate corresponding to each time point is shown in Table 3. The validation results were similar in survival analysis (p < 0.05, more details in Figure S10).  (Figure 3a). The Kaplan-Meier survival curve showed statistically significant differences between high-and low-risk patients (p < 0.001). Notably, patients in the low-risk group had significantly longer survival time (Figure 3b). The survival rate corresponding to each time point is shown in Table 3. The validation results were similar in survival analysis (p < 0.05, more details in Figure S10).

Risk Curve Analysis
According to the risk curve, it is obvious that the survival of low-risk patients is relatively longer than that of high-risk group (Figure 4a,b). The amount of each gene expression needed to construct the model is shown in the heat map. With the rise of the risk score, the expression of FABP5 (Fatty Acid Binding Protein 5), BMP1 (Bone Morphogenetic Protein 1), and OSMR (Oncostatin M Receptor) gradually increased, while the expression of CCL1 (C-C Motif Chemokine Ligand 1) and LPA (Lipoprotein(A)) decreased (Figure 4c).

Risk Curve Analysis
According to the risk curve, it is obvious that the survival of low-risk patients is relatively longer than that of high-risk group (Figure 4a, 4b). The amount of each gene expression needed to construct the model is shown in the heat map. With the rise of the risk score, the expression of FABP5 (Fatty Acid Binding Protein 5), BMP1 (Bone Morphogenetic Protein 1), and OSMR (Oncostatin M Receptor) gradually increased, while the expression of CCL1 (C-C Motif Chemokine Ligand 1) and LPA (Lipoprotein(A)) decreased (Figure 4c).

Univariate and Multivariate Independent Prognostic Analysis
To study the prognosis-related factors, we performed univariate and multivariate independent prognostic analyses on the model. It can be seen that the univariate (Figure 5a) and multivariate (Figure 5b) analysis results are consistent. Both age and risk score can be taken as independent predictors, while gender cannot. To study the prognosis-related factors, we performed univariate and multivariate independent prognostic analyses on the model. It can be seen that the univariate (Figure 5a) and multivariate (Figure 5b) analysis results are consistent. Both age and risk score can be taken as independent predictors, while gender cannot.

Clinical Relevance Verification
We also studied the correlation of immune genes and risk scores with clinical characteristics. We found that BMP1 and OSMR scores for all GBM patients were closely related to age, and that the

Clinical Relevance Verification
We also studied the correlation of immune genes and risk scores with clinical characteristics. We found that BMP1 and OSMR scores for all GBM patients were closely related to age, and that the expression of these two genes was significantly higher at >50 years old (Figure 6a,b). Meanwhile, CCL1 score was related to gender, the expression of which was higher in male patients than females (Figure 6c). Patients over 50 had higher risk scores, while patients under 50 had lower risk scores (Figure 6d). The remaining genes were not found to be related to clinical characteristics.

Clinical Relevance Verification
We also studied the correlation of immune genes and risk scores with clinical characteristics. We found that BMP1 and OSMR scores for all GBM patients were closely related to age, and that the expression of these two genes was significantly higher at >50 years old (Figure 6a, 6b). Meanwhile, CCL1 score was related to gender, the expression of which was higher in male patients than females (Figure 6c). Patients over 50 had higher risk scores, while patients under 50 had lower risk scores (Figure 6d). The remaining genes were not found to be related to clinical characteristics. Figure 6. The expression of BMP1 (Bone Morphogenetic Protein 1) and OSMR (Oncostatin M Receptor) was related to age (a,b), the expression of CCL1 (C-C Motif Chemokine Ligand 1) was related to the gender (c), and the risk score was obviously related to age (50 years old as the age cut-off) (p < 0.05) (d). Figure 6. The expression of BMP1 (Bone Morphogenetic Protein 1) and OSMR (Oncostatin M Receptor) was related to age (a,b), the expression of CCL1 (C-C Motif Chemokine Ligand 1) was related to the gender (c), and the risk score was obviously related to age (50 years old as the age cut-off) (p < 0.05) (d).

Correlation between Immune Cell Infiltration and Risk Score
We evaluated the impacts of different immune infiltrates on GBM clinical outcomes, and the infiltration of eosinophils, macrophages, activated NK cells, and T cell follicular helper was found to be associated with prognostic risk. Among them, M0 macrophages and activated NK cells were positively correlated with risk (Figure 7b,c), while eosinophils and T cell follicular helpers were negatively correlated (Figure 7a,d). No statistical significance was found between other infiltrates and the risk score.
We also screened the samples and obtained a total of 20 samples that could be well analyzed. Compared to other cells, the proportions of T cell follicular helper cells, monocytes, and M0 macrophages were relative higher (Figure 8a,b).
Meanwhile, we assessed the relevance between types of immune cells. Significantly, the relationships between CD4 naïve T cells and regulatory T cells, activated regulatory T cells and NK cells, and activated CD4 naïve T cells and NK cells were highly relevant (correlation coefficient > 0.9). Regarding the relationships between T cells and gamma delta monocytes, CD4 naïve T cells and M0 macrophages, and activated dendritic cells and resting mast cells, these three groups of cells were secondarily related (0.7< correlation coefficient <0.8) (Figure 9). Meanwhile, we assessed the relevance between types of immune cells. Significantly, the relationships between CD4 naïve T cells and regulatory T cells, activated regulatory T cells and NK cells, and activated CD4 naïve T cells and NK cells were highly relevant (correlation coefficient > 0.9). Regarding the relationships between T cells and gamma delta monocytes, CD4 naïve T cells and M0 macrophages, and activated dendritic cells and resting mast cells, these three groups of cells were secondarily related (0.7< correlation coefficient <0.8) (Figure 9).

Discussion
Currently, the prognosis of GBM is very poor because of numerous complicating factors, such as degree of surgical resection, drug resistance, blood-brain barrier permeability, and radiotherapy dose selection. Even with the recent advent of tumor immunotherapy, the anti-tumor immune response is quite limited because of the difficulty of lymphocyte infiltration into the GBM microenvironment. It was reported that tumor intrinsic factors were associated with GBM microenvironment immunosuppression by inducing immune suppressive signaling pathways [10,11]. Recent molecular research has demonstrated that immune infiltration can promote and regulate tumor progression via direct interactions, although their roles in tumor origination and patient prognosis are still poorly understood. Therefore, we focused on the gene expressions of immune infiltrates and their relevance to clinical prognosis.
During the progression of GBM, some abnormally expressed genes, including immune genes, are closely related to the clinical prognosis. Therefore, it is important to determine how to identify these genes to predict the prognosis of GBM. For this, we screened prognostic immune genes and transcription factors from relevant databases to construct a regulatory network (Figure 2b). It was reported that the BATF family, which belongs to a class of transcription factors containing a basic leucine zipper domain, regulates a variety of immune functions and controls the development and

Discussion
Currently, the prognosis of GBM is very poor because of numerous complicating factors, such as degree of surgical resection, drug resistance, blood-brain barrier permeability, and radiotherapy dose selection. Even with the recent advent of tumor immunotherapy, the anti-tumor immune response is quite limited because of the difficulty of lymphocyte infiltration into the GBM microenvironment. It was reported that tumor intrinsic factors were associated with GBM microenvironment immunosuppression by inducing immune suppressive signaling pathways [10,11]. Recent molecular research has demonstrated that immune infiltration can promote and regulate tumor progression via direct interactions, although their roles in tumor origination and patient prognosis are still poorly understood. Therefore, we focused on the gene expressions of immune infiltrates and their relevance to clinical prognosis.
During the progression of GBM, some abnormally expressed genes, including immune genes, are closely related to the clinical prognosis. Therefore, it is important to determine how to identify these genes to predict the prognosis of GBM. For this, we screened prognostic immune genes and transcription factors from relevant databases to construct a regulatory network (Figure 2b). It was reported that the BATF family, which belongs to a class of transcription factors containing a basic leucine zipper domain, regulates a variety of immune functions and controls the development and differentiation of immune cells [12]. Compared with other transcription factors, we found that BATF had the strongest positive regulation on related immune genes.
After the optimization of 24 immune genes, we found that nine genes were suitable for establishing a prognostic prediction model. The AUC value of the ROC curve was 0.869, which indicated the reliability of the model. When the risk score was increased, FABP5, BMP1, OSMR2, CCL1, and LPA gene expressions also changed. In short-term survivors of GBM (≤6 months), high levels of FABP5 protein were expressed, which was associated with highly proliferating tumor cells and the activation of the v-akt murine thymoma viral oncogene homolog and 3-phosphoinositide-dependent protein kinase-1 [13]. It was previously reported that BMP1 was involved in multiple signaling pathways of GBM and was associated with a poor prognosis for glioma patients [14]. OSMR is a member of the interleukin-6 receptor family and is a key regulator of GBM growth. It regulates the feed-forward signaling pathway to promote tumorigenesis with EGFRvIII and STAT3. A correlation between the upregulation of OSMR and poor survival was previously confirmed [15]. CCL1 is an inflammatory mediator that stimulates human monocyte migration. It was reported that the introduction of CCL1 induced tumor regression and resistance against tumors [16].
Further research has indicated a correlation between four immune cell types in the tumor microenvironment and risk score. Tepper found that malignant cell lines induced the infiltration of eosinophils in a nude mouse model, leading to the attenuation of malignant tumors [17]. A clinical trial reported that the improved survival of GBM patients was linked to increased numbers of eosinophils detected post-surgery [18]. Our study revealed that an increase in infiltrating eosinophils and follicular helper T cells was associated with a decreased patient risk score, indicating an improved prognosis. However, there is little evidence that follicular helper T cells are negatively correlated with risk score. With the progress of GBM, monocytes enter the central nervous system from the periphery through the damaged blood-brain barrier. A previous study reported that the proportion of tumor-associated macrophages (TAMs) in glioma tissues reached 30% [19], and that TAMs were the core components that promoted tumor progression and immunotherapy resistance in a GBM immunosuppressive microenvironment [20]. Furthermore, a high level of macrophage infiltration in GBM resulted in a worse prognosis [21]. Our results also verified that the infiltration of M0 macrophages was positively correlated with the risk score. However, the infiltration of pro-inflammatory M1 and anti-inflammatory M2 macrophages did have a direct link to prognosis. NK cells secrete tumor necrosis factor (TNF) and interferon (IFN), which exert a killing effect on cells, as well as having an important role in anti-tumor innate immunity. Our statistical analyses also showed that activated NK cell infiltration correlated with poor prognosis. Indeed, NK activity in glioma tissues was inhibited and the number of activated NK cells was very low, which reduced the killing effect of NK cells on GBM in the GBM microenvironment [9,[22][23][24].
In summary, we analyzed the prognosis of GBM patients using prognostic-related immune genes and immune infiltration, which identified predictive biomarkers and valuable clues for new immunotherapy strategies. However, there were potential limitations in our analysis. The related predictions were based on the sequencing data analysis of a public database. Furthermore, the type of immune cells was not verified by immunohistochemistry, and the results have not been verified by biological experiments. Subsequent in vivo or in vitro and clinical verification is needed to consolidate our research results.

Conclusions
In our study, relevant immune genes were identified to establish a prediction model for the prognosis of GBM patients. We also analyzed the impact of immune cell infiltration in the microenvironment on the risk score. Further research of these immune genes or immune infiltrates may provide potential therapeutic targets for GBM clinical treatment.