A Gene Signature of Survival Prediction for Kidney Renal Cell Carcinoma by Multi-Omic Data Analysis

Kidney renal cell carcinoma (KIRC), which is the most common subtype of kidney cancer, has a poor prognosis and a high mortality rate. In this study, a multi-omics analysis is performed to build a multi-gene prognosis signature for KIRC. A combination of a DNA methylation analysis and a gene expression data analysis revealed 863 methylated differentially expressed genes (MDEGs). Seven MDEGs (BID, CCNF, DLX4, FAM72D, PYCR1, RUNX1, and TRIP13) were further screened using LASSO Cox regression and integrated into a prognostic risk score model. Then, KIRC patients were divided into high- and low-risk groups. A univariate cox regression analysis revealed a significant association between the high-risk group and a poor prognosis. The time-dependent receiver operating characteristic (ROC) curve shows that the risk group performs well in predicting overall survival. Furthermore, the risk group is contained in the best multivariate model that was obtained by a multivariate stepwise analysis, which further confirms that the risk group can be used as a potential prognostic biomarker. In addition, a nomogram was established for the best multivariate model and shown to perform well in predicting the survival of KIRC patients. In summary, a seven-MDEG signature is a powerful prognosis factor for KIRC patients and may provide useful suggestions for their personalized therapy.


Introduction
Kidney cancer is a complicated disease that is composed of a variety of different types of kidney tumors. Renal cell carcinoma (RCC) accounts for 90% of all kidney cancers. The main three types of kidney cancer are: kidney renal clear cell carcinoma or clear cell RCC (KIRC or ccRCC, accounting for 70-75% of all kidney cancers), kidney renal papillary cell carcinoma or papillary RCC (KIRP or pRCC, accounting for 10-16% of all kidney cancers), and kidney chromophobe or chromophobe RCC (KICH or chRCC, accounting for 5% of all kidney cancers) [1]. KIRC, as the most common subtype of kidney cancer, usually has the worst overall survival rate [2]. The loss of the von Hippel-Lindau (VHL) gene was found to be involved in more than 60% of all sporadic renal cell cancers, which results in a high expression level of hypoxia-inducible factors (HIFs) and their downstream genes, including vascular endothelial growth factor (VEGF) [3]. Studies suggest that VHL-HIF is a key oncogenesis mechanism in KIRC [4]. According to this mechanism, tyrosine kinase inhibitors (TKIs), including sunitinib and pazopanib, have been developed to treat KIRC patients by targeting the VEGF Int. J. Mol. Sci. 2019, 20, 5720 2 of 20 pathway, and have shown good performance in increasing patients' overall survival rate under certain circumstances [5,6]. Besides VHL, frequent genetic mutations have been detected in BAP1, PBRM1, and SETD2, all of which are related to chromatin and histone regulation and serve as tumor suppressor genes in kidney cancer. Bihr et al. concluded that a combination of PBRM1, BAP1, SETD2, and VHL dysfunction is fatal to KIRC progression [7]. It was reported that VHL mutation has no prognostic value [8], while BAP1 mutation was related to overall survival in KIRC patients [9]. The SETD2 protein was reported to be a good prognosis marker in KIRC [10]. In RCC, decreased PBRM1 expression is associated with a poor prognosis [11]. Considering mutation information may provide us with new insights into the prognosis of KIRC.
Although many prognosis factors have been reported for KIRC, the pathological stage of RCC patients remains the main predictor of prognosis [12]. However, survival differences exist between patients at the same stage [13]. Therefore, new reliable prognostic biomarkers urgently need to be developed in order to realize personalized medicine and treatment. In addition, DNA methylation is a major epigenetic modification that plays important roles in tumor development. Many genes (such as CRB3 [14], SFRP1 [15], ZNF492, and GPR149 [16]) with an abnormal methylation status have been reported to be prognosis biomarkers for KIRC. Signatures formed by multiple genes with a changed methylation status have been reported to be powerful predictors of prognosis for hepatocellular carcinoma [17] and pancreatic cancer [18]. Recently, a nine-gene signature and a five-gene signature were reported to be prognosis biomarkers for KIRP and stage III KIRC, respectively [19,20]. However, both studies did not use the methylation information. Moreover, multi-omics based survival prediction models has been applied to cancers, such as breast cancer [21] and pancreatic ductal adenocarcinoma [22].
In our study, we obtained a seven-gene signature by combining methylation and expression profiles to provide a reliable prognostic model for KIRC patients.

Summary of Datasets
There were 529 KIRC samples and 79 normal samples with the Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/) expression data. There were 537 patients with clinical information. In addition, 336 samples had somatic mutation information. After comparing, we divided the TCGA data into two groups: 330 patients for whom three types of data were available (clinical data, a gene expression profile, and somatic mutation information) were used as a TCGA discovery dataset; and 196 patients for whom only clinical data and gene expression information were available were used as a TCGA validation dataset. Table 1 lists information on the TCGA discovery dataset and the validation dataset. Figure 1 shows the flowchart of this study.
The TCGA gene expression data contained 20,501 genes. We omitted 2398 genes that were not expressed in more than 50% of the samples, leaving 18,103 genes for the study.
DNA methylation data on 485,577 loci for paired KIRC samples and normal samples were obtained from the TCGA. After filtering the loci without a value for more than 50 samples, 396,064 loci remained for further analysis.
The somatic mutation data contained 26,693 somatic mutations involving 9290 genes. However, most genes had a very low mutation frequency. The somatic mutation information for the top five genes (VHL, PBRM1, SETD2, TTN, and BAP1) with the highest mutation frequency were extracted for this study (Table 1). In detail, VHL had the highest mutation frequency, with a mutation in 150 patients (accounting for 45.5% of all patients in the TCGA discovery dataset). PBRM1 had the second-highest mutation frequency, with a mutation in 128 patients (accounting for 38.8% of all patients in the TCGA discovery dataset). TTN had the third-highest mutation frequency, with a mutation in 58 patients (accounting for 17.6% of all patients in the TCGA discovery dataset). SETD2 had the fourth-highest mutation frequency, with a mutation in 39 patients (accounting for 11.8% of all patients in the TCGA discovery dataset). BAP1 had the fifth-highest mutation frequency, with a mutation in 29 patients (accounting for 8.8% of all patients in the TCGA discovery dataset).   After performing Student's t-test, a Benjamini-Hochberg correction was conducted to obtain the false discovery rate (FDR, i.e., the q-value) [23]. Then, with cut-off values of FDR < 0.05 and log 2 (FC) > 1, 2033 downregulated and 3466 upregulated genes were identified as being differentially expressed genes (DEGs) (Figure 2A). Furthermore, by comparison with the DNA methylation status, 863 DEGs were detected as being MDEGs in KIRC, including 234 downregulated genes with a hypermethylated status ( Figure 2B) and 629 upregulated genes with a hypomethylated status ( Figure 2C). Moreover, a function enrichment analysis was done through the database for annotation, visualization, and integrated discovery (DAVID, https://david.ncifcrf.gov/) for the 863 MDEGs, which revealed that these MDEGs were significantly enriched in cell proliferation, angiogenesis, and signal transduction, among other things ( Figure 3A). Enriched Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/) pathways were also detected for these MDEGs. The analysis showed that they were enriched in pathways in cancer, HTLV-I infection, and PI3K-Akt signaling, among others ( Figure 3B). Therefore, MDEGs were found to be involved in many important biological processes and pathways in cancer. Moreover, a function enrichment analysis was done through the database for annotation, visualization, and integrated discovery (DAVID, https://david.ncifcrf.gov/) for the 863 MDEGs, which revealed that these MDEGs were significantly enriched in cell proliferation, angiogenesis, and signal transduction, among other things ( Figure 3A). Enriched Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/) pathways were also detected for these MDEGs. The analysis showed that they were enriched in pathways in cancer, HTLV-I infection, and PI3K-Akt signaling, among others ( Figure 3B). Therefore, MDEGs were found to be involved in many important biological processes and pathways in cancer.

Construction and Assessment of a Prognostic Risk score Model for KIRC
By performing both a univariate and a multivariate Cox regression analysis for the 863 MDEGs, we identified 189 MDEGs as potential prognosis-related genes (p-value < 0.05). Then, we performed LASSO Cox regression on the 189 MDEGs to select the most informative gene set for prognosis ( Figure S1). Eventually, seven MDEGs (BID, CCNF, DLX4, FAM72D, PYCR1, RUNX1, and TRIP13) that were highly associated with prognosis were selected as features for a risk score model (Table 2). Finally, we constructed a formula to calculate the risk score for predicting prognosis based on the expression levels of the seven genes in KIRC patients.

RS BID
where "RS" is short for "risk score"; the number before each gene symbol in Formula (1) is the regression coefficient of the gene in the best LASSO Cox regression model; and the gene symbol represents the expression of this gene. Figure 4 and Figures S2-S6 show the univariate and multivariate Cox regression analysis results ( Table 2) for the selected seven MDEGs (BID, CCNF, DLX4, FAM72D, PYCR1, RUNX1, and TRIP13). The differential expression of the seven genes between tumor and normal samples suggests that they could be used as potential biomarkers for the diagnosis of KIRC patients ( Figure 5).

Construction and Assessment of a Prognostic Risk score Model for KIRC
By performing both a univariate and a multivariate Cox regression analysis for the 863 MDEGs, we identified 189 MDEGs as potential prognosis-related genes (p-value < 0.05). Then, we performed LASSO Cox regression on the 189 MDEGs to select the most informative gene set for prognosis ( Figure  S1). Eventually, seven MDEGs (BID, CCNF, DLX4, FAM72D, PYCR1, RUNX1, and TRIP13) that were highly associated with prognosis were selected as features for a risk score model (Table 2). Finally, we constructed a formula to calculate the risk score for predicting prognosis based on the expression levels of the seven genes in KIRC patients.
where "RS" is short for "risk score"; the number before each gene symbol in Formula (1) is the regression coefficient of the gene in the best LASSO Cox regression model; and the gene symbol represents the expression of this gene. Figure 4 and Figures S2-S6 show the univariate and multivariate Cox regression analysis results ( Table 2) for the selected seven MDEGs (BID, CCNF, DLX4, FAM72D, PYCR1, RUNX1, and TRIP13). The differential expression of the seven genes between tumor and normal samples suggests that they could be used as potential biomarkers for the diagnosis of KIRC patients ( Figure 5). 1 Figure 4. The expression signature that is associated with survival outcomes in KIRC. (A) Kaplan-Meier curves show the overall survival of KIRC patients with a high and a low expression level of DLX4. Patients with a high expression level of DLX4 exhibited a significantly worse outcome compared with those with a low expression level of DLX4. A multivariable Cox regression analysis showed that DLX4 was an independent prognostic factor as compared with other confounding factors. (B) PYCR1 was found to be an independent prognostic marker for KIRC. A high expression level of this factor was associated with a worse outcome.
factors. (B) PYCR1 was found to be an independent prognostic marker for KIRC. A high expression level of this factor was associated with a worse outcome. Furthermore, according to the formula of the risk score model (Equation 1), a risk score can be calculated for each patient. The patients were divided into high-and low-risk subgroups by using the median of the risk scores as a cut-off. The distribution of the risk scores is shown in Figure 6A. The survival status for the high-and low-risk subgroups is shown in Figure 6B, which shows that the high-risk subgroup contained a higher number of dead patients than the low-risk subgroup. The expression heatmap for the seven MDEGs shows that they tend to have a higher expression level in the high-risk subgroup as compared with the low-risk subgroup ( Figure 6C). Furthermore, according to the formula of the risk score model (Equation 1), a risk score can be calculated for each patient. The patients were divided into high-and low-risk subgroups by using the median of the risk scores as a cut-off. The distribution of the risk scores is shown in Figure 6A. The survival status for the high-and low-risk subgroups is shown in Figure 6B, which shows that the high-risk subgroup contained a higher number of dead patients than the low-risk subgroup. The expression heatmap for the seven MDEGs shows that they tend to have a higher expression level in the high-risk subgroup as compared with the low-risk subgroup ( Figure 6C). Significant survival differences between the high-risk subgroup and the low-risk subgroup were observed in the TCGA discovery dataset ( Figure 7A, left panel; the p-value of the log-rank is 1.17 × 10 −7 ). The receiver operating characteristic (ROC) analysis showed that the risk score based on the seven MDEGs performed well in death prediction ( Figure 7A, right panel). The area under the curve (AUC) at 1, 3, and 5 years was 0.798, 0.736, and 0.758, respectively, in the TCGA discovery dataset. To confirm the prognostic value of the risk score, we validated it using the TCGA validation dataset. Similar results were achieved in the validation dataset. Significant survival differences between the high-risk subgroup and the low-risk subgroup were also observed ( Figure 7B, left panel; the p-value of the log-rank is 1.43 × 10 −4 ). The AUC at 1, 3, and 5 years was 0.677, 0.66, and 0.71, respectively, in the TCGA validation dataset ( Figure 7B, right panel). In the analysis of the entire set, we also obtained similar results. The high-risk subgroup had a worse survival rate ( Figure 7C, left panel; the p-value of the log-rank is 4.80 × 10 −11 ). The AUC at 1, 3, and 5 years was 0.735, 0.702, and 0.735, respectively, in the entire TCGA dataset ( Figure 7C, right panel). Significant survival differences between the high-risk subgroup and the low-risk subgroup were observed in the TCGA discovery dataset ( Figure 7A, left panel; the p-value of the log-rank is 1.17 × 10 −7 ). The receiver operating characteristic (ROC) analysis showed that the risk score based on the seven MDEGs performed well in death prediction ( Figure 7A, right panel). The area under the curve (AUC) at 1, 3, and 5 years was 0.798, 0.736, and 0.758, respectively, in the TCGA discovery dataset. To confirm the prognostic value of the risk score, we validated it using the TCGA validation dataset. Similar results were achieved in the validation dataset. Significant survival differences between the high-risk subgroup and the low-risk subgroup were also observed ( Figure 7B, left panel; the p-value of the log-rank is 1.43 × 10 −4 ). The AUC at 1, 3, and 5 years was 0.677, 0.66, and 0.71, respectively, in the TCGA validation dataset ( Figure 7B, right panel). In the analysis of the entire set, we also obtained similar results. The high-risk subgroup had a worse survival rate ( Figure 7C, left panel; the p-value of the log-rank is 4.80 × 10 −11 ). The AUC at 1, 3, and 5 years was 0.735, 0.702, and 0.735, respectively, in the entire TCGA dataset ( Figure 7C, right panel).

Establishment of a Nomogram for Overall Survival (OS) Prediction in KIRC
Based on the best multivariate model, a prognostic nomogram was developed to predict the probabilities of 1-, 3-, and 5-year OS in KIRC using our defined risk group, tumor stage, age, and mutation status of VHL as variables ( Figure 8A). Harrel's concordance index (C-index) for OS prediction was 0.795, which suggests that our developed nomogram had high accuracy. The calibration curves for this nomogram were plotted ( Figure 8B). They showed good agreement between the nomogram's prediction and the actual observation for 1-, 3-, and 5-year OS rates. In addition, the predictive accuracy of the nomogram was compared with a system of other clinical characteristics, including sex, tumor stage, and age. The C-index of OS prediction was 0.53, 0.705, and 0.64 for sex, tumor stage, and age, respectively. These values were all lower than the C-index for our constructed nomogram (0.795). This indicates that our nomogram is a more accurate predictor of OS in KIRC than sex, tumor stage, or age.

Discussion
In this study, we developed a seven-MDEG signature to predict survival in KIRC patients. In brief, we first identified 863 DEGs with an altered DNA methylation status. Then, 189 MDEGs were found to be prognosis-related genes through univariate and multivariate Cox regression analyses. Seven MDEGs (BID, CCNF, DLX4, FAM72D, PYCR1, RUNX1, and TRIP13) of the 189 prognosis-related genes were selected to generate a risk score model by LASSO Cox regression. A survival analysis suggested that our seven-MDEG signature was an independent prognostic factor of KIRC. A best multivariate model that included our recorded risk group, tumor stage, age, and mutation status of VHL was obtained by a multivariate stepwise analysis. A prognostic nomogram was established for the best multivariate model, which showed its capability to predict survival for KIRC patients.
Differential expression genes were firstly identified using Student's t-tests and fold-change (FC). To check whether different methods will make big difference to the results, we compared the DEGs found by Student's t-test with the DEGs found by wilcoxon rank-sum test and DESeq2 [24] under the same cut-off levels, i.e., FDR < 0.05 and log 2 (FC) > 1. As it is shown in Figure S7, the DEGs detected by the three methods had high overlap rate, and only 37 DEGs (only accounting for 0.67%) found by Student's t-test were not detected by wilcoxon rank-sum test or DESeq2, which suggests that using Student's t-tests to detect DEGs is acceptable for the study.
The 863 MDEGs were enriched in 29 KEGG pathways. Many of the enriched KEGG pathways play important roles in cancer development and progression. The PI3K-Akt signaling pathway is an intracellular signal transduction pathway of great importance in promoting metabolism, proliferation, cell survival, cell growth, and angiogenesis. The PI3K-Akt signaling pathway was reported to be a target for cancer treatment [25]. Guo et al. reported on the roles of the PI3K-Akt signaling pathway in renal cell carcinoma (RCC), and pointed out that the genes in the PI3K-Akt pathway are frequently mutation genes in KIRC [26], which indicates that the PI3K-Akt pathway may provide information that is helpful to the treatment of KIRC. Cell adhesion molecules (CAMs) mediate cell-cell and cell-extracellular matrix adhesion. Okegawa et al. reported on the roles of CAMs in cancer progression and discussed their potential for treating cancer, especially urogenital cancer [27]. CADM4, one of the CAMs, was identified as being a candidate tumor suppressor gene in RCC [28]. Changes in the structure of the extracellular matrix (ECM) mediate proliferation, differentiation, morphogenesis, and tumor progression [29]. The axon guidance pathway plays core roles in kidney development, and death can result from kidney abnormalities when it is abnormal [30]. The p53 pathway has been shown to be repressed in RCC by an unknown dominant mechanism [31]. Zhang et al. summarized the roles that Rap1 signaling plays in tumor cell invasion and metastasis [32]. A dysregulated carbon metabolism was found in KIRC [33]. Altered focal adhesion was discovered in a KIRC tumor [34]. The RAS signaling pathway was associated with colorectal cancer [35]. HIF1-α is related to the development of KIRC [36] and has prognostic value for KIRC [37]. The MAPK signaling pathway is implicated in cell proliferation and differentiation, and can suppress RCC when it is repressed [38].
Furthermore, we have also done the enrichment analysis for both 629 upregulated genes with hypo-methylated status and 234 downregulated genes with hyper-methylated status, respectively. The results were shown Figure S8, many enriched Gene Ontology (GO) and KEGG pathways were consistent with the enrichment analysis for all 863 MDEGs. It is worth mention that metabolic pathways were down-regulated which was also reported by Tun, et al. [39].
We identified seven MDEGs that were associated with the prognosis of KIRC from the TCGA discovery dataset. All seven genes have been associated with cancers. BID (BH3-interacting domain death agonist) is involved in extrinsic apoptotic signaling [40] and mediating the DNA damage response [41], and may be used as a therapeutic target [42]. BID is also a pro-apoptotic member of the B-cell lymphoma-2 (Bcl-2) group of proteins, which have a high expression level in RCC and may participate in the progression of cancer [43]. Moreover, BID was reported to be an independent prognostic gene in colon cancer [44]. Therefore, BID is crucial to KIRC and may be used as a prognosis biomarker for KIRC. Cyclin F, which is encoded by CCNF, can form the Skp1-Cul1-F-box E3 protein ubiquitin ligase complex. E3 ubiquitin ligases are implicated in ubiquitin-proteasome-mediated protein degradation. Low CCNF expression has been found in hepatocellular carcinoma (HCC) and is connected with a poor prognosis [45]. It was reported that CCNF controls tumorigenesis through IDH1-R132H [46]. High CCNF expression was associated with a poor outcome in melanoma patients [47]. An abnormal DLX4 expression level has been reported in inflammatory breast cancer, leukemia, lung cancer, ovarian cancer, and prostate cancer [48][49][50][51][52]. Overexpression of DLX4 leads to cancer migration, invasion, and metastasis by driving the expression of TWIST [53]. FAM72D, a family with sequence similarity 72 member D, can be considered to be a target gene in glioblastoma multiform (GBM) [54]. FAM72D was reported to be a new target for cancer therapies since its expression correlates with UBE2C, whose higher expression leads to a worse OS prognosis [55]. Pyrroline-5-carboxylate reductase 1 (PYCR1) can promote tumor cell growth in breast cancer [56]. PYCR1 was reported to be overexpressed in prostate cancer [57]. Since PYCR1 was found to be negatively regulated by miR-488, which is related to cell proliferation and tumorigenesis, it was reported to be a novel therapeutic target in lung cancer [58]. PYCR1 may be a potential therapeutic target for treating prostate cancer and breast cancer [59,60]. Runt-related transcription factor 1 (RUNX1) has many roles in cancer. For example, RUNX1 was found to be downregulated in gastric cancer [61] and hepatocellular carcinoma [62]. Genetic variation in RUNX1 has been related to prostate cancer [63] and colorectal cancer [64]. A novel RUNX1-RUNX1T1 pathway was reported in KIRC, which may provide new ideas for treating KIRC [65]. The RUNX1/AKT pathway may be a new therapeutic target in kidney fibrosis [66]. Thyroid hormone Receptor Interactor 13 (TRIP13) was reported to be a prognostic biomarker for colorectal cancer [67] and prostate cancer [68]. Silencing TRIP13 can activate TGF-β1/smad3 signaling, resulting in a repression of cell growth and metastasis of hepatocellular carcinoma (HCC) [69]. TRIP13-deficient tubular epithelial cells tend to progress towards apoptotic cell death following acute kidney injury [70].
In our LASSO Cox regression, seven MDEGs were selected in the end which met the one in ten rule [72] since we have 72 patients who died. And through multivariate stepwise analysis, a best multivariate model with four predictors was obtained, which also satisfied the rule that a minimum of 10 outcome events per predictor variable (EPV) since we had 72 patients who died. In the multivariate Cox regression analysis for each MDEG, nine predictors were included in the model to investigate the influences of the eight confounding factors on the MDEG. According to the study of Vittinghoff E. [73], one in 10 rule can be relaxed. They suggested that five to nine events per predictor can be enough. Our study met this requirement.
Our univariate Cox regression analysis showed that VHL mutation was not related to survival in KIRC patients, which is consistent with the research done by Kim et al. [8]. We also found that BAP1 mutation has some prognostic value in KIRC patients, which is consistent with the report done by Wang et al. [9]. Although the mutation statue of VHL was not shown in the univariate Cox regression analysis to have prognostic value, it was contained in the best multivariate model together with our recorded risk group, suggesting that it has a potential role in prognosis for KIRC patients.
Since we cannot find the KIRC data with both RNA-seq and clinical data from the public database, we used KIRP (kidney renal papillary cell carcinoma) data from TCGA as an external validation dataset. KIRP and KIRC share many characteristics since both of them are subtypes of kidney cancer and originate from the same tissue. The results showed that our risk score model achieved a good result in KIRP data ( Figure S9). Significant survival differences between the high-risk subgroup and the low-risk subgroup were observed in the KIRP dataset ( Figure S9A; the p-value of the log-rank is 5.00 × 10 −4 ). The receiver operating characteristic (ROC) analysis showed that the risk score based on the seven MDEGs performed well in death prediction ( Figure S9B). The area under the curve (AUC) at 1, 3, and 5 years was 0.871, 0.796, and 0.719, respectively.
The main limitations of this paper are: (1) the seven-gene signature was generated from the TCGA dataset, in which most patients are Caucasian; and (2) this signature was only validated in the TCGA cohort. Therefore, this signature needs to be further investigated in multiple datasets with different populations.

Datasets and Networks
The multiplatform genomics datasets including gene expression, DNA methylation, and somatic mutation information were downloaded from the Cancer Genome Atlas (TCGA, http://cancergenome. nih.gov/) for KIRC. The clinical information was obtained through the TCGA Data Commons (https: //gdc.cancer.gov/).

Identification of Differentially Expressed Genes (DEGs) with an Altered DNA Methylation Status in KIRC
Gene expression data and DNA methylation data were employed to identify DEGs with an altered methylation status. Firstly, Student's t-test and fold-change (FC) were used to determine DEGs in KIRC. Next, the DNA methylation status of genes was compared between KIRC samples and matched normal samples by using the Chip Analysis of Methylation Pipeline (ChAMP) [74,75] available in the R Bioconductor package. Then, both downregulated genes with a hypermethylated status and upregulated genes with a hypomethylated status were identified as candidate genes for prognosis.

Functional and Pathway Enrichment Analyses
The GO functional enrichment analysis and the KEGG pathway enrichment analysis were performed using the DAVID platform [76] for the downregulated genes with a hypermethylated status and upregulated genes with a hypomethylated status.

Establishment of the MDEG Signature for Prognosis of KIRC
For each MDEG, all patients were divided into high-and low-expression subgroups by using the median expression of the gene as a cut-off. The association between MDEG expression and patients' OS was assessed by a Kaplan-Meier (KM) survival curve and a log-rank test between subgroups of high-and low-expression patients for each MDEG. MDEGs that show obvious differences between two groups can be used as potential prognosis biomarkers for cancer. Furthermore, multivariate Cox models (function "coxph", R package "survival") were further used to investigate the influences of the confounding factors on MDEGs. The confounding factors included sex information, pathologic stages of a tumor, age at initial pathologic diagnosis, and the mutation status of genes in KIRC. MDEGs with a p-value of less than 0.05 in the log-rank test for both the univariate analysis and the multivariate analysis were further screened and confirmed by LASSO Cox regression analysis. Then, independent MDEG biomarkers were integrated into an MDEG signature by calculating a risk score with the following formula: Coe f f icient o f Gene (i) * Expression o f Gene(i) where "RS", which is short for "risk score", is a MDEG signature risk score for each KIRC patient; Coefficient of Gene (i) is the regression coefficient of Gene (i) in the best LASSO Cox regression model; and Expression of Gene (i) is the expression value of Gene (i) for the patient. Based on this formula, a MDEG signature risk score can be obtained for each patient. Then, the median risk score was used as the cutoff to divide patients into a high-risk subgroup and a low-risk subgroup. A KM survival curve analysis and a log-rank test was then performed to compare survival between the high-and low-risk subgroups.

Statistical Analysis
To evaluate the prognostic performance of the risk score system, univariate and multivariate models were computed using Cox proportional-hazards regression. The univariate analysis was used to independently assess the prognostic performance of each variable (Table 1). Moreover, a multivariate model including risk group (high-risk group and low-risk group) and all of the other variables (Table 1) was developed. Then, to select the most informative variables, a backward-forward step procedure was carried out in the R package "stats". Furthermore, a nomogram was constructed for the best multivariate model using the R package "rms". The performance of the nomogram was assessed with Harrel's concordance index (C-index), which is a useful evaluation measure that is similar to calculating the area under the receiver operating characteristic (ROC) curve. C-indices range from 0.5 to 1.0, where 0.5 denotes a bad performance and 1 denotes a good performance [77]. Calibration curves were drawn to check for consistency between the predicted survival and the observed survival.

Conclusions
In conclusion, we built a seven-MDEG signature for predicting survival in KIRC patients by making full use of multi-platform datasets including DNA methylation data, gene expression profiles, somatic mutation data, and clinical information. Our results showed that this signature is an independent prognostic factor in KIRC patients and can more accurately predict OS in KIRC patients than a tumor stage system. Therefore, our constructed signature has potential application in realizing future personalized medicine for KIRC patients.