Accurate Prediction of Cancer Prognosis by Exploiting Patient-Specific Cancer Driver Genes

Accurate prediction of the prognoses of cancer patients and identification of prognostic biomarkers are both important for the improved treatment of cancer patients, in addition to enhanced anticancer drugs. Many previous bioinformatic studies have been carried out to achieve this goal; however, there remains room for improvement in terms of accuracy. In this study, we demonstrated that patient-specific cancer driver genes could be used to predict cancer prognoses more accurately. To identify patient-specific cancer driver genes, we first generated patient-specific gene networks before using modified PageRank to generate feature vectors that represented the impacts genes had on the patient-specific gene network. Subsequently, the feature vectors of the good and poor prognosis groups were used to train the deep feedforward network. For the 11 cancer types in the TCGA data, the proposed method showed a significantly better prediction performance than the existing state-of-the-art methods for three cancer types (BRCA, CESC and PAAD), better performance for five cancer types (COAD, ESCA, HNSC, KIRC and STAD), and a similar or slightly worse performance for the remaining three cancer types (BLCA, LIHC and LUAD). Furthermore, the case study for the identified breast cancer and cervical squamous cell carcinoma prognostic genes and their subnetworks included several pathways associated with the progression of breast cancer and cervical squamous cell carcinoma. These results suggested that heterogeneous cancer driver information may be associated with cancer prognosis.


Introduction
The accurate prediction of the prognoses of cancer patients is important since it can allow the provision of improved treatment and help design anticancer drugs by enhancing our understanding of cancer progression. Numerous bioinformatics studies have previously been conducted for the accurate prediction of cancer prognoses and the identification of prognostic biomarkers. These studies were primarily focused on developing statistical [1] or machine learning methods [2,3] and then applying them to various types of omics data. Among these two approaches, machine learning methods have been gaining increasing attention recently, and have shown good performances as a result of recent advances in machine learning, such as deep learning, and the accumulation of omics data.
Bioinformatics studies for the prediction of cancer prognosis can be divided roughly into those that focus on each gene independently and those that consider the relationships between genes. The latter category uses genetic network data, such as the protein-protein interaction (PPI) network and the gene regulation network [4,5], alongside omics data. The exploitation of genetic network data is advantageous since it provides a better understanding of cancer development and progression, considering that prognostic genes can be identified at the genetic network level. Additionally, this can mitigate the dimensionality problem caused by having few samples relative to the number of genes. Therefore, machine-learning methods can benefit from the use of genetic network data.
The numbers of genes/edges in the Reactome, RegNetwork, and TRRUST networks were 14,071/110,721, 23,336/372,774, and 2852/9383, respectively. Furthermore, the numbers of genes and edges in the integrated network were 25,167 and 490,200, respectively.
Based on each patient's death information and survival period from the clinical data, the prognosis was labeled as bad if the patient died before the criterion year, and good otherwise. The criterion year differs across cancer types; for BLCA, BRCA, KIRC, LIHC, LUAD, PAAD and STAD, known criteria were used. For the other cancer types, we set a criterion year that balanced the numbers of good and bad samples in the gene expression data. Table 1 shows the data for each cancer sample obtained from the TCGA database.
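The labeling rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name and argument names are hypothetical, and the rule is applied literally as stated in the text (the handling of patients whose follow-up ended before the criterion year is not described, so they fall into the "good" group here).

```python
def label_prognosis(died: bool, survival_years: float, criterion_years: float) -> str:
    """Label one patient following the rule stated in the text.

    "bad"  : the patient died before the criterion year.
    "good" : every other case (assumption: applied literally, so patients
             without a recorded death are always labeled "good").
    """
    if died and survival_years < criterion_years:
        return "bad"
    return "good"


# Hypothetical patients evaluated against a 5-year criterion.
labels = [
    label_prognosis(True, 2.0, 5.0),   # died at 2 years
    label_prognosis(True, 6.5, 5.0),   # died at 6.5 years
    label_prognosis(False, 3.0, 5.0),  # alive at last follow-up
]
```

For cancer types without a known criterion, the same function could be evaluated over several candidate years to find the one that balances the two groups, as the text describes.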

Model Configuration
To demonstrate that gene selection based on p-values was appropriate, the performance of the DNN was compared using p-value-based genes, all genes, KEGG genes [19], and randomly selected genes. After generating good and bad samples, the data were randomly divided into training and test sets at a ratio of 5:5. For the p-value-based selection, 5-fold cross-validation was performed on the training set, and genes were selected in each fold using thresholds of [0.1, 0.05, 0.01, 0.005]. The model was then constructed using the p-value threshold with the highest average AUC across the five folds, and the results were derived using the test set. The selected threshold varied with the cancer type and the configured training set; therefore, the p-value threshold was not fixed, and the optimal value was determined for each situation. For random gene selection, the number of genes was matched to the number of genes selected using the p-value. Finally, as shown in Figure 1, the genes selected based on p-values showed the highest performance.
Next, to demonstrate that the proposed patient-specific gene network was suitable, its performance was compared using the proposed patient-specific gene network, a randomly weighted network, and an unweighted network, as in DawnRank. As shown in Figure 2a, the average AUC of 10 random samples was the highest for the patient-specific network in most carcinomas. Although the AUC of randomly weighted networks was higher in PAAD, the patient-specific networks provided more stable predictions, showing significantly smaller deviations across all cancers.
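The p-value threshold selection described in this section amounts to picking the candidate with the highest mean cross-validation AUC. The sketch below assumes the per-fold AUCs have already been computed; the function name and the AUC values are made up purely for illustration and are not results from the paper.

```python
from statistics import mean

# Candidate p-value thresholds listed in the text.
THRESHOLDS = [0.1, 0.05, 0.01, 0.005]


def select_threshold(fold_aucs: dict) -> float:
    """Return the threshold whose mean AUC over the CV folds is highest.

    fold_aucs maps each candidate threshold to its list of per-fold AUCs.
    """
    return max(fold_aucs, key=lambda t: mean(fold_aucs[t]))


# Illustrative (invented) 5-fold AUCs for one training set.
example_aucs = {
    0.1: [0.60, 0.62, 0.61, 0.59, 0.63],
    0.05: [0.64, 0.66, 0.65, 0.63, 0.67],
    0.01: [0.70, 0.72, 0.69, 0.71, 0.73],
    0.005: [0.66, 0.68, 0.67, 0.65, 0.69],
}
best = select_threshold(example_aucs)
```

Because the selection is repeated for every training split, the chosen threshold can differ between cancer types and between repetitions, which matches the statement that the p-value was not fixed.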
The results were also compared using the gene score adjustment scheme, the gene score without win rate, and the gene scores as inputs. Figure 2b shows that the adjustment using the win rate was critical for predicting cancer prognosis.

Hyperparameter Tuning
The hyperparameters adjusted in the DNN model were the learning rate and batch size. The remaining configuration was fixed: the model had three hidden layers, with ReLU used as the activation function between the hidden layers and Sigmoid used for the output layer. The total number of epochs was 200, although learning was terminated if a loss value of less than 0.0001 was accumulated more than 20 times during learning. For comparison, the AUC of the DNN model was measured using the LIHC samples. Genes were selected based on the p-value threshold with the highest average AUC in the five folds of the training set. After training on the entire training set with the selected genes, the model was evaluated on the test set. This procedure was repeated a total of 10 times. Figure 3 shows that the average AUC was highest when the learning rate was 0.001 and the batch size was 2. When the learning rate was 0.0001, the AUC remained similarly high, but training took more than twice as long as at 0.001, so this setting was not selected.
We also compared various machine learning methods such as Random Forest, XGBoost [20], LightGBM [21], CatBoost [22], and DNN. In Figure 4, DNN shows a significantly higher AUC, so we selected DNN as our classifier.

Comparison on Different Machine Learning Methods
To evaluate the performance of the proposed method, it was compared with CPR, GEDFN, Wu and Stein [23], WGCNA, GVES, and DeepProg in terms of AUC, PR-AUC, balanced accuracy, F1-score, and Matthews correlation coefficient. In most cancers, except for BLCA, LIHC, and LUAD, the proposed method demonstrated a superior average performance compared to the majority of previous studies, as shown in Figure 5 and Supplementary Figure S1. The proposed method also significantly outperformed the other methods for BRCA, CESC, and PAAD. DeepProg showed an excellent average performance in most cancers, especially BLCA, LIHC, and LUAD; however, its differences from the proposed method were not significant, and its results showed a larger deviation than those of the proposed method. The low deviation of the proposed method indicated that stable prediction was possible through the generation of patient-specific impact vectors.
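For reference, three of the threshold-based metrics named above can be computed directly from the confusion matrix. The following is a generic sketch using the standard definitions, not the evaluation code of the study; the label encoding (1 for the positive class, 0 for the negative class) and the toy predictions are assumptions made for illustration.

```python
import math


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity (both classes must be present)."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: six samples, one false negative and one false positive.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
tp, tn, fp, fn = confusion(y_true, y_pred)
```

Balanced accuracy and the Matthews correlation coefficient are less sensitive to class imbalance than plain accuracy, which is relevant when the good and bad groups differ in size.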

Functional Analysis of Prognostic Genes
The proposed method exhibited significantly higher AUC values for BRCA, CESC, and PAAD. A functional analysis was then performed for the prognostic genes of BRCA and CESC, two cancers that commonly affect women.
To select prognostic genes for BRCA, the genes with a final score (win rate) in the top 30 were first collected in each of ten experiments. As a result, 25 genes were identified at least three times out of the ten experiments and were subsequently selected as prognostic genes (Supplementary Table S1). Eleven of the twenty-five prognostic genes were selected at least four times, while six genes were known BRCA driver genes (APC, BRCA1, MAX, RB1, RUNX1, and SMAD2) in the CGC [24] and Intogen [25] databases. A subgraph of the 25 prognostic genes is shown in Figure 6. Using patient-specific gene networks for all samples, the 25 genes had an average edge density of 479.2, which was relatively high compared to the whole network, which had an average edge density of 65.8.
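The recurrence-based selection described above can be sketched as a simple counting step. This is an illustration under assumptions, not the authors' implementation: the function name is hypothetical, "thrice" is read as "at least three times", and the gene identifiers and lists are placeholders rather than real results.

```python
from collections import Counter


def recurrent_genes(top_lists, min_count=3):
    """Return genes appearing in at least min_count of the per-experiment top lists.

    top_lists holds one list of top-ranked genes per experiment; each gene is
    counted at most once per experiment.
    """
    counts = Counter(g for genes in top_lists for g in set(genes))
    return {g: c for g, c in counts.items() if c >= min_count}


# Placeholder top lists from five hypothetical experiments.
experiments = [
    ["G1", "G2", "G3"],
    ["G1", "G3", "G4"],
    ["G1", "G2", "G5"],
    ["G1", "G3", "G6"],
    ["G2", "G7", "G8"],
]
selected = recurrent_genes(experiments, min_count=3)
```

Requiring a gene to recur across repeated random splits filters out genes that rank highly only because of one particular training set.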
The functional analysis of the 25 prognostic genes was then performed using DAVID [26], and the KEGG Pathway and GO Biological Process terms with a Benjamini p-value < 0.05 are presented in Supplementary Table S2. Of the total 38 terms, 31 had been studied in the literature for relevance to breast cancer, while 10 were relevant to the prognosis of breast cancer (Supplementary Table S2).

Furthermore, Kaplan-Meier analysis was performed on the 25 prognostic genes. In total, 18 genes showed a p-value < 0.05 (Supplementary Figure S2), and of these, nine genes (APC, BRCA1, COPS5, FOXD1, MAPK10, MAPK14, NCOA3, PAX6, and PLK1; Supplementary Figure S2a) showed a low survival probability in the high-expression group, among which two BRCA driver genes, APC and BRCA1, were included. In addition, nine genes (BATF, BCL3, FLT3, JUND, MAX, NFATC1, RUNX1, TGIF1, and XBP1; Supplementary Figure S2b) showed a high survival probability in the high-expression group, among which two BRCA driver genes, MAX and RUNX1, were included. We validated the 25 genes using the KMplot web tool [27] and found that the p-values of 17 genes were < 0.05 (Supplementary Table S1 and Supplementary Figure S3). Among these 17 genes, 15 were also significant in our Kaplan-Meier analysis. We also validated the 25 genes using Protein Atlas [28]. Among the 25 genes, JUND and BCL3 were reported as prognostic for BRCA (p-value = 0.000061 and 0.00097, respectively), and were also significant in our Kaplan-Meier analysis (p-value = 0.0005 and 0.0025, respectively). In addition, 19 genes showed a p-value < 0.05 in Protein Atlas. Among these 19 genes, 15 also showed significant p-values in our Kaplan-Meier analysis (Supplementary Table S1).
In addition, we performed a correlation analysis with immune cells using TIMER [29] for the 25 genes. We found many of them to have significant positive and/or negative correlations with ten T cells (Supplementary Figure S4). Among them, RUNX1 and XBP1 showed generally negative correlations, while IRF7, KAT2B, MAX, and NCOA3 showed generally positive correlations, and BATF and FLT3 showed positive correlations in BRCA-Basal and BRCA-Her2. These results indirectly showed how the selected genes affected the prognosis.
Figure 6. Network of genes selected at least three times among the top 25 genes in BRCA. MAX was selected six times, three genes (APC, FOXD1 and TGIF1) were selected five times, seven genes (COPS5, RUNX1, BATF, MAPK10, NCOA3, JUND and IRF7) were selected four times, and the remaining 13 genes were selected three times out of ten. USF2 (selected twice) was not included in the top 25 genes.
Next, prognostic genes were selected for CESC in the same manner as for BRCA; 20 genes were identified in at least three of the ten experiments and were selected as prognostic genes (Supplementary Table S3). Six of the twenty prognostic genes were selected at least four times. Unlike in BRCA, no known driver genes were included among the 20 prognostic genes. A subgraph of the 20 prognostic genes is shown in Figure 7. Similarly to BRCA, the average edge density of these 20 genes was higher than that of the entire network (357.2 vs. 65.8). TBP was selected seven times, whereas the proto-oncogene MYC was not selected at all. This may have been because MYC was a driver gene regardless of prognosis, whereas TBP may be a novel driver candidate related to MYC and the prognosis of CESC.
Similarly to BRCA, a functional analysis was performed here, with the results shown in Supplementary Table S4. Of the total 37 terms, 25 had previously been studied for their relevance to cervical squamous cell carcinoma, while 11 were relevant to the prognosis of cervical squamous cell carcinoma. Four KEGG pathways (human T-cell leukemia virus 1 infection, colorectal cancer, breast cancer, and pathways in cancer) and five GO terms, including the regulation of transcription from the RNA polymerase II promoter and DNA-templated regulation of transcription, were commonly enriched in BRCA and CESC. Among these common terms, the most unexpected was human T-cell leukemia virus 1 infection, which may share pathways related to the prognosis of CESC and BRCA.

Kaplan-Meier analysis was also performed on the 20 CESC prognostic genes. Twelve genes showed a p-value < 0.05 (Supplementary Figure S5), and of these, eight genes (ATF2, BRCA2, FAM120B, GTF2I, NUP58, PLAGL1, SKIL, and TBP; Supplementary Figure S5a) showed a low survival probability in the high-expression group. Furthermore, four genes (HLF, POU2F1, SOX10, and TCF7; Supplementary Figure S5b) were associated with a high probability of survival in the high-expression group. Finally, we analyzed the 20 genes using Protein Atlas [28]. Among the 20 genes, NUP58 was reported as prognostic for CESC (p-value = 0.00026, high expression is unfavorable), and was also significant in our Kaplan-Meier analysis (p-value = 0.0075). In addition, 10 genes (SKIL, KDM5B, TCF7, RPS6KB2, PLAGL1, NFYA, HLF, TGFB1, PTTG1, and GTF2I) showed a p-value < 0.05 in Protein Atlas (Supplementary Table S3). Among these 10 genes, five (SKIL, TCF7, PLAGL1, HLF, and GTF2I) also showed significant p-values in our Kaplan-Meier analysis. Similarly to BRCA, these results showed how the selected genes affected prognosis.

Discussion
In this study, it was demonstrated that patient-specific cancer driver genes could be used to predict cancer prognoses more accurately. Firstly, patient-specific gene networks were generated using the differences in gene expression between cancer and normal samples for cancer prognosis prediction. Subsequently, modified PageRank was used to generate the feature vectors. These feature vectors represented the impact or influence of genes on all genes in the patient-specific gene network. The feature vectors of the good- and poor-prognosis groups were subsequently used to train the deep feedforward network.
The proposed method generally outperformed the existing state-of-the-art methods in predicting the prognoses of 11 cancer types. In particular, the proposed method significantly outperformed DeepProg, which was the second-best method, for BRCA, CESC, and PAAD, while also outperforming DeepProg for five more cancer types. These results were relatively surprising, considering that the classification model of the proposed method was a simple deep feedforward network, whereas DeepProg adopted a sophisticated deep-learning-based classification model. It was concluded that the good performance originated from the proper feature set, which was the driver gene score.
The novelty of the proposed method can be summarized as: (1) the novel patient-specific gene network generation scheme; and (2) the generation of a feature vector of driver gene scores, which is appropriate for cancer prognosis.
The advantage of the proposed method over previous methods is its high accuracy in the prediction of cancer prognosis and its ability to select proper prognostic genes. However, the running time of the proposed method is longer than that of other methods because patient-specific gene networks are generated and processed for each sample.
The results are limited in that further validation through wet lab experiments is still necessary to confirm that the derived driver genes were actually driver genes. Alternatively, it can be indirectly inferred that driver genes were properly derived using known driver genes from the CGC or Intogen databases. However, the number of known driver genes was not sufficient for a meaningful statistical evaluation. For example, there were six known driver genes among the 25 genes selected for BRCA, while there were no known driver genes among the 20 genes selected for CESC. However, some of these 20 genes were clearly related to CESC and were likely to have been driver genes of CESC. The proposed method could be improved upon by developing a better patient-specific driver gene method, which will be investigated in our ongoing research.

Overview
A patient-specific gene network was first constructed, before generating feature vectors by scoring patient-specific genes using PageRank and then correcting patient-specific gene scores using somatic mutation data. For each cancer sample, a set of feature vectors was created by comparing it to the normal sample group, with a feature vector of the gene representing its influence on the genes in the patient-specific gene network. Therefore, these feature vectors could be used to identify cancer driver genes, while the feature vectors of the good and poor prognosis groups showed differences in the similarity of the potential driver genes for each group. The feature vectors of the good- and poor-prognosis groups were then used to train the deep feedforward network. Figure 8 presents a general overview of the proposed method.

Building the Patient-Specific Gene Network
First, the patient-specific gene network was constructed using gene expression data and integrated gene networks, which consisted of FI networks from Reactome [16] and gene regulation networks from RegNetwork [17] and TRRUST [18]. The proposed model was designed to search for prognosis-specific genes that have a significant influence on other genes. Therefore, the directions of all directed edges were reversed, as in DawnRank.
A patient-specific gene network is represented by W, which is a weighted adjacency matrix. W is calculated using two matrices, A and Φ, as shown in Equation (1).
where A is an n × n adjacency matrix of the integrated gene network calculated by Equation (2), while Φ is a matrix calculated based on R. R and Φ are calculated using Equations (3) and (4), respectively.
$$\Phi_{ij} = \left(R_{g_i} + R_{g_j}\right) \times \min\left(R_{g_i}, R_{g_j}\right) \quad (4)$$
Figure 8. Overview of the proposed method.

In Equation (3), $T_{rank}$ is the rank of the expression value of a gene in a single cancer sample, while $N_{rank}$ is the rank of the average expression of a gene in all normal samples. The larger the value of the expression, the higher the rank. Additionally, $R_g$ represents the difference in the expression of gene $g$ between the cancer and normal samples. A larger $R_g$ value indicates that gene $g$ is more significant for the prediction of prognosis. In Equation (4), $\Phi_{ij}$ is an edge weight calculated from the rank differences of genes $g_i$ and $g_j$. The term $\min(R_{g_i}, R_{g_j})$ is multiplied since an edge with two significant nodes (genes) is better than an edge with only one significant gene. For example, given $R_a = 6$, $R_b = 1$, $R_c = 3$, and $R_d = 4$, we obtain $\Phi_{ab} = 7$ and $\Phi_{cd} = 21$, although $(R_a + R_b) = (R_c + R_d)$. An example of calculating $R$, $\Phi$, and $W$ is shown in Figure 9.
Figure 9. Example for building a patient-specific genetic network. $g_1$ and $g_5$ have somatic mutations.
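The worked example for Equation (4) can be checked numerically. The sketch below implements only the edge weight of Equation (4), taking the rank differences R as given; the function names and the dictionary-based edge representation are illustrative choices, not part of the original method description.

```python
def phi(r_i: float, r_j: float) -> float:
    """Edge weight of Equation (4): (R_i + R_j) * min(R_i, R_j)."""
    return (r_i + r_j) * min(r_i, r_j)


def phi_for_edges(edges, rank_diff):
    """Compute the Equation (4) weight for every (gene_i, gene_j) edge."""
    return {(i, j): phi(rank_diff[i], rank_diff[j]) for i, j in edges}


# Rank differences from the example in the text.
R = {"a": 6, "b": 1, "c": 3, "d": 4}
weights = phi_for_edges([("a", "b"), ("c", "d")], R)
# weights[("a", "b")] is 7 and weights[("c", "d")] is 21
```

Both edges have the same sum of rank differences (7), but the edge between c and d is weighted three times higher because both of its endpoints are strongly altered.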

Calculation of Genetic Impact Scores
The score $S_i$ of gene $i$ for each patient can be calculated using Equation (5).
where $f_i$ is the absolute value of the difference in gene $i$ between a cancer sample and a group of normal samples. The damping factor $d$ is expressed in Equation (6) as in DawnRank, where the number of incoming edges for each gene is $A_{sum}$.
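Since Equations (5) and (6) are not reproduced in this text, the following is only a generic sketch of a PageRank-style propagation with a per-gene damping factor, written under explicit assumptions: the update rule, the damping form d = n / (n + mu) with mu = 3, and all function and variable names are assumptions modeled on DawnRank-like schemes, not the authors' exact formulation. Edges are assumed to be already reversed and weighted.

```python
def propagate(edges, f, mu=3.0, max_iter=200, tol=1e-10):
    """Iterate S_j = (1 - d_j) * f_j + d_j * sum_i(w_ij * S_i / out_i).

    edges : dict mapping (src, dst) -> positive weight (assumed form of W)
    f     : dict mapping gene -> initial expression difference
    d_j   : n_j / (n_j + mu), n_j = number of incoming edges (assumption)
    """
    genes = list(f)
    out_weight = {g: 0.0 for g in genes}
    incoming = {g: [] for g in genes}
    for (src, dst), w in edges.items():
        out_weight[src] += w
        incoming[dst].append((src, w))
    d = {g: len(incoming[g]) / (len(incoming[g]) + mu) for g in genes}

    s = dict(f)
    for _ in range(max_iter):
        new = {}
        for g in genes:
            # Weighted share of each upstream gene's score flowing into g.
            flow = sum(w * s[src] / out_weight[src] for src, w in incoming[g])
            new[g] = (1 - d[g]) * f[g] + d[g] * flow
        delta = max(abs(new[g] - s[g]) for g in genes)
        s = new
        if delta < tol:
            break
    return s


# Two-gene toy network: a -> b with weight 2; only a is differentially expressed.
scores = propagate({("a", "b"): 2.0}, {"a": 4.0, "b": 0.0})
```

In this toy case gene b has one incoming edge, so its assumed damping factor is 1 / (1 + 3) = 0.25 and it receives a quarter of the score of gene a.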
After a vector $S$ is created for each cancer sample, a penalty is given for genes without genetic mutations, as shown in Equation (7), to correct the genetic score calculated in Section 2.3 for somatic mutations.
In Equation (7), $S_i$ represents the genetic score of gene $i$, while $p$ is a penalty, which was set to 0.85 here, as in DawnRank.
The winning rate vector $V$ is calculated and then used for training. If a gene has a somatic mutation, its winning rate is calculated by comparing it with all other genes; otherwise, it is calculated by comparing it only with genes that have somatic mutations. An example is shown in Figure 10.
Considering that $f$ is a vector of initial gene expression differences, and is propagated through the inversely directed gene network, $S$ and $V$ can be seen as vectors of the impacts or influences that genes have on the patient-specific gene network and can also be seen as the likelihood that genes can be cancer driver genes for a specific cancer sample.
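The mutation penalty and winning rate can be illustrated with a small sketch. Equation (7) is not reproduced in this text, so two points here are assumptions rather than the authors' definitions: the penalty is applied by multiplying the scores of non-mutated genes by p, and the winning rate is taken as the fraction of comparison genes with a strictly lower adjusted score. The function name and toy scores are hypothetical.

```python
def win_rates(scores, mutated, penalty=0.85):
    """Compute a winning rate per gene after penalizing non-mutated genes.

    scores  : dict mapping gene -> score S_i
    mutated : set of genes carrying a somatic mutation
    Assumptions: non-mutated scores are multiplied by `penalty`, and the
    winning rate is the share of comparison genes that score strictly lower.
    """
    adjusted = {g: (s if g in mutated else s * penalty) for g, s in scores.items()}
    rates = {}
    for g, s in adjusted.items():
        if g in mutated:
            # Mutated genes are compared with all other genes.
            rivals = [h for h in adjusted if h != g]
        else:
            # Non-mutated genes are compared only with mutated genes.
            rivals = [h for h in adjusted if h in mutated]
        wins = sum(1 for h in rivals if s > adjusted[h])
        rates[g] = wins / len(rivals) if rivals else 0.0
    return rates


# Toy scores; g1 and g5 carry somatic mutations, as in the Figure 9 example.
V = win_rates({"g1": 0.5, "g2": 0.9, "g3": 0.3, "g5": 0.1}, {"g1", "g5"})
```

Under these assumptions, a non-mutated gene can still obtain a high winning rate, but only by outscoring the mutated genes after the penalty has been applied.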
Subsequently, only genes with significant p-values were selected by performing a t-test on the gene expression data of the good- and poor-prognosis groups. The prognostic prediction model was trained using a deep neural network with only the genes having significant p-values in $V$.

Supplementary Materials: The following supporting information can be downloaded at: www.mdpi.com/xxx/s1, Figure S1