Pathway-Based Personalized Analysis of Pan-Cancer Transcriptomic Data

The occurrence of cancer is closely related to the deregulation of certain pathways. Based on pathway deregulation scores (PDS) inferred by the Pathifier algorithm, we analyzed transcriptomic data of 13 different cancer types in The Cancer Genome Atlas database to identify cancer-specific deregulated pathways and prognostic pathways. The results showed that the individual-specific pathway deregulation scores can clearly distinguish different cancer types and their tumor-adjacent tissues. In addition, the cancer-specific deregulated pathways and prognostic pathways of different cancer types had high heterogeneity, and the identified cancer prognostic pathways have been reported to be closely related to the corresponding cancers. Furthermore, we also found that cancers with more deregulation pathways tend to be malignant and have worse prognoses. Finally, a Cox proportional Hazards model was constructed based on the prognostic pathways; this model successfully predicted survival and prognosis based on data from cancer samples. In addition, the performance of the breast cancer prognostic model was validated with an independent data set in the METABRIC database. Therefore, the prognostic pathways we identified have the potential to become targets for the treatment of cancer.


Introduction
The latest data released by the International Agency for Research on Cancer (IARC) of the World Health Organization show that, in 2020, there were 19.3 million new cancer cases diagnosed worldwide and nearly 10 million deaths from cancer [1]. Although there have been a large number of studies related to the prevention, diagnosis, and treatment of cancer, its complicated pathogenic mechanism is still unclear. With the continuous development of high-throughput technology, a large amount of omics data have been generated, which provides unprecedented opportunities for in-depth study of the mechanisms underlying the occurrence and development of cancer and of cancer prevention and treatment strategies.
Years of research have shown that cancer is generally considered to be driven by the continuous accumulation of somatic mutations throughout an individual's life, as well as by changes in epigenetics and transcription. Genes do not exist in isolation but interact with each other to form an organic biological network. The network-based cancer prognostic prediction model is more robust than the prediction model based on a single gene. Deregulation of biological pathways or biological networks often leads to the occurrence and development of cancer. Therefore, mining cancer-specific deregulated pathways can better explain the mechanisms of cancer occurrence and development at the system level [2,3].
At present, there are many methods for performing pathway analysis by combining high-throughput data. However, almost all of these methods can only characterize the path activity of the entire sample set and cannot provide information about the individualspecific related deregulated pathways in a specific cancer sample. For example, Efroni S A total of 185 KEGG pathways were downloaded from the MSigDB database (http: //www.gsea-msigdb.org/gsea/msigdb/, accessed on 2 March 2020) [13].

Overview of the Approach
There are three major steps in our method (see the flowchart in Figure 1). Step 1: Transform the gene expression matrix into the PDS matrix by using the Pathifier algorithm for each cancer type.
Step 2: Identify cancer-specific deregulated pathways based on statistic model and prognostic pathways for each cancer type based on a Cox-PH model.
Step 3: Analyze the distribution of the PDS in different cancers, identify the deregulated pathways in the cancer sample and all cancer samples with deregulation of this pathway, analyze the correlations among the prognostic pathways, and build prognostic prediction models for different cancer types based on the prognostic pathways.

Calculating the PDS
For any given pathway, Pathifier calculates a PDS for each cancer sample based on gene expression data [6]. The score represents the extent to which the activity of the pathway in a particular cancer tissue differs from that in normal cells of the same tissue.
Pathifier first calculates a score ( ), which measures the extent to which the behavior of pathway in sample deviates from that in normal tissue. To determine the pathway deregulation score (PDS) of this pathway, the expression level of the gene belonging to pathway is used. Each sample is a point in the dimensional space, and the entire sample set forms a point cloud. A (nonlinear) "principal curve" [14] is calculated to capture the variation of this cloud. Then, each sample is projected onto the curve, and the PDS is defined as the distance ( ) measured along the curve between the projection of the sample and the projection of the normal sample [6].
Based on the above process, the PDS of each sample in each pathway can be calculated. For each pathway, we calculate the mean and standard deviation of the PDS of all samples of each cancer type. If the mean PDS of a cancer sample ( = 1, 2, ⋯ ,13, where i is one of 13 cancer types) differed from the PDS mean of a normal sample by two or more standard deviations , that is, then this pathway is deregulated in the cancer sample; that is, the deregulated activity of this pathway in the cancer sample is significantly different from its activity in the normal sample. In this case, this pathway is defined as a deregulated pathway in the cancer sample, and the cancer sample is also called deregulated in this pathway.

Constructing Classifier to Distinguish Cancers from Normal
For better quantifying the differences between cancer samples and their tumor-adjacent tissue, we constructed random forest classifiers. Specifically, for each cancer type, we randomly selected 70% of all samples to train the random forest module, and tested the remaining 30% of samples. The performance ability was evaluated by the sensitivity (SN), specificity (SP), and accuracy (ACC), given by

Calculating the PDS
For any given pathway, Pathifier calculates a PDS for each cancer sample based on gene expression data [6]. The score represents the extent to which the activity of the pathway in a particular cancer tissue differs from that in normal cells of the same tissue.
Pathifier first calculates a score D P (s), which measures the extent to which the behavior of pathway P in sample s deviates from that in normal tissue. To determine the pathway deregulation score (PDS) of this pathway, the expression level of the d P gene belonging to pathway P is used. Each sample s is a point in the d P dimensional space, and the entire sample set forms a point cloud. A (nonlinear) "principal curve" [14] is calculated to capture the variation of this cloud. Then, each sample is projected onto the curve, and the PDS is defined as the distance D P (s) measured along the curve between the projection of the sample s and the projection of the normal sample [6].
Based on the above process, the PDS of each sample in each pathway can be calculated. For each pathway, we calculate the mean and standard deviation of the PDS of all samples of each cancer type. If the mean PDS of a cancer sample X c i (i = 1, 2, · · · , 13, where i is one of 13 cancer types) differed from the PDS mean X n i of a normal sample by two or more standard deviations s i , that is, then this pathway is deregulated in the cancer sample; that is, the deregulated activity of this pathway in the cancer sample is significantly different from its activity in the normal sample. In this case, this pathway is defined as a deregulated pathway in the cancer sample, and the cancer sample is also called deregulated in this pathway.

Constructing Classifier to Distinguish Cancers from Normal
For better quantifying the differences between cancer samples and their tumoradjacent tissue, we constructed random forest classifiers. Specifically, for each cancer type, we randomly selected 70% of all samples to train the random forest module, and tested the remaining 30% of samples. The performance ability was evaluated by the sensitivity (SN), specificity (SP), and accuracy (ACC), given by where TP is True Positive, FP is False Positive, TN is True Negative, and FN is False Negative. This process was performed 100 times, and the mean values of Sn, SP, and ACC were calculated finally.

Identifying Deregulated Cancer-Specific Deregulated Pathways
Based on the PDS scores, we use the R package "heatmap" (version 1.0.12) to perform unsupervised hierarchical cluster analysis on all cancer samples by a statistical model. For a certain pathway, the mean PDS of all cancer samples Y c and the mean PDS of all cancer samples of a certain cancer, Y c i , i = 1, 2, · · · , 13, is calculated. If then the cancer type is deregulated in this pathway, which is called a cancer-specific deregulated pathway, where s represents the standard deviation of the mean PDS Y c of this pathway in the 13 cancer types, that is,

Identifying Prognostic Pathways
According to the clinical data corresponding to the cancer samples, the survival outcomes (survival time and survival status) of the patients corresponding to the cancer samples were used as dependent variables, and a univariate Cox-PH model was established based on the PDSs. Pathways with p values less than 0.05 were identified as prognostic pathways.
Furthermore, a multifactorial Cox-PH model [15] for each cancer type was established based on the PDSs of the cancer prognostic pathways. According to the median risk score, the samples were divided into high-risk groups and low-risk groups, and Kaplan-Meier curves were generated. The "survival" (version 3.2-13) and "survminer" (version 0.4.9) packages in R/Bioconductor were used in the prognostic analysis.
In addition, gene expression data of breast cancer samples from the METABRIC database were used to verify the prognostic model for breast cancer, and the corresponding gene expression data of normal samples were also obtained from data for the 113 breast tumor-adjacent tissues in TCGA. Prognostic analysis was conducted after data standardization (min-max normalization) of the two databases.
For the identified prognostic pathways, the frequencies of the cancer prognostic pathways and their related pathways were determined based on the related pathway information in the KEGG database. The pathways with higher frequencies were used to construct a pathway-pathway association network to explore the relationships among the prognostic pathways.

The Heterogeneity of the PDS
For tissues from 13 cancer types and the corresponding tumor-adjacent tissues in TCGA, the PDS score of each pathway in KEGG was calculated by the Pathifier algorithm.
The t-SNE plot for the tissues from the 13 cancer types and the corresponding tumoradjacent tissues is shown in Figure 2. We can easily see that samples of the same cancer type are clearly clustered together and that different cancer types are separated from each other well, indicating the high heterogeneity of the PDS across cancer types. In addition, we found that there is a good distinction between the tissue samples of different cancer types and the corresponding tumor-adjacent tissues and that the tumor-adjacent tissues also cluster together (the tumor-adjacent tissues are circled in red in Figure 2). Biomedicines 2021, 9, 1502 6 of 16 types and the corresponding tumor-adjacent tissues and that the tumor-adjacent tissues also cluster together (the tumor-adjacent tissues are circled in red in Figure 2).  We constructed random forest classifier to distinguish cancer samples from their tumor-adjacent tissues. We can see that the performance of the random forest classifier is excellent to identify the cancer samples from their tumor-adjacent tissues. This result also shows that there is huge distance between cancer and adjacent tissue in terms of PDSs (see Table 2). Cancer-specific deregulated pathways were identified by a statistical model, and the number of cancer-specific deregulated pathways varied greatly (see Figure 3). The cancer-specific deregulated pathways of each cancer type are shown in Supplementary Table S1. The numbers of COAD-specific and PRAD-specific pathways are as high as approximately 30, while the numbers of LUAD-specific and STAD-specific pathways are relatively low. The cluster heatmap of the PDS scores of all cancer samples is shown in Figure 4, which also shows cancer-specific deregulated pathways in dark red and dark blue. It is easy to see that samples of the same cancer type are well clustered, that samples of different cancer types are separated from each other, and that cancer-specific deregulated pathways in different cancer types are significantly different.
Biomedicines 2021, 9, 1502 7 of 16 We constructed random forest classifier to distinguish cancer samples from their tumor-adjacent tissues. We can see that the performance of the random forest classifier is excellent to identify the cancer samples from their tumor-adjacent tissues. This result also shows that there is huge distance between cancer and adjacent tissue in terms of PDSs (see Table 2). Cancer-specific deregulated pathways were identified by a statistical model, and the number of cancer-specific deregulated pathways varied greatly (see Figure 3). The cancer-specific deregulated pathways of each cancer type are shown in Supplementary Table S1. The numbers of COAD-specific and PRAD-specific pathways are as high as approximately 30, while the numbers of LUAD-specific and STAD-specific pathways are relatively low. The cluster heatmap of the PDS scores of all cancer samples is shown in Figure 4, which also shows cancer-specific deregulated pathways in dark red and dark blue. It is easy to see that samples of the same cancer type are well clustered, that samples of different cancer types are separated from each other, and that cancer-specific deregulated pathways in different cancer types are significantly different.  In addition, we found that cancer-specific deregulated pathways are related to the corresponding cancer. For example, the genes in MAPK signaling pathway, a COADspecific pathway, encode a MAPKKK (Raf ) and a MAPKK (MEK1/2), which are frequently mutated in colon cancer [16]. The JAK/STAT signaling pathway is identified as both a STAD-specific and THCA-specific pathway. The JAK/STAT signaling pathway has been shown to be aberrantly activated in thyroid cancer. In addition, the role of deregulated JAK/STAT signaling in the molecular pathogenesis of gastric cancer has been shown [17,18]. The Wnt signaling pathway, a COAD-specific pathway, is significantly deregulated in COAD. Studies have shown that mutations and defects in the Wnt signaling pathway are often found in colon cancer. In addition, the Wnt signaling pathway is constitutively deactivated by the destruction complex, which is assembled around the tumor suppressors APC and Axin and targets β-catenin for destruction [19].   In addition, we observed the heterogeneity of different cancers by analyzing the distribution of the proportion of cancer samples deregulated in each pathway. As shown in Figure 5A, among the 13 cancer types, there is a significant difference in the percentage of cancer samples deregulated in each pathway. COAD, LUSC, and KIRC have relatively high percentages of deregulated cancer samples, with averages of approximately 89%, 86%, and 86%, respectively. The percentage of UCEC samples (85%) is also very high. In contrast, PRAD has the lowest percentage (42%).
Similarly, the distribution of the deregulated pathways in each cancer sample is also significantly different across cancer types ( Figure 5B). COAD, LUSC, KIRC, and UCEC have greater deregulation than other cancer types, and PRAD has significantly lower deregulation than other cancer types. This pattern is consistent with the results of related studies showing that COAD has the third highest incidence but second highest mortality [1]. In contrast, due to its slow growth, PRAD causes less damage to the human body and less distant metastasis, and most prostate cancers never cause symptoms or death [20,21].
shown to be aberrantly activated in thyroid cancer. In addition, the role of deregulated JAK/STAT signaling in the molecular pathogenesis of gastric cancer has been shown [17,18]. The Wnt signaling pathway, a COAD-specific pathway, is significantly deregulated in COAD. Studies have shown that mutations and defects in the Wnt signaling pathway are often found in colon cancer. In addition, the Wnt signaling pathway is constitutively deactivated by the destruction complex, which is assembled around the tumor suppressors APC and Axin and targets β-catenin for destruction [19].
In addition, we observed the heterogeneity of different cancers by analyzing the distribution of the proportion of cancer samples deregulated in each pathway. As shown in Figure 5A, among the 13 cancer types, there is a significant difference in the percentage of cancer samples deregulated in each pathway. COAD, LUSC, and KIRC have relatively high percentages of deregulated cancer samples, with averages of approximately 89%, 86%, and 86%, respectively. The percentage of UCEC samples (85%) is also very high. In contrast, PRAD has the lowest percentage (42%).
Similarly, the distribution of the deregulated pathways in each cancer sample is also significantly different across cancer types ( Figure 5B). COAD, LUSC, KIRC, and UCEC have greater deregulation than other cancer types, and PRAD has significantly lower deregulation than other cancer types. This pattern is consistent with the results of related studies showing that COAD has the third highest incidence but second highest mortality [1]. In contrast, due to its slow growth, PRAD causes less damage to the human body and less distant metastasis, and most prostate cancers never cause symptoms or death [20,21].

Prognostic Pathways
Prognostic pathways for each cancer type were identified by univariate Cox-PH regression analysis. There were significant differences in the number of prognostic pathways among the different types of cancer. There were more prognostic pathways in KIRC, THCA, STAD, and HNSC and fewer in LUAD, PRAD, and UCEC. For example, 60 and 7 prognostic pathways were identified in KIRC and in PRAD, respectively, which were the highest and lowest numbers (see Supplementary Table S2). This difference may arise because the clinical outcomes of different cancer types are very diverse.
In addition, different cancer prognostic pathways rarely overlap. Thus, prognostic pathways have strong cancer specificity, as confirmed by Uhlen et al. [22]. Here, the prognostic pathways in the 13 cancers are distributed among different types, including pathways related to metabolism, organismal systems, environmental information processing, genetic information processing, cellular processes, and human diseases (see Supplementary Table S3).
Among the prognostic pathways and their related pathways, the MAPK signaling pathway, apoptosis, glycolysis/gluconeogenesis, PI3K-Akt signaling pathway, and cell cycle pathway have higher frequencies in the 13 cancer types. In addition, other known carcinogenic pathways, such as the p53 signaling pathway, Wnt signaling pathway, and TGFbeta signaling pathway, also show high connectivity. In particular, the MAPK signaling pathway shows the highest connectivity among these prognostic pathways and is related to the prognostic pathways in all cancers. In addition, the MAPK signaling pathway promotes cell survival by a dual mechanism comprising the post-translational modification and inactivation of a component of the cell death machinery and the increased transcription of pro-survival genes [23].
Based on the assumption that similar diseases may be caused by deregulation of common oncogenic pathways, we constructed a cancer-cancer association network and found that most cancer types have very few shared prognostic pathways. In other words, most prognostic pathways are cancer specific. However, KIRC and KIRP shared the most common pathways, which indicates that cancer types with similar origin cell types share more oncogenic pathways (Table 3). In addition, a pathway-pathway association network for the prognostic pathways and their related pathways was constructed ( Figure 6). MAPK signaling pathways and apoptosis, cell cycle, PI3K-Akt, TGF-beta, Wnt, Jak-STAT, and p53 signaling pathways were tightly connected, indicating that synergistic deregulation of these oncogenic pathways may contribute to tumorigenesis. However, we found that cytokine-cytokine receptor interaction with higher frequencies has less contact with other pathways, which may also be an important pathway. There is not enough relevant research on this pathway, so further exploration and study are needed. In COAD, Notch signaling pathway was identified as prognostic pathway, which is consistent with the studies showing that the misregulation or loss of Notch signaling underlies a wide range of human disorders, from developmental syndromes to adult-onset diseases and cancer [24]. As an identified prognostic pathway, Toll like receptor signaling pathway is supported by a recent study showing that it is a potential therapeutic target in COAD, and correlated with COAD prognosis [25]. In addition, Toll like receptor signaling is involved in activating innate and adaptive immune responses and plays a critical role in COAD [25]. Elsewhere, we identified Cell cycle pathway as a prognostic pathway for HNSC. Cell cycle regulators are considered attractive targets in cancer therapy, and over expression of several of these cell cycle proteins induces or contributes to tumorigenesis, revealing their prominent oncogenic roles [26]. VEGF signaling pathway was identified as a prognostic pathway in HNSC. VEGF inhibitors play an increasingly important role in the management of solid tumors, and anti-VEGF therapy has established itself as one of the most important classes of drugs for the treatment of human cancer [27]. VEGF correlates with worse prognosis or outcome in general [28].
Moreover, N-Glycan biosynthesis, amino sugar, nucleotide sugar metabolism, and so on were prognostic pathways in BRCA; steroid hormone biosynthesis, insulin signaling pathway, and so on were prognostic pathways in KIRC; and other prognostic pathways among the different types of cancer may be specific pathways of different cancers. Our research indicates that they may be repurposed for the treatment of these cancers.
Biomedicines 2021, 9,1502 11 of Figure 6. Pathway-pathway association network for the prognosis pathways and their related pathways.
In COAD, Notch signaling pathway was identified as prognostic pathway, which consistent with the studies showing that the misregulation or loss of Notch signaling u derlies a wide range of human disorders, from developmental syndromes to adult-on diseases and cancer [24]. As an identified prognostic pathway, Toll like receptor signal pathway is supported by a recent study showing that it is a potential therapeutic targe COAD, and correlated with COAD prognosis [25]. In addition, Toll like receptor signal is involved in activating innate and adaptive immune responses and plays a critical r in COAD [25]. Elsewhere, we identified Cell cycle pathway as a prognostic pathway HNSC. Cell cycle regulators are considered attractive targets in cancer therapy, and o expression of several of these cell cycle proteins induces or contributes to tumorigene revealing their prominent oncogenic roles [26]. VEGF signaling pathway was identif as a prognostic pathway in HNSC. VEGF inhibitors play an increasingly important r in the management of solid tumors, and anti-VEGF therapy has established itself as o of the most important classes of drugs for the treatment of human cancer [27]. VEGF c relates with worse prognosis or outcome in general [28].
Moreover, N-Glycan biosynthesis, amino sugar, nucleotide sugar metabolism, a so on were prognostic pathways in BRCA; steroid hormone biosynthesis, insulin signal pathway, and so on were prognostic pathways in KIRC; and other prognostic pathwa among the different types of cancer may be specific pathways of different cancers. O research indicates that they may be repurposed for the treatment of these cancers.

Prognostic Models Based on Pathways
The Kaplan-Meier curves for all cancers are shown in Figure 7. It can be seen fr the figures that the prognoses of patients in the high-risk score group are significantly l favorable (p < 0.05) than those of patients in the low-risk score group in 12 cancer typ (except BLCA) in TCGA; this finding verifies the effectiveness of the prognostic mo based on prognostic pathways.

Prognostic Models Based on Pathways
The Kaplan-Meier curves for all cancers are shown in Figure 7. It can be seen from the figures that the prognoses of patients in the high-risk score group are significantly less favorable (p < 0.05) than those of patients in the low-risk score group in 12 cancer types (except BLCA) in TCGA; this finding verifies the effectiveness of the prognostic model based on prognostic pathways.
In addition, the validation data in the METABRIC database were analyzed using the same process. The breast cancer samples in METABRIC were divided into a high-risk score group and a low-risk score group based on the prognostic model constructed from the 21 breast cancer prognostic pathways identified in TCGA. There was a significant difference in survival between these two groups (log-rank test, p < 2.2 × 10 −16 , see Figure 8). In addition, the validation data in the METABRIC database were analyzed using the same process. The breast cancer samples in METABRIC were divided into a high-risk score group and a low-risk score group based on the prognostic model constructed from the 21 breast cancer prognostic pathways identified in TCGA. There was a significant difference in survival between these two groups (log-rank test, p < 2.2 × 10 −16 , see Figure 8).   In addition, the validation data in the METABRIC database were analyzed using the same process. The breast cancer samples in METABRIC were divided into a high-risk score group and a low-risk score group based on the prognostic model constructed from the 21 breast cancer prognostic pathways identified in TCGA. There was a significant difference in survival between these two groups (log-rank test, p < 2.2 × 10 −16 , see Figure 8).

Behavior of Prognostic Pathways among Cancer Subtypes
In addition, we computed the mean values of PDS for the 21 identified prognostic pathways among breast cancer Pam50 subtypes and plotted their distribution in Figure 1. Obviously, the PDS scores of the identified prognostic pathways are higher in Basal-like subtype with poor prognosis than other subtypes, while the PDS sores of subtypes of LumA and Normal-like are relatively lower. As we all know, these two subtypes LumA and Normal-like are usually correlated with low degree of malignancy and good prognosis. This means that the PDS score of the identified prognostic pathways can reflect the degree of malignancy and prognosis of breast cancer among subtypes. The higher the PDS score of the prognostic pathways, the more serious the pathway deregulated and the worse the prognosis (see Figure 9).

Behavior of Prognostic Pathways among Cancer Subtypes
In addition, we computed the mean values of PDS for the 21 identified prognostic pathways among breast cancer Pam50 subtypes and plotted their distribution in Figure 1. Obviously, the PDS scores of the identified prognostic pathways are higher in Basal-like subtype with poor prognosis than other subtypes, while the PDS sores of subtypes of LumA and Normal-like are relatively lower. As we all know, these two subtypes LumA and Normal-like are usually correlated with low degree of malignancy and good prognosis. This means that the PDS score of the identified prognostic pathways can reflect the degree of malignancy and prognosis of breast cancer among subtypes. The higher the PDS score of the prognostic pathways, the more serious the pathway deregulated and the worse the prognosis (see Figure 9).

Genes in Prognostic Pathways
We examined the annotations of the gene sets of these 21 pathways in breast cancer. The occurrence frequency of genes in all prognostic pathways of breast cancer was statistically analyzed, and the genes with the highest frequency were selected, for example, the mitogen-activated protein kinases MAPK1, MAPK3, and MAP2K1; G-protein-related genes GNAQ and HRAS; and other oncogenes, such as RhoA, ROCK1, and ROCK2. Increased expression and/or activation of HRAS is often associated with tumor aggressiveness in breast cancer. HRAS induces the invasion and migration of MCF10A human breast epithelial cells, and HRAS induces cell proliferation and phenotypic transformation [29]. The KRAS, BRAF, and PIK3CA genes activate the ERK/MAPK pathway [30]. The activation of NHE1 and subsequent invasion induced by serum deprivation in metastatic human breast cells is coordinated by a sequential RhoA/p160ROCK/p38 MAPK signaling pathway gated by direct phosphorylation of protein kinase A and inhibition of RhoA [31]. In addition, HRAS, KRAS, AKT1, PIK3CA, TP53, and 24 other genes (see Supplementary Table  S4) in these 21 prognostic pathways of BRCA have been proven to be BRCA driver genes, accounting for 28.3% of the total complement of BRCA driver genes [32].
Impressively, for prognosis-related signaling pathways with high correlations, key genes in the MAPK and TGF-β signaling pathways are associated with many cancer types. Through mutation of the pathway members or aberrant activation of the downstream

Genes in Prognostic Pathways
We examined the annotations of the gene sets of these 21 pathways in breast cancer. The occurrence frequency of genes in all prognostic pathways of breast cancer was statistically analyzed, and the genes with the highest frequency were selected, for example, the mitogen-activated protein kinases MAPK1, MAPK3, and MAP2K1; G-protein-related genes GNAQ and HRAS; and other oncogenes, such as RhoA, ROCK1, and ROCK2. Increased expression and/or activation of HRAS is often associated with tumor aggressiveness in breast cancer. HRAS induces the invasion and migration of MCF10A human breast epithelial cells, and HRAS induces cell proliferation and phenotypic transformation [29]. The KRAS, BRAF, and PIK3CA genes activate the ERK/MAPK pathway [30]. The activation of NHE1 and subsequent invasion induced by serum deprivation in metastatic human breast cells is coordinated by a sequential RhoA/p160ROCK/p38 MAPK signaling pathway gated by direct phosphorylation of protein kinase A and inhibition of RhoA [31]. In addition, HRAS, KRAS, AKT1, PIK3CA, TP53, and 24 other genes (see Supplementary Table S4) in these 21 prognostic pathways of BRCA have been proven to be BRCA driver genes, accounting for 28.3% of the total complement of BRCA driver genes [32].
Impressively, for prognosis-related signaling pathways with high correlations, key genes in the MAPK and TGF-β signaling pathways are associated with many cancer types. Through mutation of the pathway members or aberrant activation of the downstream genes (i.e., RAS, SRC, and PI3K) and receptor kinases, the MAPK signaling pathway is overactivated in different malignancies. The activators and components of the MAPK pathway-Raf, RAS, BRAF, MEK, and ERK-are frequently mutated in colon, melanoma, ovarian, thyroid, colorectal, and non-small cell lung cancers [33]. The TGF-β signaling pathway has multiple gene targets, and TGF-β performs its critical functions in proliferation and suppression by targeting the c-Myc, Cyclin A/B/D/E, CDK1/2/4/6, p15INK4B, p21CIP1, and p27KIP1 genes [34].

Discussion
We applied Pathifier, a recently introduced method for analysis of transcriptomic data, to perform a comprehensive pan-cancer study across 13 different tumor types with more than 6000 cancer samples in TCGA. For each cancer type, the cancer-specific deregulated pathways and prognostic pathways were found to differ greatly among cancers, reflecting the heterogeneity across cancer types. In addition, we constructed a prognostic pathwaybased prognostic model, which was well validated in an independent data set in the METABRIC database. The prognostic model accurately distinguished the high-risk and low-risk score groups, indicating the broad applicability of our model as a prognostic model. Then, for any given pair of cancer types, we found that there is little overlap between the two lists of pathway-based biomarkers. These results highlight the observation that cancer is a highly heterogeneous disease and that, therefore, personalized treatment is necessary for patients with different cancer types.
Although we developed an approach for the classification and identification of prognostic pathways in different cancer types, our study has a few limitations. We only used normal samples in TCGA, and a certain number of normal samples are needed to estimate PDS more accurately. In addition, some types of cancer with high heterogeneity can be studied further by subtyping based on pathways.
In summary, Pathifier-based research can allow more accurate and robust identification of prognostic pathways in cancer samples and is expected to improve precision treatment for different cancers. We also expect that our method, with its good performance, will be applicable to other cancers. Future validation in other cancer types with large sample sizes is desired.

Conclusions
In this study, we analyzed transcriptomic data of 13 different cancer types in TCGA database to identify cancer-specific deregulated pathways and prognostic pathways based on pathway deregulation scores. First, individual-specific pathway deregulation scores for each sample were inferred. Second, the cancer-specific deregulated pathways and prognostic pathways of different cancer types were identified. Finally, we constructed and evaluated the pathway-based prognostic prediction model.
There are indeed several papers building PDS-based Cox models, and all of them are focused on a single type of cancer [35]. However, we performed the pathway based personalized analysis on pan-cancer including 13 types of cancer. The results showed that the individual-specific deregulated pathways score can clearly distinguish different cancer types and their tumor-adjacent tissues. The cancer-specific deregulated pathways and prognostic pathways of different cancer types have high heterogeneity. The cancers with more deregulation pathways tend to be malignant and have worse prognoses. The prognostic model based on pathways successfully predicted survival and prognosis both on training data and validation data. We believe that our prognostic models based on pathways are reliable for prognostic prediction. Most of these results cannot be obtained by PDS-based analysis on only a single cancer type. These are the highlights of our study. In addition, this work may improve the development of precision medicine.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3 390/biomedicines9111502/s1, Table S1: The cancer-specific deregulated pathways of each cancer type. Table S2: The prognostic pathways among the different types of cancer. Table S3: The prognostic pathways in the 13 cancers are distributed among different types. Table S4: The driver genes in these 21 prognostic pathways of BRCA.