Survival and Enrichment Analysis of Epithelial–Mesenchymal Transition Genes in Bladder Urothelial Carcinoma

The escalating prevalence of bladder cancer, particularly urothelial carcinoma, necessitates innovative approaches for prognosis and therapy. This study delves into the significance of genes related to epithelial–mesenchymal transition (EMT), a process inherently linked to carcinogenesis and comparatively better studied in other cancers. We examined 1184 EMT-related gene expression levels in bladder urothelial cancer cases through the TCGA dataset. Genes shown to be differentially expressed in relation to survival underwent further network and enrichment analysis to uncover how they might shape disease outcomes. Our in silico analysis revealed a subset of 32 genes, including those significantly represented in biological pathways such as VEGF signaling and bacterium response. In addition, these genes interact with genes involved in the JAK-STAT signaling pathway. Additionally, some of those 32 genes have been linked to immunomodulators such as chemokines CCL15 and CCL18, as well as to various immune cell infiltrates. Our findings highlight the prognostic utility of various EMT-related genes and identify possible modulators of their effect on survival, allowing for further targeted wet lab research and possible therapeutic intervention.

Cells involved in the invasion of bladder cancer alter their surroundings and can also become transiently and reversibly plastic, turning into mesenchymal stem cells.This is the epithelial-mesenchymal transition (EMT), which is among the most relevant paradigm shifts in how we view cancer progression and can combat its growth.During EMT, epithelial cells lose their polarized, adhesive characteristics and gain a mesenchymal phenotype, enabling them to migrate and invade surrounding tissues [30,31].Transcription factors including Snail, Zeb, and Twist aid in this process by repressing E-cadherin, an epithelial transmembrane protein [32].In contrast to epithelial cells, mesenchymal carcinoma cells exhibit specific metabolic needs.As they undergo EMT, cancer cells finely regulate multiple metabolic pathways to support the demands of rapid cell proliferation [33].The molecular pathways shown to be associated with EMT include Snail/Slug, Twist, Six1, Cripto, TGF-β, and Wnt/β-catenin [34].The literature shows how genes such as CDH1, ZEB1, TGFB, CDH2, VIM, and TIMP1 have been linked to inducing the EMT phenotype, driving cell migration, and adapting to changing demands on the primary tumor [33,35,36].
In bladder cancer, various microRNAs (miRNAs) have been found to regulate proteins such as Smad7 or Twist1, either promoting or disrupting EMT and metastasis [37].Understanding the mechanisms underlying EMT is crucial for developing targeted therapies to control cancer metastasis and may prove useful in treatment options going forward.Comparatively, there has been less work in this field in bladder cancer than in other cancers.This paper examines a multitude of EMT-related genes in relation to not only outcomes, but also the biologic networks and pathways which allow these genes to influence carcinogenesis and affect these outcomes.

Selection of Genes
To have a comprehensive overview of genes involved in the epithelial-mesenchymal transition, dbEMT 2.0 (http://dbemt.bioinfo-minzhao.org/(accessed on 1 September 2022)), a database curated for focus on EMT-related genes, was utilized.A spreadsheet was generated with 1184 genes listed on the database, obtained from an initial PubMed abstract query for "Epithelial Mesenchymal Transition Genes" with the results mined for unique genes linked to EMT (see Supplemental Materials).

Survival Analysis
Publicly available cases from the NIH-funded "The Cancer Genome Atlas" (TCGA) project were utilized to examine gene expression pertaining to survival in bladder urothelial cancer (data portal: https://portal.gdc.cancer.gov/projects/TCGA-BLCA(accessed on 1 September 2022)).Kaplan-Meier plots were generated through the R2 platform (https: //hgserver1.amc.nl/cgi-bin/r2/main.cgi (accessed on 1 September 2022)) using the TCGA dataset for "Bladder Urothelial Carcinoma", n = 407.The built-in "KaplanScan" algorithm was used to divide mRNA gene expression into "high" versus "low" categories (n values for each based on KaplanScan groupings of expression).Overall survival was compared to follow-up time in months being analyzed.For multiple hypothesis testing, p-values were adjusted to a false discovery rate (FDR) of 0.05.

Expression Analysis
To compare normal versus tumor levels of those EMT genes which showed differential expression regarding survival, mRNA levels for the "Bladder Urothelial Carcinoma" TCGA dataset were analyzed with a Welch's t-test through the UCSC Xena platform.This platform is a genome browser and visualization tool of genomic and phenotypic data for both public and private datasets (https://xena.ucsc.edu/(accessed on 1 September 2022)).An FDR cutoff of 0.05 was used for significance.Violin plots were generated to visualize expression values.
Through the aforementioned R2 platform, the expression levels of the previously mentioned genes which showed differential expression in regard to survival were analyzed.mRNA gene levels were compared in respect to pathologic staging and, given the small n associated with cases of stage 1 bladder urothelial cancer in the TCGA dataset (n = 2), Kruskal-Wallis analysis with corresponding pairwise Welch's t-tests were used.Again, p-values were adjusted and an FDR cutoff of 0.05 was used.
Subsequently, the network analysis highlighted certain genes alongside the aforementioned EMT genes.These genes underwent enrichment analysis to shed light on their potential involvement in specific biological processes, using annotations from the Gene Ontology (GO), Kyoto Encyclopedia of Genes, and Genomes (KEGG) databases.It was carried out through the Metascape platform (http://metascape.org(accessed on 1 October 2022)).The criteria for this analysis included a minimum overlap of 3 genes and an enrichment threshold of 1.5.Statistical significance was set at p < 0.05.

Tumor Immune Microenvironment Analysis
The relationship between the EMT-related genes differentially expressed in connection with survival and the immune system in cases of bladder urothelial cancer was examined through the TISIDB and TIMER platform.Spearman's correlations between mRNA gene expression of the EMT genes and clinically relevant immunoinhibitor and cytokine gene expressions were calculated and visualized.Those with a rho of >|0.40| were considered meaningful with a p-value of >0.01.Devolution methods were used to estimate the immune infiltration of a wide variety of immune cells based on gene expression on the TIMER platform.The Spearman's correlation was adjusted based on tumor purity, with a rho of >|0.30| visualized.

Survival Analysis
A myriad of EMT-related genes showed differential expression in normal versus cancer tissues.Kaplan-Meier plots (Figure 1) were generated using TCGA data on "Bladder Urothelial Carcinoma" for each of the 1184 genes mined from the EMTdb, with only 32 meeting significant cutoff following FDR correction (Table 1).

Expression Analysis
Out of the 32 genes highlighted in the survival analysis, some of them also showed statistically significant expression levels when comparing normal versus tumor samples.This can be seen in the density plot (Figure 2) or in more detail in the sample violin plots (Figure 3); the rest of the genes and p-values from the violin plots in Supplemental Materials can be seen in Table 4.Some genes also showed significant differential expression based on tumor stage (Figure 4).(A,B) show subset of Kaplan-Meier plots, based on gene expression levels of ADAM17 and ADER respectively.See Supplemental Figure S1 for Kaplan-Meier plots of all genes shown to be differentially expressed.
Table 1.Tabulated data from Kaplan-Meier plots based on mRNA expression of EMT-related genes which showed to be differentially expressed in regard to overall survival.Genes with "high" mRNA expression leading to worse prognosis include those in Table 2. On the other hand, genes which showed "low" mRNA expression leading to worse prognosis include those in Table 3.

Expression Analysis
Out of the 32 genes highlighted in the survival analysis, some of them also showed statistically significant expression levels when comparing normal versus tumor samples.This can be seen in the density plot (Figure 2) or in more detail in the sample violin plots (Figure 3); the rest of the genes and p-values from the violin plots in Supplemental Materials can be seen in Table 4.Some genes also showed significant differential expression based on tumor stage (Figure 4).From the XENA browser, 19 normal tissues were compared with 407 TCGA cases.Following FDR correction (cut-off p-value is 0.003143), 10 genes were shown to be differentially expressed when considering overall survival (Table 5).
Several genes approach significance post-correction (Table 6).Out of the same set of EMT-genes regarding survival, 10 out of the 32 showed differential expression when considering pathologic staging (Table 7).From the XENA browser, 19 normal tissues were compared with 407 TCGA cases.Following FDR correction (cut-off p-value is 0.003143), 10 genes were shown to be differentially expressed when considering overall survival (Table 5).Network analysis was employed to identify genes with interconnected relationships, utilizing factors such as physical interactions, co-localization, and co-expression data (Figure 5).The genes chosen for constructing each network analysis were the previously mentioned EMT-associated genes that displayed distinct expression patterns in survival outcomes.

Enrichment Analysis
Enrichment analysis was executed on two distinct gene sets: first, on the genes pinpointed in the network analysis, and additionally, on the genes situated at the periphery ("outer rim") of the network analysis, apart from the initial EMT genes (Figure 6).Enrichment analysis of the initial 32 highlighted EMT genes showed numerous pathways significantly overrepresented in the cohort of genes, such as endothelial cell migration and regulation of cell shape, in line with the known physiologic processes involved with the epithelial-mesenchymal transition.However, other less canonically associated ones such as defense response to bacterium and carbohydrate response were also uncovered through the analysis.Regarding the genes listed in the network analysis, similar biologic processes

Enrichment Analysis
Enrichment analysis was executed on two distinct gene sets: first, on the genes pinpointed in the network analysis, and additionally, on the genes situated at the periphery ("outer rim") of the network analysis, apart from the initial EMT genes (Figure 6).Enrichment analysis of the initial 32 highlighted EMT genes showed numerous pathways significantly overrepresented in the cohort of genes, such as endothelial cell migration and regulation of cell shape, in line with the known physiologic processes involved with the epithelial-mesenchymal transition.However, other less canonically associated ones such as defense response to bacterium and carbohydrate response were also uncovered through the analysis.Regarding the genes listed in the network analysis, similar biologic processes such as the VEGFA signaling pathway was highlighted alongside other related ones such as HIF-1 survival signaling.Through examining for correlation between various immunomodulators and chemokines (see Appendix A, Table A1 for list), the following plots were generated (Figure 7), with the statistically and clinically significant ones (p < 0.05 post correction, |rho| > 0.4) being shown.Out of the 32 genes, 12 were shown to have at least significant correlation with an immunomodulator or chemokines (TBX3, NRP2, FN1, FOXA1, FBP1, ANXA1, LAMC2, HOOK1, NES, PTPN6, RUNX2), with the first four having at least 25 different immunomodulators or cytokines to be significantly correlated with (Table 8).
Figure 6.Enrichment analysis of the EMT genes involved in survival (a) as well as enrichment analysis highlighted in the aforementioned network analysis (b).Labeled are statistically enriched terms which are biologic pathways selected from KEGG and other hallmark gene sets.Additionally, for both the EMT genes and network highlighted gene ((c) and (d), respectively), the representative terms were converted into a network layout with each circle representing a single biologic process, grouped into larger "themes" as labeled in the color key.The size of the circle represents the amount of analyzed genes within that term.

Immunomodulator, Cytokine
Through examining for correlation between various immunomodulators and chemokines (see Appendix A, Table A1 for list), the following plots were generated (Figure 7), with the statistically and clinically significant ones (p < 0.05 post correction, |rho| > 0.4) being shown.Out of the 32 genes, 12 were shown to have at least significant correlation with an immunomodulator or chemokines (TBX3, NRP2, FN1, FOXA1, FBP1, ANXA1, LAMC2, HOOK1, NES, PTPN6, RUNX2), with the first four having at least 25 different immunomodulators or cytokines to be significantly correlated with (Table 8).

Discussion
Overall, we see a robust cohort of EMT-related genes that are differentially expressed as pertaining to survival in bladder urothelial carcinoma.RUNX2, a gene most associated with cartilage production, has been linked to pancreatic cancer and to breast cancer progression through modulation of MicroRNAs and the metastasis-associated 1 (MTA1)/NuRD complex [38,39].By activating the Wnt signaling pathway, ARMC8 has been linked to increased invasion in cutaneous squamous cell carcinoma and lung cancer [40][41][42].Also implicated in the Wnt pathway, as well as the VEGF pathway, CEMIP (formerly known as KIA1199) has emerged via immunohistochemical studies as a possible biomarker for a variety of cancers [43][44][45].The phosphatase INPP4B is an inhibitor of the Wnt pathway; its knockout has been linked to increased proliferation [46].Knockout or downregulation of the transcription factor gene FOXA1 has also been linked to worse prognosis, altering the carcinogenic activity of the Snail/Twist1 axis in breast cancer as well as prostate cancer [47,48].TBX3 has been linked to breast and cervical cancer proliferation, but it also inhibits the activity of the YAP/TAZ signaling pathway involved with cellular regeneration and growth [49][50][51].
Of note, PTPN6, a tyrosine phosphatase, affects cell growth and carcinogenesis in both bladder and colon cancer.Interestingly, overexpression of PTPN6 has been associated with worse prognosis and increased metastasis, while the opposite was seen in other studies [52,53].Additionally, there were cases where our findings were incongruent with other cancers.PEBP4 expression was correlated with metastasis in colorectal and breast cancer but is significantly associated with better outcomes in our analysis.AGER (advanced glycosylation end-product-specific receptor) was shown to be associated with increased cell migration in cervical cancer, but the opposite as per our analysis [54].The pro-carcinogenic effect that ART1 has in colorectal cancer was not seen in our Bladder TCGA analysis [55].
Interestingly enough, some of the genes shown to be differentially expressed in regard to survival were not differentially expressed when comparing normal tissue to tumor; nor did their expression correlate to pathologic staging.For the pathologic staging, only FN1, NRP2, FOXA1, NES, AGER, RUNX2, PTPN6, FBP1, TBX3, and STIM2 were shown to be significantly different.
While there may be differences in Stage I expression versus the other stages, our sample size (n = 2) was too small to reliably detect any except for in the expression levels of FOXA1.Instead, many of the differences were seen between Stage 2 versus 3 and 4. Histopathologically, this correlates with the expansion of the cancer through the muscle with possible lymph node and systemic metastasis.During this process, the pro-mobility changes which accompany the EMT would be fully evident.Our findings were mostly consistent with the survival analysis results, such as with RUNX2 which showed worse prognosis with higher levels, increasing expression from Stages 1 to 4.
Through the network analysis, we were able to see more genes through which the EMTrelated genes can modulate and affect carcinogenesis.For example, VEGFA was indicated, a well-known member of a family of growth factors which has proliferative and anti-apoptotic effects [56].It has been found to be highly expressed in hepatocellular carcinoma (HCC) and triple-negative breast cancer (TNBC).It is associated with worse prognosis in HCC [57] and significantly lower progression-free survival in TNBC treated with chemotherapy [58].Additionally, SFRP2 was implicated, and recent immunohistochemistry work has linked this gene as a possible early biomarker of pancreatic and colon cancer [59,60].Another gene highlighted in our analysis, CAV1, codes for a scaffolding protein and often predicts poor prognosis [61].
Enrichment analysis of the EMT-related genes further highlights the multiple modalities through which the EMT process itself encourages cancer growth and metastasis.Included among the highlighted cellular processes in our analysis are those regarding the establishment or maintenance of epithelial cell apical/basal polarity and positive regulation of cell migration.The VEGFA-VEGFR2 signaling pathway and associated angiogenesis, among the seminal hallmarks of cancer, were highlighted as the most overly enriched biologic process (p < 5.0 × 10 −7 ).Other biological processes less typically associated with EMT such as "defense response to bacterium" and "signaling by interleukins" were picked up by the analysis.However, the same cytokine and inflammatory-mediated response to bacteria can help explain another way EMT-related genes play a role in cancer progression.The increased angiogenic and metabolic changes associated with certain proinflammatory states, which would help fend off a bacterial infection, can serve as fertile ground for initiation of cancer metastasis.
Among the most salient advances in our understanding of cancer progression is the complex role the immune system plays.Notable mediators of the immune system repeatedly occurred in our analysis.TGFBR1, a receptor for the TBFB growth factor, and CSF1R, a cytokine receptor, have been linked to pro-survival activity [62][63][64][65].In other cancers, the chemokine CCL15 has been linked to tumor-associated macrophage recruitment and CCL18 has been linked to an immunosuppressive tumor microenvironment, allowing for evasion of the host's immune system [66][67][68][69].The most common immune cell type with a significant correlation was CD4+ T cells.While classically thought of as having an anti-tumor role, the varied subtypes not completely covered in our current deconvolution methods may play a paradoxical pro-tumor role.It is possible they do so by decreasing activation of other immune cells such as macrophages, warranting further study into the specific gene to immune cell interaction [70].
While this research represents a step forward in investigating the expression patterns of EMT-related genes, there exist certain limitations to these discoveries.These assessments rely on the levels of mRNA in a stable state as a proxy for protein levels.To partially address this, proteomic analysis of the highlighted genes in our study were examined in bladder cancer models through the depmap.orgportal (https://depmap.org/portal/(accessed on 12 September 2023)) and Cancer Cell Line Encyclopedia (CCLE), verifying expression of highlighted genes.Moreover, the functional behavior of these mRNA molecules could be subject to additional regulation by post-translational elements.Additionally, the techniques used to decipher immune cell infiltrations might be influenced by the tumor's purity, contingent upon the specific formula employed.Ongoing efforts are dedicated to refining methods for accurately estimating cellular infiltrations based on mRNA levels.Overall, this study potentially serves as a starting point for future investigations aimed at directly scrutinizing the highlighted relationship between gene expression and prognosis, extending to the protein or enzymatic realm.

Figure 1 .
Figure 1.Subset of Kaplan-Meier plots based on mRNA expression of EMT-related genes which showed to be differentially expressed in relation to overall survival.Each plot lists the numbers of pts in "high" versus "low" mRNA expression cohorts based on the KaplanScan grouping algorithm.(A,B) show subset of Kaplan-Meier plots, based on gene expression levels of ADAM17 and ADER respectively.See Supplemental Figure S1 for Kaplan-Meier plots of all genes shown to be differentially expressed.

Figure 1 .
Figure 1.Subset of Kaplan-Meier plots based on mRNA expression of EMT-related genes which showed to be differentially expressed in relation to overall survival.Each plot lists the numbers of pts in "high" versus "low" mRNA expression cohorts based on the KaplanScan grouping algorithm.(A,B) show subset of Kaplan-Meier plots, based on gene expression levels of ADAM17 and ADER respectively.See Supplemental Figure S1 for Kaplan-Meier plots of all genes shown to be differentially expressed.

Figure 2 .
Figure 2. Density plot showing expression of normal versus tumor expressions for all selected genes which had both TCGA and GTEX data.

Figure 2 .
Figure 2. Density plot showing expression of normal versus tumor expressions for all selected genes which had both TCGA and GTEX data.

Figure 3 .
Figure 3. Subset of violin plot showing mRNA expression between primary tumors (Blue, n = 407) and surrounding normal tissue (Red, n = 19).For full listing, see Supplemental Material Figure S2.(A,B) show subset of violin plots, showing primary versus normal solid tissue violin plots for NR2F2 and TBX3 respectively.

Figure 3 .
Figure 3. Subset of violin plot showing mRNA expression between primary tumors (Blue, n = 407) and surrounding normal tissue (Red, n = 19).For full listing, see Supplemental Figure S2.(A,B) show subset of violin plots, showing primary versus normal solid tissue violin plots for NR2F2 and TBX3 respectively.

Genes 2023 , 24 Figure 5 .
Figure 5. Network analysis showing 20 of the most highlighted genes.The analysis was conducted utilizing factors such as physical and anticipated interactions, protein co-localization, and shared DNA domains, alongside various other attributes.Connections between genes based on physical interactions are highlighted in red, shared pathways in blue, shared protein domains in yellow, predicted in orange, co-expression in purple, genetic interactions in green, and co-localization in navy.

Figure 5 .
Figure 5. Network analysis showing 20 of the most highlighted genes.The analysis was conducted utilizing factors such as physical and anticipated interactions, protein co-localization, and shared DNA domains, alongside various other attributes.Connections between genes based on physical interactions are highlighted in red, shared pathways in blue, shared protein domains in yellow, predicted in orange, co-expression in purple, genetic interactions in green, and co-localization in navy.

Figure 6 .
Figure 6.Enrichment analysis of the EMT genes involved in survival (a) as well as enrichment analysis highlighted in the aforementioned network analysis (b).Labeled are statistically enriched terms which are biologic pathways selected from KEGG and other hallmark gene sets.Additionally, for both the EMT genes and network highlighted gene ((c) and (d), respectively), the representative terms were converted into a network layout with each circle representing a single biologic process, grouped into larger "themes" as labeled in the color key.The size of the circle represents the amount of analyzed genes within that term.

Figure 7 .
Figure 7. Sample Spearman's correlation graphs for TBX3 (full table of all EMT-related genes in Appendix A, Table A2), grouped by EMT-gene of interest.Only correlations shown to be statistically significant (p < 0.05 post multiple hypothesis correction) and deemed clinically significant |rho| > 0.4 are depicted both in the figure and included in the summary below.Subset shown, with (A) showing TBX3 expression correlation with CXCL16 expression and (B) showing correlation between TBX3 and CXCL13 expression; for other genes NRP2, FN1, FOXA1, FBP1, ANXA1, LAMC2, HOOK1, NES, RUNX2, and PTPN6 see Supplemental Figure S3.

Figure 7 .
Figure 7. Sample Spearman's correlation graphs for TBX3 (full table of all EMT-related genes in Appendix A, Table A2), grouped by EMT-gene of interest.Only correlations shown to be statistically significant (p < 0.05 post multiple hypothesis correction) and deemed clinically significant |rho| > 0.4 are depicted both in the figure and included in the summary below.Subset shown, with (A) showing TBX3 expression correlation with CXCL16 expression and (B) showing correlation between TBX3 and CXCL13 expression; for other genes NRP2, FN1, FOXA1, FBP1, ANXA1, LAMC2, HOOK1, NES, RUNX2, and PTPN6 see Supplemental Figure S3.

Figure 8 .
Figure 8. Correlations of various gene expression levels to tumor purity and immune infiltration.Only correlations to immune infiltration shown to be statistically significant (p < 0.05 post multiple hypothesis correction) and deemed clinically significant |rho| > 0.3 are depicted both in the figure and included in the summary below.Subset shown, for other genes ANXA1, ARMC8, FBP1, FN1, FOXA1, LAMC2, MAP2K1, NRP2, PTPN6, RUNX2, STIM2, TBX3 see Supplemental Images.(A) shows correlation between ADAM17 gene expression and tumor purity (amount of non-cancerous cells in tumor sample) and (B) shows correlation between ADAM17 expression level and calculated macrophage infiltration.

Figure 8 .
Figure 8. Correlations of various gene expression levels to tumor purity and immune infiltration.Only correlations to immune infiltration shown to be statistically significant (p < 0.05 post multiple hypothesis correction) and deemed clinically significant |rho| > 0.3 are depicted both in the figure and included in the summary below.Subset shown, for other genes ANXA1, ARMC8, FBP1, FN1, FOXA1, LAMC2, MAP2K1, NRP2, PTPN6, RUNX2, STIM2, TBX3 see Supplemental Figure S2.(A) shows correlation between ADAM17 gene expression and tumor purity (amount of non-cancerous cells in tumor sample) and (B) shows correlation between ADAM17 expression level and calculated macrophage infiltration.

Table 2 .
List of genes and p-values for which high mRNA expression is correlated with worse prognosis.

Table 3 .
List of genes and p-values for which low mRNA expression is correlated with worse prognosis.

Table 4 .
Violin plot genes and p-values.

Table 4 .
Violin plot genes and p-values.

Table 5 .
List of genes differentially expressed regarding overall survival when comparing normal versus tumor samples with p-values.

Table 5 .
List of genes differentially expressed regarding overall survival when comparing normal versus tumor samples with p-values.

Table 6 .
List of genes that approach significance after FDR correction with p-values (cutoff p value is 0.003143).

Table 7 .
List of stages that showed significantly different expression of the relevant gene based off TCGA datasets.

Table 8 .
Summary of Spearman's correlation data between EMT-related gene and various immunomodulators, grouped by EMT-gene of interest.

Table 8 .
Summary of Spearman's correlation data between EMT-related gene and various immunomodulators, grouped by EMT-gene of interest.

Table 9 .
Summary of Spearman's correlation data between EMT-related gene and various immune cell types, grouped by EMT-gene of interest.

Table 9 .
Summary of Spearman's correlation data between EMT-related gene and various immune cell types, grouped by EMT-gene of interest.