Impact TMPRSS2–ERG Molecular Subtype on Prostate Cancer Recurrence

Currently, seven molecular subtypes of prostate cancer (PCa) are known, the most common of which being the subtype characterized by the presence of the TMPRSS2–ERG fusion transcript. While there is a considerable amount of work devoted to the influence of this transcript on the prognosis of the disease, data on its role in the progression and prognosis of PCa remain controversial. The present study is devoted to the analysis of the association between the TMPRSS2–ERG transcript and the biochemical recurrence of PCa. The study included two cohorts: the RNA–Seq sample of Russian patients with PCa (n = 72) and the TCGA–PRAD data (n = 203). The results of the analysis of the association between the TMPRSS2–ERG transcript and biochemical recurrence were contradictory. The differential expression analysis (biochemical recurrence cases versus biochemical recurrence-free) and the gene set enrichment analysis revealed a list of genes involved in major cellular pathways. The GNL3, QSOX2, SSPO, and SYS1 genes were selected as predictors of the potential prognostic model (AUC = 1.000 for a cohort of Russian patients with PCa and AUC = 0.779 for a TCGA–PRAD cohort).


Introduction
Currently, seven major molecular subtypes of prostate cancer (PCa), identified by the Cancer Genome Atlas Research Network, TCGA-PRAD project [1], are known. Four of the seven subtypes are characterized by the presence of fusion transcripts between the TM-PRSS2 exons and the exons of genes encoding members of the erythroblast transformationspecific (ETS) family of transcription factors: ERG, ETV1, ETV4, and FLI1 (the frequency of the subtypes is 46%, 8%, 4%, and 1%, respectively). Three other subtypes are characterized by the presence of point mutations in one of the following genes: SPOP, FOXA1, or IDH1 (the frequency of the subtypes is 11%, 3%, and 1%, respectively) [1,2]. Thus, about half of all PCa cases have a TMPRSS2-ERG fusion transcript, which is formed due to an intrachromosomal rearrangement leading to the fusion of two genes: TMPRSS2 and ERG. The TMPRSS2 gene is characterized by a higher expression level compared with that of ERG gene expression in the prostate tissue [3]; however, their fusion leads to a Life 2021, 11, 588 2 of 11 manifold increase in ERG expression [4][5][6]. Furthermore, a number of studies have described TMPRSS2-ERG-mediated feed-forward regulation of wild-type ERG, inducing its overexpression [7,8]. ERG is a pro-oncogenic transcription factor involved in the regulation of embryonic development, cell proliferation, and differentiation. Multiple increases in the expression of the ERG entail serious transcriptomic reprogramming and altered cell signaling (for example, activation of WNT/TGF-beta and NOTCH pathways). It is believed that the fusion of TMPRSS2 and ERG is an early driver event of PCa tumorigenesis. The TMPRSS2-ERG fusion transcript is found in prostatic intraepithelial neoplasia (PIN); the presence of this fusion transcript is associated with an unfavorable prognosis and more aggressive PCa [9][10][11].
Predicting PCa recurrence is a nontrivial task, despite the existing stratification schemes for risk groups based on traditional clinical parameters, such as the Gleason score, lymphatic dissemination, and preoperative prostate-specific antigen (PSA). The most widely used method to classify patients is that of D'Amico, which identifies three risk groups: low, intermediate, and high [12]. However, these risk groups do not accurately reflect the likelihood of recurrence in PCa. Biochemical recurrence (BCR) is established at postoperative PSA ≥ 0.2 ng/mL and, as a rule, occurs only in some high-risk patients but can also be observed in some intermediate-risk patients [10,13,14], which are cases with an unfavorable prognosis. Therefore, it is necessary to search for more precise prognostic markers. The TMPRSS2-ERG fusion transcript can be a potential marker of an unfavorable prognosis in PCa.
To date, a number of studies have been published that confirm the association of TMPRSS2-ERG with an unfavorable prognosis. For example, a study of several cohorts of PCa patients found less recurrence-free survival (RFS) for TMPRSS2-ERG-positive PCa cases [15]. At the same time, an important aspect is the quantitative level of expression of the TMPRSS2-ERG fusion transcript and the ERG gene, not just their presence [16]. On the other hand, several studies have shown that TMPRSS2-ERG is a precursor of a favorable prognosis. A recent study showed better survival in TMPRSS2-ERG-positive patients than in TMPRSS2-ERG-negative patients [17]. Several other studies on numerous cohorts of PCa patients also did not reveal a direct relation between TMPRSS2-ERG gene rearrangement and BCR [18,19]. Thus, the data on the association of the TMPRSS2-ERG fusion transcript with BCR are contradictory, and its role in the progression of PCa remains unclear.
In this work, we studied the association between the presence and expression level of the TMPRSS2-ERG fusion transcript and BCR using a cohort of 72 PCa samples obtained from Russian patients. We also searched for potential prognostic markers based on differential gene expression for the TMPRSS2-ERG molecular subtype using RNA-Seq data from the cohort of Russian patients and The Cancer Genome Atlas project (TCGA-PRAD).

Materials
The study used two cohorts: 72 tumor samples of PCa obtained from Russian patients and RNA-Seq data of the TCGA-PRAD project. Tumor samples of PCa with paired adjacent morphologically normal tissues were collected and characterized at the P.A. Herzen of the Ministry of Health of Russia (Table 1). The patients provided written informed consent to participate in this study. Samples were collected from patients not undergoing neoadjuvant therapy. Each sample contained a minimum of 70% of tumor cells.
For differential expression analysis, we used RNA-Seq data for locally advanced PCa cases from TCGA-PRAD [20] obtained from patients without neoadjuvant therapy and belonging to the Caucasian population. For these cases, the disease recurrence status and tumor molecular subtype were known (Table 1). Total RNA was isolated from fresh, frozen samples using the MagNA Pure Compact RNA Isolation Kit (Roche, Basel, Switzerland) according to the manufacturer's protocol. The concentration of the isolated RNA was determined on a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) using a Qubit RNA BR Assay Kit (Thermo Fisher Scientific). The quality of the isolated RNA was assessed on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) using an Agilent RNA 6000 Nano Kit (Agilent Technologies). The RNA integrity number (RIN) was no less than 8.0. Reverse transcription was performed using Mint reverse transcriptase (Evrogen, Moscow, Russia) according to the manufacturer's protocol.

Quantitative PCR (qPCR)
The TaqMan Gene Expression Assays Hs03063375_ft (Thermo Fisher Scientific) was used to assess the expression level of the TMPRSS2-ERG fusion transcript. The RPN1 gene [21] was used as a control gene (primer and probe sequences for RPN1 [22]). qPCR was performed with an Applied Biosystems 7500 Real-Time PCR System (Thermo Fisher Scientific). The following process was used for amplification: 95 • C for 15 min; 40 cycles at 95 • C for 15 s; 60 • C for 60 s. Each qPCR reaction was repeated three times. To assess the level of expression, the method of relative measurements (∆CT) was used and calculations were performed using the ATG program (Analysis of Transcription of Genes) [23].

RNA Sequencing
The mRNA libraries were prepared using a TruSeq ® Stranded mRNA LT kit (Illumina, San Diego, CA, USA) according to the manufacturer's protocol. Sequencing was performed on the NextSeq500 System (read length-75 nt, single-end mode) using the NextSeq 500/550 High Output Kit v2.5 (Illumina).

Statistics
The analysis of associations between the presence/level of expression of TMPRSS2-ERG and BCR was performed in the R environment (v.3.6.3) [29]. Pearson's chi-squared test (χ 2 ) and odds ratio quantity (OR) were used to assess the relationship between the TMPRSS2-ERG transcript presence and the presence of biochemical recurrence. To assess the association between the TMPRSS2-ERG expression level and biochemical recurrence, the Mann-Whitney U-test (MW test) and fold change (FC) quantity were used. Differences were considered statistically significant at a p-value ≤ 0.05.
The quasi-likelihood (QLF test) and the MW tests were used to assess the significance of changes in gene expression. Genes that passed a QLF p-value ≤ 0.05 were considered differentially expressed.

Expression of the TMPRSS2-ERG Fusion Transcript in PCa Samples
Using qPCR, we assessed the expression of TMPRSS2-ERG transcript in 72 tumor samples of PCa and in paired adjacent normal tissues obtained from Russian patients. In all tested samples of normal prostate tissue, the expression of the TMPRSS2-ERG fusion transcript was not detected. The expression of TMPRSS2-ERG transcript was detected in 50% (36/72) of PCa tumor samples.
For these two groups of samples, factor analysis of the association of TMPRSS2-ERG fusion transcript presence with BCR was performed using the χ 2 test and OR value calculation. The OR value predicting the direction of changes was 3.56 with χ 2 p-value = 0.165.
To evaluate the expression level of TMPRSS2-ERG transcript, ∆CT was calculated relative to the reference gene (RPN1) for TMPRSS2-ERG-positive samples (n = 36). By analogy with the whole cohort, these samples were also divided into two groups: (1) BCR (10 samples), and (2) BRF (19 samples). There was a significant increase in the expression level of the TMPRSS2-ERG fusion transcript in the BCR group by an average of 5.8 times (p-value = 0.04) (Figure 1a). Survival analysis showed significantly less RFS for TMPRSS2-ERG-positive cases (Figure 1b, p-value = 0.009).
Additionally, we analyzed the association between the presence of the TMPRSS2-ERG fusion transcript and BCR using a cohort of 203 PCa samples from the TCGA-PRAD project, which was also divided into the groups of BCR (n = 52) and BRF (n = 151). The OR value for this cohort was 0.83 with χ 2 p-value = 0.80.

Differentially Expressed Genes and Significantly Enriched Pathways
Earlier, we performed an analysis of differentially expressed genes (DEGs) between groups of favorable and unfavorable prognoses for the lymph−node−negative LAPC and for the TMPRSS2-ERG−positive LAPC TCGA−PRAD cohort [35].
In the current study, we analyzed differential gene expression using an expanded cohort of PCa samples obtained from Russian patients and the TCGA−PRAD cohort. The following groups of comparisons were formed: 1) BCR versus BRF cases within the TMPRSS2-ERG molecular subtype and 2) BCR versus BRF cases within the TMPRSS2-ERG molecular subtype in the TCGA−PRAD cohort. As a result, 388 DEGs (QLF p−value ≤ 0.05, logCPM ≥ 3) were filtered for BCR within the TMPRSS2-ERG molecular subtype, which overlapped between the two studied cohorts (Figure 2, Supplementary Table S1). A total of 104 identified genes were characterized by an increase in expression in the BCR group within the TMPRSS2-ERG molecular subtype, while the expression levels of 284 genes decreased (Supplementary Table S1).
Using GSEA, enrichment with genes participating in several pathways related to the cytoskeleton, cell cycle, hormones, and secretion (insulin secretion, retinol, growth hormone pathways, calcium signaling), repair, and cell transport was revealed (Figure 3). The complete GSEA results are presented in Supplementary Table S2.

Differentially Expressed Genes and Significantly Enriched Pathways
Earlier, we performed an analysis of differentially expressed genes (DEGs) between groups of favorable and unfavorable prognoses for the lymph-node-negative LAPC and for the TMPRSS2-ERG-positive LAPC TCGA-PRAD cohort [35].
In the current study, we analyzed differential gene expression using an expanded cohort of PCa samples obtained from Russian patients and the TCGA-PRAD cohort. The following groups of comparisons were formed: (1) BCR versus BRF cases within the TMPRSS2-ERG molecular subtype and (2) BCR versus BRF cases within the TMPRSS2-ERG molecular subtype in the TCGA-PRAD cohort. As a result, 388 DEGs (QLF p-value ≤ 0.05, logCPM ≥ 3) were filtered for BCR within the TMPRSS2-ERG molecular subtype, which overlapped between the two studied cohorts (Figure 2, Supplementary Table S1). A total of 104 identified genes were characterized by an increase in expression in the BCR group within the TMPRSS2-ERG molecular subtype, while the expression levels of 284 genes decreased (Supplementary Table S1).
Using GSEA, enrichment with genes participating in several pathways related to the cytoskeleton, cell cycle, hormones, and secretion (insulin secretion, retinol, growth hormone pathways, calcium signaling), repair, and cell transport was revealed (Figure 3). The complete GSEA results are presented in Supplementary Table S2.

Four-Gene Prognostic Model
Among the 388 identified DEGs, seven candidate genes were highlighted as the most promising markers of unfavorable prognosis based on the distribution of differential To determine a potential predictive model based on combinations of the aforementioned candidate genes, ROC analysis was performed. The combination of the genes GNL3, QSOX2, SSPO, and SYS1 showed the best prognostic characteristics as a predictive model in both analyzed cohorts (AUC = 1 for the cohort of Russian patients, sensitivity = 1, specificity = 1; AUC = 0.779 for the TCGA cohort, sensitivity = 0.526, specificity = 0.870) (Figure 4). The results of other possible predictive models are presented in Supplementary  Table S3. In our research, we identified GNL3, QSOX2, and SSPO as upregulated genes and SYS1 as downregulated for the BCR group within TMPRSS2-ERG-positive PCa ( Table 2). model in both analyzed cohorts (AUC = 1 for the cohort of Russian patients, sensitivity = 1, specificity = 1; AUC = 0.779 for the TCGA cohort, sensitivity = 0.526, specificity = 0.870) (Figure 4). The results of other possible predictive models are presented in Supplementary  Table S3. In our research, we identified GNL3, QSOX2, and SSPO as upregulated genes and SYS1 as downregulated for the BCR group within TMPRSS2-ERG−positive PCa (Table 2).

Discussion
Currently, the question of whether the presence and/or expression level of the TMPRSS2-ERG fusion transcript is a factor of unfavorable prognosis in PCa remains unclear. Several studies show contradictory results. In this work, we analyzed the association of the presence/expression level of the TMPRSS2-ERG fusion transcript with BCR in PCa. A significant increase in the level of TMPRSS2-ERG expression was revealed in the BCR group in a cohort of Russian patients. At the same time, no statistically significant association between BCR and the presence of the TMPRSS2-ERG fusion transcript in the tumor was observed for both studied cohorts. However, less recurrence-free survival was observed for TMPRSS2-ERG-positive cases.
Since there is no unambiguous association of the TMPRSS2-ERG transcript with PCa recurrence, and the known markers of PCa do not have sufficient predictive power, the second part of our work was devoted to the search for potential prognostic markers of an unfavorable prognosis of PCa. Given the high biological heterogeneity of PCa [36], the search for potential markers was carried out within the TMPRSS2-ERG molecular sub-type, which is characterized by the highest occurrence (46% of all cases of PCa) [1].
Previously, we searched for potential prognostic markers for the cohort of Russian patients and the TCGA cohort [35,37]. However, we considered potential prognosis markers for a cohort of Russian patients without taking into account the TMPRSS2-ERG molecular subtype in view of the cohort size (n = 32). In addition, in the present study, a different study design was used for the TCGA cohort, including comparison groups and bioinformatic data processing.
As a result of the analysis, we obtained a list of DEGs (388 genes) common to both studied cohorts. The identified genes are involved in molecular pathways related to the regulation of the cytoskeleton, cell cycle and repair pathways, secretion pathways, and hormone signaling. The above pathways have also been associated with an unfavorable prognosis in lymph-node-negative PCa [35].
The following genes were selected as potential markers of unfavorable prognosis: GNL3, ODF2, PAXBP1, QSOX2, SSPO, SYS1, and ZNF302. Based on the results of the ROC analysis, the best primary predictive model that was identified relied on the combination of the GNL3, QSOX2, SSPO, and SYS1 genes (AUC = 1 for the cohort of Russian patients; AUC = 0.779 for the TCGA cohort) for the TMPRSS2-ERG molecular subtype. At the same time, an increase in expression was noted for the GNL3, QSOX2, and SSPO genes, while the expression of the SYS1 gene decreased in the group with an unfavorable prognosis.
The GNL3 gene encodes for guanine nucleotide-binding protein-like 3, also known as the nucleostemin, which is required to maintain the proliferative capacity of stem cells. The nucleostemin is concentrated in the nucleus and extracellular matrix. It stabilizes the MDM2 protein by preventing its ubiquitination and, hence, proteasomal degradation. In this way, the GNL3 protein may interact with p53 and may be involved in tumorigenesis. The QSOX2 gene encodes quiescin sulfhydryl oxidase 2, a member of the sulfhydryl oxidase/quiescin-6 (Q6) family (QSOX1), which are involved in the sensitization of neuroblastoma cells for IFN-gamma (IFNG)-induced cell death. The QSOX2 protein is localized as QSOX1 in the nucleus, Golgi apparatus, and extracellular matrix.
The SSPO (or SSPOP) is a pseudogene also known as SCO-spondin, which is involved in the modulation of neuronal aggregation. The SSPO may be involved in developmental events during the formation of the central nervous system. There are the metabolism of proteins and the O-glycosylation of TSR domain-containing proteins among its related pathways.
The SYS1 gene encodes the SYS1 Golgi trafficking protein, which forms a complex with ADP-ribosylation factor-related protein ARFRP1 and targets ARFRP1 to the Golgi apparatus. The SYS1 protein is mostly localized in Golgi apparatus and cytosol.
A number of studies have also described the association of GNL3 gene overexpression with tumor progression and poor survival in cancer of the prostate [38], stomach [39], Life 2021, 11, 588 9 of 11 colon [40], breast [41,42], and lung [43]. For three other identified genes (QSOX2, SSPO, and SYS1), we have described potential involvement in PCa for the first time.
In conclusion, the high expression level of the TMPRSS2-ERG fusion transcript was associated with unfavorable prognosis. Despite the lack of significant association of the unfavorable prognosis and presence of the TMPRSS2-ERG fusion transcript, comprehension of the TMPRSS2-ERG fusion transcript status in tumor is important to clarify which PCa prognostic model is appropriate for application. The most perspective prognostic model for TMPRSS2-ERG-positive PCa was based on expression profiles of GNL3, QSOX2, SSPO, and SYS1 genes. The above model demonstrated sufficiently high predictive potential. However, further experimental research to validate this model on an expanded cohort by qPCR is needed.