Article

Unmasking Neuroendocrine Prostate Cancer with a Machine Learning-Driven Seven-Gene Stemness Signature That Predicts Progression

1 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
2 Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), CONICET-Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
3 Instituto de Tecnología (INTEC), Universidad Argentina de la Empresa (UADE), Buenos Aires C1073AAO, Argentina
4 Department of Genitourinary Medical Oncology and The David H. Koch Center for Applied Research of Genitourinary Cancers, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
5 Centro de Oncología Molecular y Traslacional y Plataforma de Servicios Biotecnológicos, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Argentina
6 Sector de Oncología Clínica, Hospital Italiano de Buenos Aires, Buenos Aires C1199ABB, Argentina
* Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2024, 25(21), 11356; https://doi.org/10.3390/ijms252111356
Submission received: 23 September 2024 / Revised: 17 October 2024 / Accepted: 20 October 2024 / Published: 22 October 2024

Abstract

Prostate cancer (PCa) poses a significant global health challenge, particularly due to its progression into aggressive forms like neuroendocrine prostate cancer (NEPC). This study developed and validated a stemness-associated gene signature using advanced machine learning techniques, including Random Forest and Lasso regression, applied to large-scale transcriptomic datasets. The resulting seven-gene signature (KMT5C, DPP4, TYMS, CDC25B, IRF5, MEN1, and DNMT3B) was validated across independent cohorts and patient-derived xenograft (PDX) models. This signature demonstrated strong prognostic value for progression-free, disease-free, relapse-free, metastasis-free, and overall survival. Importantly, the signature not only identified specific NEPC subtypes, such as large-cell neuroendocrine carcinoma, which is associated with very poor outcomes, but also predicted a poor prognosis for PCa cases that exhibit this molecular signature, even when they were not histopathologically classified as NEPC. This dual prognostic and classifier capability makes the seven-gene signature a robust tool for personalized medicine, providing a valuable resource for predicting disease progression and guiding treatment strategies in PCa management.

1. Introduction

Prostate cancer (PCa) remains one of the most significant health challenges for men globally, with a high incidence and mortality, particularly in advanced stages of the disease [1]. Despite advancements in early detection and treatment, accurately predicting which patients will experience aggressive disease progression remains a major challenge. A critical gap in the management of PCa is the lack of reliable prognostic biomarkers capable of identifying patients at the highest risk of developing more aggressive forms of PCa, such as neuroendocrine prostate cancer (NEPC), a subtype associated with poor prognosis [2,3]. Addressing this gap is essential for improving patient outcomes and guiding effective therapeutic strategies.
To overcome this challenge, we propose the identification of stem-like characteristics within prostate tumors. Cancer stem cells (CSCs) are a subpopulation of cells within tumors that possess the ability to self-renew, differentiate, and drive tumor growth, metastasis, and resistance to conventional therapies [2,4]. These cells have been implicated in the recurrence and progression of PCa, making them critical targets for both prognostic and therapeutic interventions [5]. Moreover, CSCs are believed to contribute to the heterogeneity of PCa, which complicates treatment and highlights the need for more refined biomarkers [6]. However, despite the recognized importance of CSCs, there is still a need for concise and clinically applicable biomarkers, such as transcriptomic signatures, that can reliably point out the presence of stemness traits in prostate tumors and their associated risk of progression.
In addition to their role in driving tumor growth, CSCs are more abundant after the neuroendocrine differentiation of prostate cancer, which results in NEPC [7]. This PCa subtype can arise either de novo or through the transdifferentiation of adenocarcinoma under selective pressures such as androgen deprivation therapy (ADT) [8,9]. This transdifferentiation process, driven by cellular plasticity and epigenetic changes, results in a highly aggressive cancer subtype that is associated with poor outcomes and limited treatment options [10,11]. Identifying biomarkers that can detect early shifts toward a neuroendocrine phenotype is crucial for managing treatment-resistant cases. However, existing NEPC-related gene signatures are often complex, including a large number of genes, limiting their practical use in clinical settings [12,13].
In this study, we aimed to address these challenges by developing a concise and robust stemness-associated gene signature using machine learning techniques. By analyzing large-scale transcriptomics data from multiple cohorts, we identified a seven-gene signature that predicts multiple PCa disease progression events. This signature was rigorously validated across independent datasets and further substantiated using patient-derived xenograft (PDX) models and a NEPC dataset, where we observed that our signature is able to classify samples as NEPC and, particularly, the large cell neuroendocrine carcinoma subtype. Our comprehensive approach provides a novel and clinically applicable tool for patient stratification and treatment personalization, offering new insights into the role of stem-like traits in PCa and their association with neuroendocrine differentiation.

2. Results

2.1. Dysregulation of Stemness-Associated Genes Across Multiple PCa Comparisons

We gathered 144 stemness-associated genes in PCa from the literature [6,14,15,16,17,18] (Supplementary Table S1) and analyzed their expression and association with multiple survival endpoints (Figure 1A). First, we performed pairwise differential gene expression analyses using 7 PCa datasets (n = 1259), which included 11 comparisons between normal prostate, primary tumor, metastatic, and castration-resistant PCa (CRPC) samples (Figure 1A). Volcano plots showed the dysregulation of 139 stemness-associated genes, with both upregulation (red, adjusted p < 0.05, log2FC > 0) and downregulation (blue, adjusted p < 0.05, log2FC < 0) in all comparisons (Primary PCa vs. Benign/Normal/Adjacent; Metastatic PCa vs. Benign; Metastatic PCa vs. Primary PCa; CRPC vs. Benign; CRPC vs. Primary PCa) (Figure 1(Bi)). Figure 1(Bii) summarizes these results across comparisons. Each of these 139 genes was dysregulated in at least one dataset, with 29 genes consistently upregulated and 26 genes consistently downregulated (Figure 1(Bii), Supplementary Table S2).

2.2. Association of Stemness Markers with PCa Patients’ Survival

We evaluated the association with different survival events in PCa patients, including progression-free survival (PFS), biochemical relapse-free survival (RFS), metastasis-free survival (MFS), overall survival (OS), and disease-specific survival (DSS) for the 139 differentially expressed genes. Figure 2(Ai) shows representative Kaplan–Meier plots for three example genes (DBNL, UBTD2, and MBNL2) in the TCGA-PRAD dataset (n = 497, PFS). The results showed that high expression of DBNL and low expression of MBNL2 were significantly associated with poor PFS (HR = 2, Log-rank p = 0.0011 and HR = 0.39, Log-rank p < 0.0001, respectively, Figure 2(Ai), left and right panels). No significant associations were observed for UBTD2 (Log-rank p = 0.1062, Figure 2(Ai), middle panel). Figure 2(Aii) shows a heatmap summarizing the results of the univariable survival analysis for each of the 139 candidate genes performed across the 5 training datasets including 5 different types of events (n = 1229; detailed in Supplementary Table S3). The results are color-coded as follows: red squares represent genes with high expression significantly associated with shorter times to the event, white squares indicate genes with no significant associations to the event, and blue squares represent genes with high expression significantly associated with a better outcome (longer times to the event). Of note, there was a group of genes whose high expression was consistently associated with poor prognosis (Figure 2(Aii), left, in red), while others were associated with a better outcome (Figure 2(Aii), right, in blue).
Next, we performed multivariable Cox regression analyses for each of the 139 previously mentioned genes to evaluate their independence from other known risk factors for PCa progression in predicting an event. For the three examples mentioned above, DBNL (HR = 2.61, 95% CI 1.40–4.86, Cox p = 0.002) and MBNL2 (HR = 0.69, 95% CI 0.54–0.88, Cox p = 0.003) displayed a significant association with high and low risk of PFS, respectively, independently from the other covariates available in the TCGA-PRAD dataset (PSA levels, ISUP grade, Clinical T Stage, and Targeted Molecular/Radiation Therapy; Figure 2(Bi)). No significant associations were observed for UBTD2 (Figure 2(Bi)). The overall results for the multivariable survival analyses are summarized as a heatmap in Figure 2(Bii) and detailed in Supplementary Table S4. Most associations observed in the univariable survival analysis (Figure 2(Aii)) lost statistical significance after adjusting for clinical covariates (Figure 2(Bii)).

2.3. Modeling a Stemness-Associated Signature with Prognostic Value

We used a machine learning algorithm to identify the most relevant prognostic candidate genes to model a gene-expression signature that could stratify patients into risk groups for disease progression and death. We used a Random Forest algorithm to rank genes according to their relevance for event prediction in the training datasets and calculated the mean relative importance score for each gene (Figure 3A). The top 15 genes were ALDH1A1, KMT5C, DPP4, RPS6KB1, TYMS, CCT3, IL1RAP, MICAL3, CDC25B, IRF5, MEN1, DNMT3B, CD24, RND3, and CASP9 (Figure 3A, purple square). Next, we used these genes to develop our stemness-associated risk signature. Model coefficients were calculated on the TCGA-PRAD cohort by Lasso regression, a feature selection method that keeps the most important predictors by shrinking the coefficients of less significant genes to zero. This analysis resulted in a signature of seven significant genes, generating the following weighted linear model:
0.284 × KMT5C + 0.272 × MEN1 + 0.218 × TYMS + 0.090 × IRF5 + 0.083 × DNMT3B + 0.048 × CDC25B − 0.060 × DPP4,
where gene expressions are considered a continuous variable (Figure 3(Bi)).
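The weighted linear model above can be sketched in a few lines of code. The following is a minimal illustration in Python (the analyses in this work were run in R), using the coefficients reported in the text. The function names `risk_score` and `stratify_by_median` are hypothetical, and the handling of scores exactly equal to the median (assigned here to the low-risk group) is an assumption, since the text does not specify it.

```python
from statistics import median

# Coefficients as reported in the text (Lasso regression on TCGA-PRAD).
COEFFICIENTS = {
    "KMT5C": 0.284,
    "MEN1": 0.272,
    "TYMS": 0.218,
    "IRF5": 0.090,
    "DNMT3B": 0.083,
    "CDC25B": 0.048,
    "DPP4": -0.060,
}


def risk_score(expression):
    """Return the weighted sum of the seven genes' expression values.

    `expression` maps gene symbol -> continuous expression value.
    """
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing genes: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify_by_median(scores):
    """Label each score 'high' if strictly above the cohort median, else 'low'.

    Assumption: ties with the median go to the low-risk group.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


if __name__ == "__main__":
    # Made-up expression values, for illustration only.
    patient = {gene: 1.0 for gene in COEFFICIENTS}
    print(round(risk_score(patient), 3))
    print(stratify_by_median([0.2, 0.9, 1.4, 2.1]))
```

Because DPP4 carries a negative coefficient, higher DPP4 expression lowers the score, whereas the other six genes raise it.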
Next, to assess the prognostic performance of this seven-gene signature, we calculated the risk score for each patient in the TCGA-PRAD dataset and stratified them into high-risk and low-risk groups using the median score as the cut point. As expected, we observed a shorter time to progression for the high-risk group compared to the low-risk group (HR = 3.36, 95% CI 2.11–5.35, Log-rank p < 0.0001, Figure 3(Bi)). When we considered the score as a continuous variable, we observed an HR = 4.34 (95% CI 2.95–6.37, Cox p < 0.0001) for each unit increase in the score (Figure 3(Bii), Supplementary Table S5). We then extended the score to the other training datasets and corroborated its prognostic significance within these cohorts. When analyzing event-free survival in all the other training datasets, we observed significant associations with our model in five out of six analyses, using both the dichotomous and the continuous score, suggesting that our seven-gene signature is able to predict the risk of multiple disease-progression events across our training cohorts (Figure 3(Bii), Supplementary Table S5). The identified genes and the developed risk score model effectively stratify patients based on their risk of adverse outcomes, suggesting their potential as prognostic biomarkers.

2.4. Consistent Performance Across Validation Datasets

Next, we validated our model using datasets from independent cohorts (n = 501). We calculated the seven-gene score for all patients in the different datasets and categorized them into high or low risk using the median value as a cutoff. Interestingly, the risk score was significantly associated with event-free survival in all validation cohorts (Figure 4(Ai,Aii)). Of note, in the SU2C dataset, which comprises metastatic PCa samples, patients with high scores had nearly 2-fold higher risk of death compared to patients with low scores (Figure 4(Aii)). This demonstrates that the seven-gene signature is a robust predictor of the risk of death even in advanced stages. Moreover, when analyzing the seven-gene signature as a continuous variable, all datasets presented significant results, with higher concordance indexes than the dichotomized analysis (Figure 4(Aiii)). Multivariable analyses demonstrated that our score predicts disease-progression events independently of the other clinicopathological variables (Figure 4B), which highlights its potential utility in clinical decision making.

2.5. The Stemness-Associated Gene Signature Captures Neuroendocrine Disease Heterogeneity in the MDA PCa PDX Series

Next, we sought to analyze the association between the seven-gene signature and other clinicopathological characteristics available in the MDA PCa PDX series, which was developed in the Laboratory of Dr. Navone within the “Prostate Cancer Patient Derived Xenograft Program” at MD Anderson Cancer Center and the David H. Koch Center for Applied Research of Genitourinary Cancers. PCa tissue samples used for PDX development were derived from therapeutic or diagnostic procedures, namely, radical prostatectomies, orthopedic and neurosurgical procedures to palliate complications, and biopsies of metastatic lesions [19] (Figure 5A). We analyzed the expression of the seven stemness-associated genes selected in the present study using previously generated RNA-Seq data from the 44 MDA PCa PDX series [20]. Surprisingly, the expression of this signature accurately clustered PDXs according to their histopathological classification (adenocarcinoma or sarcomatoid vs. neuroendocrine tumors) in an unsupervised clustering analysis (Figure 5(Bi)). Moreover, NEPC PDXs displayed significantly higher scores (Figure 5(Bii)). Specifically, CDC25B, TYMS, KMT5C, and DNMT3B were significantly upregulated in NEPC vs. non-NEPC PDXs, while IRF5 and DPP4 were significantly downregulated (Figure 5(Biii)).
These results were also observed in a Principal Component Analysis (PCA) (Figure 5(Ci)), which highlighted KMT5C as the main gene in the signature contributing to the variance (PC1) between samples of different histopathological profiles (Figure 5(Cii)), followed by CDC25B and DNMT3B. Of note, KMT5C is also the gene with the largest weight in our score (largest coefficient, Figure 3(Bi)). To evaluate the power of the signature in predicting whether a tumor is NEPC, we performed receiver operating characteristic (ROC) analysis. The AUC of our seven-gene score was 0.92 (95% CI = 0.84–1) (Figure 5D), highlighting its high performance for classifying NEPC samples.

2.6. Our Stemness Score Adds Value to Pre-Existing NEPC Score

To compare the performance of our risk score with a pre-established NEPC classification score, we analyzed the expression of the genes from the 70-gene signature by Beltran et al. [12] in the MDA PCa PDX series. We observed a good segregation of the PDXs according to their histopathological classification when using the 70 genes from the NEPC score of Beltran et al. [12]; however, the 2 double-negative tumors (negative for AR and NE features) clustered within the NEPC tumor group (Figure 6(Ai)). When the seven genes identified in this work were included alongside the genes from the NEPC classification score [12], clustering of the PDXs was more accurate, grouping not only adenocarcinomas vs. NEPC tumors but also sarcomatoid samples (Figure 6(Aii)).

2.7. The Seven-Gene Signature Effectively Classifies Large-Cell Neuroendocrine Carcinomas

To validate the association of our risk score model with NEPC, we analyzed the transcriptomics dataset from Beltran et al. (n = 49) [12], which includes 15 CRPC-neuroendocrine (CRPC-NE) and 34 CRPC-adenocarcinoma (CRPC-Adeno) tumor samples. Our signature was able to distinguish CRPC-NE tumors to a limited extent (Figure 6(Bi)), while, overall, our risk score was significantly higher in CRPC-NE compared to CRPC-Adeno (p < 0.01, Figure 6(Bii)). However, when we looked further into the available pathology classification (prostate adenocarcinoma with no neuroendocrine differentiation, n = 34; prostate adenocarcinoma with neuroendocrine differentiation >20%, n = 2; small-cell carcinoma, n = 4; large-cell neuroendocrine carcinoma, n = 7; mixed small-cell carcinoma–adenocarcinoma, n = 2), we observed that six out of seven large-cell NEPC samples clustered together (Figure 6(Bi)) and that the seven-gene score was particularly high in that subtype (Figure 6(Ci)). Strikingly, the AUC of 0.99 (95% CI = 0.97–1) suggests that the signature of seven stemness-associated genes proposed in this work is accurate in classifying samples as large-cell NEPC (Figure 6(Cii)). Since the molecular characterization of large-cell NEPC remains elusive [21], our findings lay the groundwork for future research on the implications of these genes in the pathogenesis of this subtype.

3. Discussion

In this study, we identified and validated a novel seven-gene signature that represents a significant advancement in the prediction of poor outcomes and molecular detection of NEPC. Our findings demonstrate that this signature not only reliably stratifies PCa patients based on their risk of progression but also reveals a crucial link between stemness-associated pathways and neuroendocrine characteristics. Importantly, this signature is particularly adept at identifying tumors within the Prostate Cancer Foundation (PCF) and World Health Organization (WHO)-defined large-cell neuroendocrine carcinoma [22,23], which is regarded as very rare and associated with very poor outcomes (mean survival of 7 months) [23].
Large-cell NEPCs are high-grade tumors that usually develop from treatment-resistant clones [24]; they are mainly diagnosed histopathologically, thus remaining a challenge and underrecognized [21,22,25]. Hence, there is a need for molecular biomarkers that could subclassify NEPC tumors for better clinical management [21]. The ability of our seven-gene signature to pinpoint this specific aggressive and challenging NEPC subtype underscores the clinical utility of our model in guiding more precise therapeutic interventions.
Our stemness-associated signature addresses a critical need for improving PCa prognosis, while also offering the precise stratification of NEPC, which is often characterized by poor clinical outcomes and high proliferative indices [26]. NEPC is recognized as one of the most aggressive and treatment-resistant forms of PCa, often arising in the context of advanced CRPC after multiple rounds of ADT [27]. While most NEPC cases develop in patients with a history of extensive anti-androgen treatment, the disease can also manifest de novo, albeit rarely, in treatment-naïve patients [9,12]. Further, ADT-induced NE transdifferentiation could be explained by altered mast cell infiltration [28,29]. Maimaitiyiming et al. established a mast cell gene signature with prognostic efficacy in PCa [30], and, interestingly, mast cells have been reported to support the stem phenotype of cancer cells [31]. Altogether, focusing on stemness-associated genes could offer insights into NEPC biology and potential targets.
The molecular landscape of NEPC has been increasingly clarified in recent years, with significant contributions from studies like those of Beltran et al., who delineated the heterogeneity within NEPC and highlighted distinct molecular subtypes [12,22,32]. Their research highlights the genetic, epigenetic, and molecular diversity of NEPC, particularly noting alterations such as RB1 and TP53 loss, MYCN overexpression, and the activation of the PI3K/AKT pathway, which contribute to the aggressive nature of these tumors [12,22,32]. Our study builds on these findings by focusing on a seven-gene stemness signature. Unlike previous signatures that include a broad array of genes, our streamlined seven-gene model achieves comparable or superior predictive accuracy, underscoring its practical utility in diverse clinical contexts.
The biological relevance of the genes in our signature—KMT5C, DPP4 (also known as CD26), TYMS, CDC25B, IRF5, MEN1, and DNMT3B—lies in their involvement in critical processes such as chromatin modification, DNA methylation, DNA repair, cell-cycle regulation, immune escape, and extracellular matrix remodeling [33,34,35,36,37,38,39]. These processes are fundamental to maintaining the plasticity and adaptability of cancer stem cells (CSCs) [5,40], which are enriched after the transdifferentiation of prostate adenocarcinoma into more aggressive neuroendocrine phenotypes [41]. For example, KMT5C, DNMT3B, and MEN1 play pivotal roles in chromatin remodeling and methylation [33,34,35], processes that are crucial for the epigenetic reprogramming observed in NEPC [42]. Additionally, TYMS has been previously associated with neuroendocrine differentiation in other types of cancer [43,44]. The integration of these stemness-associated genes into our model highlights the potential for characterizing NEPC-like tumors.
One of the key strengths of our study is the extensive validation of our signature across patient datasets and PDX models. The latter, which faithfully replicate the histological and genetic features of human tumors, are widely regarded as the gold standard for preclinical studies [45]. Our findings demonstrate that the seven-gene signature consistently distinguishes NEPC from other PCa subtypes in these models, underscoring its clinical utility and potential for identifying NEPC-like tumors. This aspect of our research not only validates the predictive power of the signature but also highlights its potential utility in translational research, particularly in the development of novel therapeutic strategies aimed at targeting the molecular underpinnings of NEPC.

4. Materials and Methods

4.1. Stemness-Associated Genes

We gathered 144 stemness-associated genes from the PCa literature [6,14,15,16,17,18]. We conducted transcriptomics analyses using publicly available PCa datasets (see below) to study differential gene expression across multiple comparisons, including normal/benign tissues, primary PCa tumors, CRPC tumors, and metastatic samples. We performed univariable survival analysis to study the association between gene expression and different endpoints (progression, disease-free time, biochemical relapse, metastasis, and death). We also performed multivariable survival analyses that included clinicopathological features as covariables.

4.2. Gene Expression Analyses in Human Patients

4.2.1. Dataset Selection Criteria

To study differential gene expression across different PCa datasets, we searched the Gene Expression Omnibus (GEO, accessed date 1 October 2021) and the Genomic Data Commons Data Portal (accessed date 1 October 2021) to identify eligible datasets that met the following criteria: (1) PCa tissue samples with available transcriptomic and clinicopathological data; (2) datasets with ≥2 different tissue sample types (Table 1).

4.2.2. Differential Gene Expression Analyses

We used the limma package (Linear Models for Microarray Data, version 3.58.1) [54] to study differential gene expression from both microarrays and RNA sequencing (RNA-seq). In the case of non-normalized data, quantile normalization was applied [54]. For RNA-seq data, the voom function in the limma package was used for processing [55]. We conducted pairwise differential expression analyses within each dataset. For each available probe or gene, the fold changes (FCs) between conditions were calculated and expressed as log2FC. To correct for multiple testing, we used the Benjamini–Hochberg method to control the false discovery rate and reported adjusted p-values.
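As an illustration of the last two steps, the sketch below computes a log2FC from log2-scale expression values and applies the Benjamini–Hochberg adjustment to a list of p-values. It is written in plain Python for readability and is not the limma implementation: limma additionally fits linear models with moderated t-statistics, which are not reproduced here. The function names are hypothetical.

```python
from statistics import mean


def log2_fold_change(log2_condition, log2_reference):
    """log2FC for one gene, assuming inputs are already on the log2 scale.

    On log2-scale data the fold change is a difference of group means.
    """
    return mean(log2_condition) - mean(log2_reference)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(log2_fold_change([8.0, 9.0], [6.0, 7.0]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene would then be called upregulated when its adjusted p-value is below 0.05 and its log2FC is positive, and downregulated when the log2FC is negative, matching the thresholds used in the volcano plots.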

4.3. Association Between Gene Expression and Patients’ Outcomes

4.3.1. Dataset Selection Criteria

To perform survival analysis, we searched the Gene Expression Omnibus (GEO) (accessed date 1 October 2021), cBioPortal (accessed date 1 October 2021), and the Genomic Data Commons Data Portal (accessed date 1 October 2021) to identify eligible datasets that met the following criteria: (1) PCa cases with available gene expression data and (2) available clinicopathological features with ≥5 years of follow-up. Gene expression and clinical data were downloaded and analyzed for the resulting selected datasets. Samples with incomplete gene expression data or missing essential clinicopathological metadata were not included. Datasets were randomly distributed in training (5 datasets, 7 survival analyses) and validation cohorts (4 datasets) (Table 2).

4.3.2. Survival Analyses

We used the Log-rank test to analyze differences in the risk of disease-progression events between different groups of patients [62]. To stratify patients according to high or low expression, we used the Cutoff Finder tool to find the optimal cutoff point for each gene [63]. The Cox proportional hazards model was used to estimate the risk of the disease-progression event for the different groups [64]. Multivariable analyses included clinicopathological features as covariables. All modeling, calculations, and graphs were performed with the R packages survival (version 3.7-0) [65] and survminer (version 0.4.9) [66].
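For readers unfamiliar with the Log-rank test, the following is a minimal two-group sketch in plain Python. It is not the implementation in the survival package, and it does not cover the cutoff optimization performed with Cutoff Finder. The function name `logrank_test` is hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event occurred, 0 if censored.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up follow-up data, for illustration only.
    print(logrank_test([1, 2], [1, 1], [3, 4], [1, 1]))
```

The statistic compares the number of events observed in one group with the number expected if both groups shared the same hazard, accumulated over every distinct event time.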

4.4. Selection of Candidate Genes for Modeling a Risk Score

To identify the 15 most important genes for predicting events, we used a machine learning ensemble-based approach (i.e., the Random Forest Classifier) as implemented in the randomForestSRC R package (version 3.3.1) [67]. The mtry and nodesize parameters were optimized through a grid search approach to minimize the out-of-bag error using the tune function (set.seed(1), ntree = 500), and are included in Supplementary Table S6. We used the Breiman–Cutler variable importance (VIMP) measure to estimate the relative importance of each variable in predicting event-free survival within the training datasets. We applied the subsampling method [68] to estimate the standard error of the VIMP and to calculate the confidence intervals. Genes were ranked according to their variable importance. To facilitate the comparison across datasets, VIMP values were converted into fractions, with 1 representing the most important variable and 0 representing the least important variable within a given dataset.
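The conversion of VIMP values into fractions and their averaging across datasets can be sketched as follows. This assumes that the conversion is a min-max rescaling within each dataset, which is one reading of the description above and not a confirmed detail of the original analysis. The function names, dataset labels, and gene labels are hypothetical, and the Random Forest fit itself is not reproduced.

```python
def scale_vimp(vimp):
    """Rescale one dataset's VIMP values so that max -> 1 and min -> 0.

    Assumption: min-max scaling within the dataset.
    """
    lowest, highest = min(vimp.values()), max(vimp.values())
    if highest == lowest:
        return {gene: 0.0 for gene in vimp}
    return {gene: (v - lowest) / (highest - lowest) for gene, v in vimp.items()}


def mean_relative_importance(vimp_by_dataset):
    """Average scaled VIMP per gene across datasets, ranked from high to low.

    Only genes present in every dataset are ranked.
    """
    scaled = [scale_vimp(v) for v in vimp_by_dataset.values()]
    genes = set.intersection(*(set(s) for s in scaled))
    means = {g: sum(s[g] for s in scaled) / len(scaled) for g in genes}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up VIMP values, for illustration only.
    example = {
        "dataset_1": {"gene_a": 2.0, "gene_b": 4.0, "gene_c": 3.0},
        "dataset_2": {"gene_a": 0.0, "gene_b": 10.0, "gene_c": 5.0},
    }
    print(mean_relative_importance(example))
```

Rescaling before averaging keeps a dataset with large raw VIMP values from dominating the ranking.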

4.5. Gene Signature and Risk Score Calculation

We modeled a risk score based on the gene expression of the 15 most important genes identified across training datasets using Random Forest. To develop this risk score, we calculated model coefficients through Lasso regression using TCGA-PRAD data. Patient scores were then calculated based on the expression of the genes retained after Lasso regression. The performance of this risk score was evaluated within each training dataset. Univariable Cox regression was used to estimate the risk of poor survival in patients with high-risk scores. Patients were stratified either by a dichotomized risk score (with the median as the cut point) or by a continuous risk score. The concordance index (C-index) was used to measure the performance of the signature within each dataset.
In the validation stage, those same coefficients were used in all additional datasets. For each patient, the score was calculated, and its association with event-free survival was studied using univariable and multivariable Cox regressions.
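The concordance index used to measure performance can be illustrated with a short sketch of Harrell's formulation in plain Python. This is an assumption about the variant used: the survival package in R handles tied event times in more detail, and the function name `concordance_index` is hypothetical.

```python
def concordance_index(times, events, scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had the event strictly
    before patient j's follow-up time. The pair is concordant when the
    patient with the earlier event has the higher risk score; tied
    scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Made-up data, for illustration only.
    print(concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4.0, 3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to a score that ranks patients no better than chance, and a value of 1 to a score that ranks every comparable pair correctly.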

4.6. Transcriptome Analysis of MDA PCa PDX Series

To assess the association of the stemness signature and other clinicopathological characteristics in an extensively annotated cohort, we used the MDA PCa PDX series, which was previously developed in the “Prostate Cancer Patient Derived Xenograft Program” at MD Anderson Cancer Center and the David H. Koch Center for Applied Research of Genitourinary Cancers [19]. Briefly, PCa tissue samples used were derived from various procedures, and small pieces were then implanted into subcutaneous pockets of 6- to 8-week-old male CB17 SCID mice (Charles River Laboratories) [19]. RNA-Seq and transcriptome analysis on these samples was performed as previously described [20].

4.7. Unsupervised Clustering and Principal Component Analysis (PCA)

Unsupervised clustering analysis including the expression data of the stemness-associated genes included in the signature was performed using the pheatmap (version 1.0.12) [69] package, and Principal Component Analysis was performed using the factoextra package (version 1.0.7) in R (version 4.3.0) [70].

4.8. Receiver Operating Characteristic (ROC) Curve for NEPC Classification

The pROC package (version 1.18.5) [71] was used for the estimation of the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Confidence intervals were calculated by the DeLong method [71].
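The AUC can be read as the probability that a randomly chosen NEPC sample receives a higher score than a randomly chosen non-NEPC sample. The sketch below computes it through that pairwise (Mann–Whitney) formulation in plain Python. It is not the pROC implementation and does not compute DeLong confidence intervals. The function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: risk score per sample; labels: 1 for the positive class
    (e.g., NEPC), 0 otherwise. Tied scores count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The quadratic pairwise loop is adequate for cohorts of a few dozen samples, such as the 44 PDXs or the 49 patient samples analyzed here.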

4.9. NEPC Patients’ Samples Dataset

To assess gene expression in NEPC samples, we downloaded the data from the Neuroendocrine Prostate Cancer (Multi-Institute, Nat Med 2016) dataset published by Beltran et al. [12] from cBioPortal (accessed date 30 July 2024) [72,73,74]. Briefly, this dataset contains transcriptomics and histopathological data from 49 PCa samples (34 CRPC-Adeno and 15 CRPC-NE) obtained by RNA-Seq.

4.10. Statistical Analyses

All bioinformatics analyses were performed using the R programming language (version 4.3.0) [75] through the RStudio platform (RStudio, PBC, Boston, MA, USA, version 2024.04.1) [76]. The tidyverse package (version 2.0.0) was used for general data analysis and manipulation [77]. For graphics, the ggplot2 (version 3.5.1) [78], ggpubr (version 0.6.0) [79], and RColorBrewer (version 1.1-13) [80] packages were used. Datasets available in GEO were downloaded with GEOquery (version 2.70.0) [81]. All heatmaps were created with the pheatmap package (version 1.0.12) [69]. Forest plots were created using GraphPad Prism software (version 8.4.2, La Jolla, CA, USA). The Mann–Whitney test and ANOVA, followed by Tukey’s test, were used to assess differences in risk score and gene expression values across groups. We used the Log-rank test and Cox proportional hazard model regression to study the association between gene expression and patients’ survival. Multivariable analyses were performed in R and plotted in GraphPad Prism software (version 8.4.2, La Jolla, CA, USA). Statistical significance was set at p < 0.05.

5. Conclusions

This study presents a significant advancement in PCa prognosis and the classification of NEPC, particularly for the challenging large-cell subtype. Importantly, PCa cases presenting this molecular signature, even when not histopathologically identified as NEPC, also exhibit a poor prognosis. This reinforces the clinical relevance of our model, which is capable of identifying aggressive tumor subtypes that may not yet display overt NE differentiation but still represent a high risk for adverse outcomes. Through the development of this novel stemness-associated seven-gene signature, our model offers a robust and practical tool with potential clinical application, paving the way for more personalized and effective therapeutic strategies in PCa.

Limitations

Despite the robustness of our findings, several limitations must be acknowledged. Our study relies primarily on transcriptomic data from publicly available repositories, which, while comprehensive, may not fully represent the genetic diversity of PCa patients globally. Future research should validate our signature in ethnically and genetically diverse cohorts to ensure its broad applicability. In addition, the analysis is based on retrospective data, and prospective validation in clinical settings is necessary to confirm its applicability. While our focus on transcriptomic data provided valuable insights into NEPC biology, integrating multi-omics data, including proteomics and metabolomics, could enhance the predictive power of our model. Moreover, NEPC samples with transcriptomic data are scarce, particularly those of large-cell NEPC (likely due to under-recognition and underreporting [21]), so our findings require further validation in larger cohorts. Functional validation of the identified genes through in vivo studies will also be critical for determining their role in disease and translating findings into clinical practice.

Supplementary Materials

The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms252111356/s1.

Author Contributions

Conceptualization: A.S., P.S., E.V., D.F.A., F.C., M.M., J.C., A.T., E.L., J.B. and G.G.; methodology: A.S., P.S., R.S., G.P., M.M., J.C., J.B. and G.G.; software: A.S., P.S., M.M., J.C. and J.B.; validation: A.S., P.S., R.S., G.P., E.V., M.M., J.C., A.T., J.B. and G.G.; formal analysis: A.S., P.S., N.A., D.F.A., F.C., E.V., M.M., J.C., A.T., J.B. and G.G.; investigation: A.S., P.S., R.S., G.P., A.T., J.B. and G.G.; resources: E.V., J.C., A.T., E.L. and G.G.; data curation: A.S., P.S. and J.B.; writing—original draft preparation: A.S., P.S., N.A., D.F.A., F.C., E.V., M.M., J.C., A.T., E.L., J.B. and G.G.; writing—review and editing: A.S., P.S., N.A., D.F.A., F.C., E.V., M.M., J.C., A.T., E.L., J.B. and G.G.; visualization: A.S., J.B. and G.G.; supervision: E.V., M.M., J.C., A.T., J.B. and G.G.; project administration: J.B. and G.G.; funding acquisition: E.V., J.C., A.T., E.L. and G.G. All authors have read and agreed to the published version of the manuscript.

Funding

The present study was supported by Agencia Nacional de Promoción de la Investigación, el Desarrollo Tecnológico y la Innovación (ANPCyT) PICT-RAICES-2021-III-A-00080; David H. Koch Center for Applied Research in Genitourinary Cancers at MD Anderson (Houston, TX, USA); William Pippin Jr Fellow GU Rsch & TH & MP Scott Fellow Cancer Endowed Research Fellowship Pilot Grant; and NIH/NCI U01 CA224044. The funders of the study had no role in the study design, data collection, data analysis, data interpretation, writing, or decision to submit.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Materials, and further inquiries can be directed to the corresponding authors. The code used to generate the results presented herein is available at https://github.com/asabater00/stemness_signature_PCa (uploaded on 17 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global Cancer Statistics 2022: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2024, 74, 229–263. Available online: https://acsjournals.onlinelibrary.wiley.com/doi/10.3322/caac.21834 (accessed on 18 July 2024).
  2. Beltran, H.; Rickman, D.S.; Park, K.; Chae, S.S.; Sboner, A.; MacDonald, T.Y.; Wang, Y.; Sheikh, K.L.; Terry, S.; Tagawa, S.T.; et al. Molecular Characterization of Neuroendocrine Prostate Cancer and Identification of New Drug Targets. Cancer Discov. 2011, 1, 487–495.
  3. Robinson, D.; Van Allen, E.M.; Wu, Y.-M.; Schultz, N.; Lonigro, R.J.; Mosquera, J.-M.; Montgomery, B.; Taplin, M.-E.; Pritchard, C.C.; Attard, G.; et al. Integrative Clinical Genomics of Advanced Prostate Cancer. Cell 2015, 161, 1215–1228.
  4. Liu, C.; Kelnar, K.; Liu, B.; Chen, X.; Calhoun-Davis, T.; Li, H.; Patrawala, L.; Yan, H.; Jeter, C.; Honorio, S.; et al. The microRNA miR-34a Inhibits Prostate Cancer Stem Cells and Metastasis by Directly Repressing CD44. Nat. Med. 2011, 17, 211–215.
  5. Al Salhi, Y.; Sequi, M.B.; Valenzi, F.M.; Fuschi, A.; Martoccia, A.; Suraci, P.P.; Carbone, A.; Tema, G.; Lombardo, R.; Cicione, A.; et al. Cancer Stem Cells and Prostate Cancer: A Narrative Review. Int. J. Mol. Sci. 2023, 24, 7746.
  6. Maitland, N.J.; Collins, A.T. Prostate Cancer Stem Cells: A New Target for Therapy. J. Clin. Oncol. 2008, 26, 2862–2870.
  7. Banerjee, P.; Kapse, P.; Siddique, S.; Kundu, M.; Choudhari, J.; Mohanty, V.; Malhotra, D.; Gosavi, S.W.; Gacche, R.N.; Kundu, G.C. Therapeutic Implications of Cancer Stem Cells in Prostate Cancer. Cancer Biol. Med. 2023, 20, 401–420.
  8. Beltran, H.; Tomlins, S.; Aparicio, A.; Arora, V.; Rickman, D.; Ayala, G.; Huang, J.; True, L.; Gleave, M.E.; Soule, H.; et al. Aggressive Variants of Castration-Resistant Prostate Cancer. Clin. Cancer Res. 2014, 20, 2846–2850.
  9. Aggarwal, R.; Huang, J.; Alumkal, J.J.; Zhang, L.; Feng, F.Y.; Thomas, G.V.; Weinstein, A.S.; Friedl, V.; Zhang, C.; Witte, O.N.; et al. Clinical and Genomic Characterization of Treatment-Emergent Small-Cell Neuroendocrine Prostate Cancer: A Multi-Institutional Prospective Study. J. Clin. Oncol. 2018, 36, 2492–2503.
  10. Dardenne, E.; Beltran, H.; Benelli, M.; Gayvert, K.; Berger, A.; Puca, L.; Cyrta, J.; Sboner, A.; Noorzad, Z.; MacDonald, T.; et al. N-Myc Induces an EZH2-Mediated Transcriptional Program Driving Neuroendocrine Prostate Cancer. Cancer Cell 2016, 30, 563–577.
  11. Mu, P.; Zhang, Z.; Benelli, M.; Karthaus, W.R.; Hoover, E.; Chen, C.-C.; Wongvipat, J.; Ku, S.-Y.; Gao, D.; Cao, Z.; et al. SOX2 Promotes Lineage Plasticity and Antiandrogen Resistance in TP53- and RB1-Deficient Prostate Cancer. Science 2017, 355, 84–88.
  12. Beltran, H.; Prandi, D.; Mosquera, J.M.; Benelli, M.; Puca, L.; Cyrta, J.; Marotz, C.; Giannopoulou, E.; Chakravarthi, B.V.S.K.; Varambally, S.; et al. Divergent Clonal Evolution of Castration-Resistant Neuroendocrine Prostate Cancer. Nat. Med. 2016, 22, 298–305.
  13. Bluemn, E.G.; Coleman, I.M.; Lucas, J.M.; Coleman, R.T.; Hernandez-Lopez, S.; Tharakan, R.; Bianchi-Frias, D.; Dumpit, R.F.; Kaipainen, A.; Corella, A.N.; et al. Androgen Receptor Pathway-Independent Prostate Cancer Is Sustained through FGF Signaling. Cancer Cell 2017, 32, 474–489.e6.
  14. Huang, R.; Wang, S.; Wang, N.; Zheng, Y.; Zhou, J.; Yang, B.; Wang, X.; Zhang, J.; Guo, L.; Wang, S.; et al. CCL5 Derived from Tumor-Associated Macrophages Promotes Prostate Cancer Stem Cells and Metastasis via Activating β-Catenin/STAT3 Signaling. Cell Death Dis. 2020, 11, 234.
  15. Sharpe, B.; Beresford, M.; Bowen, R.; Mitchard, J.; Chalmers, A.D. Searching for Prostate Cancer Stem Cells: Markers and Methods. Stem Cell Rev. Rep. 2013, 9, 721–730.
  16. Maitland, N.J.; Frame, F.M.; Polson, E.S.; Lewis, J.L.; Collins, A.T. Prostate Cancer Stem Cells: Do They Have a Basal or Luminal Phenotype? Horm. Cancer 2011, 2, 47–61.
  17. Leong, K.G.; Wang, B.E.; Johnson, L.; Gao, W.Q. Generation of a Prostate from a Single Adult Stem Cell. Nature 2008, 456, 804–810.
  18. Goldstein, A.S.; Huang, J.; Guo, C.; Garraway, I.P.; Witte, O.N. Identification of a Cell of Origin for Human Prostate Cancer. Science 2010, 329, 568–571.
  19. Palanisamy, N.; Yang, J.; Shepherd, P.D.A.; Li-Ning-Tapia, E.M.; Labanca, E.; Manyam, G.C.; Ravoori, M.K.; Kundra, V.; Araujo, J.C.; Efstathiou, E.; et al. The MD Anderson Prostate Cancer Patient-Derived Xenograft Series (MDA PCa PDX) Captures the Molecular Landscape of Prostate Cancer and Facilitates Marker-Driven Therapy Development. Clin. Cancer Res. 2020, 26, 4933–4946.
  20. Anselmino, N.; Labanca, E.; Shepherd, P.D.A.; Dong, J.; Yang, J.; Song, X.; Nandakumar, S.; Kundra, R.; Lee, C.; Schultz, N.; et al. Integrative Molecular Analyses of the MD Anderson Prostate Cancer Patient-Derived Xenograft (MDA PCa PDX) Series. Clin. Cancer Res. 2024, 30, 2272–2285.
  21. Serritella, A.V.; Beltran, H.; Lotan, T.L.; VanderWeele, D.J.; Karzai, F.; Madan, R.A.; Hussain, M. Large Cell Neuroendocrine Prostate Cancer: Large Is Not Small. Oncologist 2024, 29, 185–189.
  22. Epstein, J.I.; Amin, M.B.; Beltran, H.; Lotan, T.L.; Mosquera, J.-M.; Reuter, V.E.; Robinson, B.D.; Troncoso, P.; Rubin, M.A. Proposed Morphologic Classification of Prostate Cancer With Neuroendocrine Differentiation. Am. J. Surg. Pathol. 2014, 38, 756–767.
  23. Humphrey, P.A.; Moch, H.; Cubilla, A.L.; Ulbright, T.M.; Reuter, V.E. The 2016 WHO Classification of Tumours of the Urinary System and Male Genital Organs—Part B: Prostate and Bladder Tumours. Eur. Urol. 2016, 70, 106–119.
  24. Evans, A.J.; Humphrey, P.A.; Belani, J.; van der Kwast, T.H.; Srigley, J.R. Large Cell Neuroendocrine Carcinoma of Prostate: A Clinicopathologic Summary of 7 Cases of a Rare Manifestation of Advanced Prostate Cancer. Am. J. Surg. Pathol. 2006, 30, 684–693.
  25. Nguyen, N.; Ronald Dean Franz, I.I.; Mohammed, O.; Huynh, R.; Son, C.K.; Khan, R.N.; Ahmed, B. A Systematic Review of Primary Large Cell Neuroendocrine Carcinoma of the Prostate. Front. Oncol. 2024, 14, 1341794.
  26. Aggarwal, R.; Zhang, T.; Small, E.J.; Armstrong, A.J. Neuroendocrine Prostate Cancer: Subtypes, Biology, and Clinical Outcomes. J. Natl. Compr. Cancer Netw. 2014, 12, 719–726.
  27. Bhagirath, D.; Liston, M.; Akoto, T.; Lui, B.; Bensing, B.A.; Sharma, A.; Saini, S. Novel, Non-Invasive Markers for Detecting Therapy Induced Neuroendocrine Differentiation in Castration-Resistant Prostate Cancer Patients. Sci. Rep. 2021, 11, 8279.
  28. Dang, Q.; Li, L.; Xie, H.; He, D.; Chen, J.; Song, W.; Chang, L.S.; Chang, H.-C.; Yeh, S.; Chang, C. Anti-Androgen Enzalutamide Enhances Prostate Cancer Neuroendocrine (NE) Differentiation via Altering the Infiltrated Mast Cells → Androgen Receptor (AR) → miRNA32 Signals. Mol. Oncol. 2015, 9, 1241–1251.
  29. Ou, Y.-H.; Jiang, Y.-D.; Li, Q.; Zhuang, Y.-J.; Dang, Q.; Tan, W.-L. Infiltrating mast cells promote neuroendocrine differentiation and increase docetaxel resistance of prostate cancer cells by up-regulating p21. Nan Fang Yi Ke Da Xue Xue Bao 2018, 38, 723–730.
  30. Maimaitiyiming, A.; An, H.; Xing, C.; Li, X.; Li, Z.; Bai, J.; Luo, C.; Zhuo, T.; Huang, X.; Maimaiti, A.; et al. Machine Learning-Driven Mast Cell Gene Signatures for Prognostic and Therapeutic Prediction in Prostate Cancer. Heliyon 2024, 10, e35157.
  31. Aller, M.-A.; Arias, A.; Arias, J.-I.; Arias, J. Carcinogenesis: The Cancer Cell–Mast Cell Connection. Inflamm. Res. 2019, 68, 103–116.
  32. Conteduca, V.; Oromendia, C.; Eng, K.W.; Bareja, R.; Sigouros, M.; Molina, A.; Faltas, B.M.; Sboner, A.; Mosquera, J.M.; Elemento, O.; et al. Clinical Features of Neuroendocrine Prostate Cancer. Eur. J. Cancer 2019, 121, 7–18.
  33. Cherif, C.; Nguyen, D.T.; Paris, C.; Le, T.K.; Sefiane, T.; Carbuccia, N.; Finetti, P.; Chaffanet, M.; Kaoutari, A.E.; Vernerey, J.; et al. Menin Inhibition Suppresses Castration-Resistant Prostate Cancer and Enhances Chemosensitivity. Oncogene 2022, 41, 125–137.
  34. Quan, Y.; Zhang, X.; Wang, M.; Ping, H. Histone Lysine Methylation Patterns in Prostate Cancer Microenvironment Infiltration: Integrated Bioinformatic Analysis and Histological Validation. Front. Oncol. 2022, 12, 981226.
  35. Tzelepi, V.; Logotheti, S.; Efstathiou, E.; Troncoso, P.; Aparicio, A.; Sakellakis, M.; Hoang, A.; Perimenis, P.; Melachrinou, M.; Logothetis, C.; et al. Epigenetics and Prostate Cancer: Defining the Timing of DNA Methyltransferase Deregulation during Prostate Cancer Progression. Pathology 2020, 52, 218–227.
  36. Enz, N.; Vliegen, G.; De Meester, I.; Jungraithmayr, W. CD26/DPP4—A Potential Biomarker and Target for Cancer Therapy. Pharmacol. Ther. 2019, 198, 135–159.
  37. Burdelski, C.; Strauss, C.; Tsourlakis, M.C.; Kluth, M.; Hube-Magg, C.; Melling, N.; Lebok, P.; Minner, S.; Koop, C.; Graefen, M.; et al. Overexpression of Thymidylate Synthase (TYMS) Is Associated with Aggressive Tumor Features and Early PSA Recurrence in Prostate Cancer. Oncotarget 2015, 6, 8377–8387.
  38. Ngan, E.S.W.; Hashimoto, Y.; Ma, Z.-Q.; Tsai, M.-J.; Tsai, S.Y. Overexpression of Cdc25B, an Androgen Receptor Coactivator, in Prostate Cancer. Oncogene 2003, 22, 734–739.
  39. Roberts, B.K.; Collado, G.; Barnes, B.J. Role of Interferon Regulatory Factor 5 (IRF5) in Tumor Progression: Prognostic and Therapeutic Potential. Biochim. Biophys. Acta BBA—Rev. Cancer 2024, 1879, 189061.
  40. Chen, H.; Fang, S.; Zhu, X.; Liu, H. Cancer-Associated Fibroblasts and Prostate Cancer Stem Cells: Crosstalk Mechanisms and Implications for Disease Progression. Front. Cell Dev. Biol. 2024, 12, 1412337.
  41. Ellis, L.; Loda, M. Advanced Neuroendocrine Prostate Tumors Regress to Stemness. Proc. Natl. Acad. Sci. USA 2015, 112, 14406–14407.
  42. Chakraborty, G.; Gupta, K.; Kyprianou, N. Epigenetic Mechanisms Underlying Subtype Heterogeneity and Tumor Recurrence in Prostate Cancer. Nat. Commun. 2023, 14, 567.
  43. Guijarro, M.V.; Nawab, A.; Dib, P.; Burkett, S.; Luo, X.; Feely, M.; Nasri, E.; Seifert, R.P.; Kaye, F.J.; Zajac-Kaye, M. TYMS Promotes Genomic Instability and Tumor Progression in Ink4a/Arf Null Background. Oncogene 2023, 42, 1926–1939.
  44. Ibe, T.; Shimizu, K.; Nakano, T.; Kakegawa, S.; Kamiyoshihara, M.; Nakajima, T.; Kaira, K.; Takeyoshi, I. High-Grade Neuroendocrine Carcinoma of the Lung Shows Increased Thymidylate Synthase Expression Compared to Other Histotypes. J. Surg. Oncol. 2010, 102, 11–17.
  45. Gao, H.; Korn, J.M.; Ferretti, S.; Monahan, J.E.; Wang, Y.; Singh, M.; Zhang, C.; Schnell, C.; Yang, G.; Zhang, Y.; et al. High-Throughput Screening Using Patient-Derived Tumor Xenografts to Predict Clinical Trial Drug Response. Nat. Med. 2015, 21, 1318–1325.
  46. Grasso, C.S.; Wu, Y.M.; Robinson, D.R.; Cao, X.; Dhanasekaran, S.M.; Khan, A.P.; Quist, M.J.; Jing, X.; Lonigro, R.J.; Brenner, J.C.; et al. The Mutational Landscape of Lethal Castration-Resistant Prostate Cancer. Nature 2012, 487, 239–243.
  47. Lapointe, J.; Li, C.; Higgins, J.P.; Van De Rijn, M.; Bair, E.; Montgomery, K.; Ferrari, M.; Egevad, L.; Rayford, W.; Bergerheim, U.; et al. Gene Expression Profiling Identifies Clinically Relevant Subtypes of Prostate Cancer. Proc. Natl. Acad. Sci. USA 2004, 101, 811–816.
  48. Malhotra, S.; Lapointe, J.; Salari, K.; Higgins, J.P.; Ferrari, M.; Montgomery, K.; van de Rijn, M.; Brooks, J.D.; Pollack, J.R. A Tri-Marker Proliferation Index Predicts Biochemical Recurrence after Surgery for Prostate Cancer. PLoS ONE 2011, 6, e20293.
  49. Mortensen, M.M.; Høyer, S.; Lynnerup, A.S.; Ørntoft, T.F.; Sørensen, K.D.; Borre, M.; Dyrskjøt, L. Expression Profiling of Prostate Cancer Tissue Delineates Genes Associated with Recurrence after Prostatectomy. Sci. Rep. 2015, 5, 16018.
  50. Wallace, T.A.; Prueitt, R.L.; Yi, M.; Howe, T.M.; Gillespie, J.W.; Yfantis, H.G.; Stephens, R.M.; Caporaso, N.E.; Loffredo, C.A.; Ambs, S. Tumor Immunobiological Differences in Prostate Cancer between African-American and European-American Men. Cancer Res. 2008, 68, 927–936.
  51. Ross-Adams, H.; Lamb, A.; Dunning, M.; Halim, S.; Lindberg, J.; Massie, C.; Egevad, L.; Russell, R.; Ramos-Montoya, A.; Vowler, S.; et al. Integration of Copy Number and Transcriptomics Provides Risk Stratification in Prostate Cancer: A Discovery and Validation Cohort Study. eBioMedicine 2015, 2, 1133–1144.
  52. TCGA-PRAD. Available online: https://portal.gdc.cancer.gov/projects/TCGA-PRAD (accessed on 4 August 2021).
  53. Taylor, B.S.; Schultz, N.; Hieronymus, H.; Gopalan, A.; Xiao, Y.; Carver, B.S.; Arora, V.K.; Kaushik, P.; Cerami, E.; Reva, B.; et al. Integrative Genomic Profiling of Human Prostate Cancer. Cancer Cell 2010, 18, 11–22.
  54. Shi, W.; Oshlack, A.; Smyth, G.K. Optimizing the Noise versus Bias Trade-off for Illumina Whole Genome Expression BeadChips. Nucleic Acids Res. 2010, 38, e204.
  55. Law, C.W.; Chen, Y.; Shi, W.; Smyth, G.K. Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts. Genome Biol. 2014, 15, R29.
  56. Jain, S.; Lyons, C.A.; Walker, S.M.; McQuaid, S.; Hynes, S.O.; Mitchell, D.M.; Pang, B.; Logan, G.E.; McCavigan, A.M.; O’Rourke, D.; et al. Validation of a Metastatic Assay Using Biopsies to Improve Risk Stratification in Patients with Prostate Cancer Treated with Radical Radiation Therapy. Ann. Oncol. 2018, 29, 215–222.
  57. Sboner, A.; Demichelis, F.; Calza, S.; Pawitan, Y.; Setlur, S.R.; Hoshida, Y.; Perner, S.; Adami, H.O.; Fall, K.; Mucci, L.A.; et al. Molecular Sampling of Prostate Cancer: A Dilemma for Predicting Disease Progression. BMC Med. Genom. 2010, 3, 8.
  58. Long, Q.; Xu, J.; Osunkoya, A.O.; Sannigrahi, S.; Johnson, B.A.; Zhou, W.; Gillespie, T.; Park, J.Y.; Nam, R.K.; Sugar, L.; et al. Global Transcriptome Analysis of Formalin-Fixed Prostate Cancer Specimens Identifies Biomarkers of Disease Recurrence. Cancer Res. 2014, 74, 3228–3237.
  59. Luca, B.-A.; Brewer, D.S.; Edwards, D.R.; Edwards, S.; Whitaker, H.C.; Merson, S.; Dennis, N.; Cooper, R.A.; Hazell, S.; Warren, A.Y.; et al. DESNT: A Poor Prognosis Category of Human Prostate Cancer. Eur. Urol. Focus 2018, 4, 842–850.
  60. Gerhauser, C.; Favero, F.; Risch, T.; Simon, R.; Feuerbach, L.; Assenov, Y.; Heckmann, D.; Sidiropoulos, N.; Waszak, S.M.; Hübschmann, D.; et al. Molecular Evolution of Early-Onset Prostate Cancer Identifies Molecular Risk Markers and Clinical Trajectories. Cancer Cell 2018, 34, 996–1011.e8.
  61. Abida, W.; Cyrta, J.; Heller, G.; Prandi, D.; Armenia, J.; Coleman, I.; Cieslik, M.; Benelli, M.; Robinson, D.; Van Allen, E.M.; et al. Genomic Correlates of Clinical Outcome in Advanced Prostate Cancer. Proc. Natl. Acad. Sci. USA 2019, 116, 11428–11436.
  62. Bland, J.M.; Altman, D.G. The Logrank Test. BMJ 2004, 328, 1073.
  63. Budczies, J.; Klauschen, F.; Sinn, B.V.; Gyorffy, B.; Schmitt, W.D.; Darb-Esfahani, S.; Denkert, C. Cutoff Finder: A Comprehensive and Straightforward Web Application Enabling Rapid Biomarker Cutoff Optimization. PLoS ONE 2012, 7, e51862.
  64. Breslow, N.E. Analysis of Survival Data under the Proportional Hazards Model. Int. Stat. Rev. Rev. Int. Stat. 1975, 43, 45–57.
  65. Therneau, T. A Package for Survival Analysis in S. R Package Version. 1999. Available online: https://www.mayo.edu/research/documents/tr53pdf/doc-10027379 (accessed on 22 September 2024).
  66. Kassambara, A.; Kosinski, M.; Biecek, P.; Fabian, S.; Package ‘Survminer’. Drawing Survival Curves Using ‘ggplot2’. R Package Version 0.3.1. 2014. Available online: https://cran.r-project.org/web/packages/survminer/index.html (accessed on 22 September 2024).
  67. Ishwaran, H.; Kogalur, U.B. randomForestSRC: Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). 2024. Available online: https://cran.r-project.org/web/packages/randomForestSRC/index.html (accessed on 22 September 2024).
  68. Ishwaran, H.; Lu, M. Standard Errors and Confidence Intervals for Variable Importance in Random Forest Regression, Classification, and Survival. Stat. Med. 2019, 38, 558–582.
  69. Kolde, R. Pheatmap: Pretty Heatmaps. 2019. Available online: https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf (accessed on 22 September 2024).
  70. Kassambara, A.; Mundt, F. Factoextra: Extract and Visualize the Results of Multivariate Data Analyses; R Package. 2020. Available online: https://cran.r-project.org/web/packages/factoextra/index.html (accessed on 22 September 2024).
  71. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: Display and Analyze ROC Curves; 2010; 1.18.5. Available online: https://cran.r-project.org/web/packages/pROC/pROC.pdf (accessed on 22 September 2024).
  72. Cerami, E.; Gao, J.; Dogrusoz, U.; Gross, B.E.; Sumer, S.O.; Aksoy, B.A.; Jacobsen, A.; Byrne, C.J.; Heuer, M.L.; Larsson, E.; et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discov. 2012, 2, 401–404.
  73. Gao, J.; Aksoy, B.A.; Dogrusoz, U.; Dresdner, G.; Gross, B.; Sumer, S.O.; Sun, Y.; Jacobsen, A.; Sinha, R.; Larsson, E.; et al. Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal. Sci. Signal. 2013, 6, pl1.
  74. de Bruijn, I.; Kundra, R.; Mastrogiacomo, B.; Tran, T.N.; Sikina, L.; Mazor, T.; Li, X.; Ochoa, A.; Zhao, G.; Lai, B.; et al. Analysis and Visualization of Longitudinal Genomic and Clinical Data from the AACR Project GENIE Biopharma Collaborative in cBioPortal. Cancer Res. 2023, 83, 3861–3867.
  75. Dexter, T.A. R: A Language and Environment for Statistical Computing. Quat. Res. 2014, 81, 114–124.
  76. RStudio. RStudio|Open Source & Professional Software for Data Science Teams—RStudio. Available online: https://www.rstudio.com/ (accessed on 22 September 2021).
  77. Wickham, H.; Averick, M.; Bryan, J.; Chang, W.; McGowan, L.; François, R.; Grolemund, G.; Hayes, A.; Henry, L.; Hester, J.; et al. Welcome to the Tidyverse. J. Open Source Softw. 2019, 4, 1686.
  78. Hadley, W. Ggplot2. Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016; ISBN 978-3-319-24277-4.
  79. Kassambara, A. Ggpubr: “ggplot2” Based Publication Ready Plots. R Package Version 0.4.0.999. 2020. Available online: https://rpkgs.datanovia.com/ggpubr/ (accessed on 22 September 2024).
  80. Neuwirth, E.; Maindonald, J. Package “RColorBrewer”. 2015. Available online: http://cran.nexr.com/web/packages/RColorBrewer/RColorBrewer.pdf (accessed on 22 September 2024).
  81. Davis, S.; Meltzer, P.S. GEOquery: A Bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 2007, 23, 1846–1847.
Figure 1. Stemness-associated gene expression changes in PCa patient samples across multiple public datasets. (A) Schematic representation of gene selection, transcriptomics, and survival analyses to define potential prognostic biomarkers. (B) (i) Volcano plots showing the results of the differential expression analyses of all available genes within the included transcriptomics datasets. Red = significantly upregulated stemness-associated gene. Blue = significantly downregulated stemness-associated gene. Dark gray = stemness-associated gene without significant dysregulation. Light gray = other genes available in the dataset. (ii) Summary heatmap of the transcriptomics analyses performed in multiple publicly available datasets (n = 1259). Genes of interest and the results of the differential expression analysis for each dataset are displayed. Each row represents the results of a specific comparison. Annotation depicts the absolute number (#) of comparisons in which each gene is upregulated (red) or downregulated (blue). Red = significantly upregulated gene. Blue = significantly downregulated gene. White = no significant change. Gray = not available. Datasets: GSE35988 (n = 122); GSE3933 (n = 103); GSE46602 (n = 50); GSE6956 (n = 87); GSE70768 (n = 179); TCGA-PRAD (n = 548); GSE21034 (n = 150). Statistical significance was set at adjusted p-value < 0.05.
Figure 2. Univariable and multivariable survival analyses. (A) (i) Examples of Kaplan–Meier (KM) curves depicting the association of each gene with the risk of an event (purple = high expression of a gene; green = low expression of a gene). HR: Hazard Ratio; Cox p: p-value from the Cox proportional hazards model. Log-rank p: p-value of the Log-rank test. (ii) Summary heatmap of the univariable survival analyses performed on multiple datasets. (B) (i) Examples of forest plots depicting the association of each gene with the risk of an event, adjusted for all available covariates, using the TCGA-PRAD dataset. (ii) Summary heatmap of the multivariable survival analyses performed on multiple datasets. Red boxes indicate that high gene expression is associated with a higher risk of an event (HR > 1 and Cox p < 0.05), blue boxes indicate that high gene expression is associated with a lower risk of an event (HR < 1 and Cox p < 0.05), and white boxes indicate no significant association between gene expression and risk of an event. Gray = gene not available. Patients were stratified by the optimal cutoff for each gene, calculated using the Cutoff Finder tool. All comparisons consider low-expression patients as the reference group. Annotation depicts the absolute number (#) of comparisons in which high expression of each gene is associated with high (red) or low (blue) risk. OS: overall survival; DFS: disease-free survival; PFS: progression-free survival; RFS: relapse-free survival; MFS: metastasis-free survival. Datasets: TCGA-PRAD (n = 497 PFS, n = 337 DFS); GSE70768 (n = 111 RFS); GSE70769 (n = 92 RFS); GSE116918 (n = 248 RFS and MFS); GSE16560 (n = 281 OS). Statistical significance was set at Cox p < 0.05. ** Cox p < 0.01; *** Cox p < 0.001.
Figure 3. Machine learning Random Forest algorithm for the selection of prognostic candidates. (A) Heatmap summarizing the relative importance of the variables (genes) for all training datasets. The relative importance was converted into percentiles, where 1 represents maximum relative importance (red) and 0 indicates minimum relative importance (blue). Gray = gene not available in the dataset. The 15 top-ranked genes (purple box) were selected as candidates for our stemness-associated risk signature. (B) (i) Example of a Kaplan–Meier (KM) curve using the TCGA-PRAD dataset depicting the association of the seven-gene score with the risk of progression (purple: high seven-gene score; green: low seven-gene score). The coefficients for each gene were calculated by Lasso regression using TCGA-PRAD data, and the seven-gene score was constructed as follows: 0.284 × KMT5C + 0.272 × MEN1 + 0.218 × TYMS + 0.090 × IRF5 + 0.083 × DNMT3B + 0.048 × CDC25B − 0.060 × DPP4. Patients were stratified by the median of the score. HR: Hazard Ratio; p-value: p-value from the Cox proportional hazards model. Log-rank p: p-value of the log-rank test. (ii) Summary forest plot displaying the survival analysis of the association of the seven-gene signature with the risk of disease-progression events in the training datasets. Patients’ survival was analyzed either by stratification by the median of the seven-gene score (circles) or by taking the seven-gene score as a continuous variable (squares). Red corresponds to statistically significant associations (Cox p < 0.05) and gray corresponds to non-significant associations. On the right, a heatmap depicts the concordance index value for each of the analyses. The concordance index is a performance measure of the signature within each dataset. Cox p: p-value of the Cox regression coefficient. HR = Hazard Ratio. (95% CI) = 95% Confidence Interval.
PFS: progression-free survival; DFS: disease-free survival; RFS: relapse-free survival; OS: overall survival; MFS: metastasis-free survival. Statistical significance was set at Cox p < 0.05 (red). * Cox p < 0.05; ** Cox p < 0.01; *** Cox p < 0.001; **** Cox p < 0.0001.
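The score construction in the Figure 3 caption is a weighted sum followed by a median split, which can be sketched in a few lines of Python. The coefficients are those reported in the caption. Everything else is an assumption made for illustration: the function names, the made-up expression values, the expression scale, and the choice to place scores equal to the median in the low group (the caption does not state how ties are handled).

```python
from statistics import median

# Lasso coefficients as reported in the caption (fitted on TCGA-PRAD).
COEFFICIENTS = {
    "KMT5C": 0.284,
    "MEN1": 0.272,
    "TYMS": 0.218,
    "IRF5": 0.090,
    "DNMT3B": 0.083,
    "CDC25B": 0.048,
    "DPP4": -0.060,
}


def seven_gene_score(expression):
    """Weighted sum of the seven expression values for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median.

    Assumption: a score equal to the median is labeled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression values for three hypothetical patients.
genes = list(COEFFICIENTS)
cohort = {
    "patient_A": {g: 1.0 for g in genes},
    "patient_B": {**{g: 0.0 for g in genes}, "DPP4": 2.0},
    "patient_C": {g: 2.0 for g in genes},
}

scores = {pid: seven_gene_score(expr) for pid, expr in cohort.items()}
groups = stratify_by_median(scores)
for pid in cohort:
    print(pid, round(scores[pid], 3), groups[pid])
```

Because DPP4 is the only gene with a negative coefficient, higher DPP4 expression lowers the score while higher expression of the other six genes raises it.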
Figure 4. Gene signature’s performance across external validation datasets. (A) (i) Kaplan–Meier curves depicting the association of the seven-gene score with the risk of disease-progression events included in the validation datasets. (ii) Kaplan–Meier curve depicting the association of the seven-gene score with the risk of death of metastatic PCa patients from the SU2C dataset. The coefficients for each gene were calculated by Lasso regression using TCGA-PRAD data, and the seven-gene score was calculated as follows: 0.284 × KMT5C + 0.272 × MEN1 + 0.218 × TYMS + 0.090 × IRF5 + 0.083 × DNMT3B + 0.048 × CDC25B − 0.060 × DPP4. Patients were stratified by the median of the score. HR: Hazard Ratio; Cox p: p-value from the Cox proportional hazards model. Log-rank p: p-value of the log-rank test. (iii) Summary forest plot displaying the survival analysis of the association of the seven-gene signature with the risk of disease-progression events in the validation datasets. Patients’ survival was analyzed either by stratification by the median of the seven-gene score (circles) or by taking the seven-gene score as a continuous variable (squares). On the right, a heatmap depicts the concordance index value for each of the analyses. The concordance index is a performance measure of the signature within each dataset. RFS: relapse-free survival; OS: overall survival. (B) Forest plots depicting the association of each gene with the risk of event, adjusted for all available covariates within each validation dataset. Red corresponds to statistically significant associations (Cox p < 0.05) and gray corresponds to non-significant associations. Cox p: p-value of the Cox regression coefficient. HR = Hazard Ratio. [95% CI] = 95% Confidence Interval. Datasets: GSE54460 (n = 106); GSE94767 (n = 233); DKFZ (n = 81); SU2C-PCF (n = 81). Statistical significance was set at Cox p < 0.05 (red). * Cox p < 0.05; ** Cox p < 0.01; *** Cox p < 0.001; **** Cox p < 0.0001.
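The concordance index reported in the heatmaps measures how often the signature ranks two patients in the same order as their observed outcomes. The sketch below implements Harrell's C for right-censored data in plain Python. It assumes that this is the variant intended, since the caption does not name the estimator or the software used, and the function name and toy data are illustrative only.

```python
def concordance_index(times, events, scores):
    """Harrell's C for right-censored survival data.

    A pair (i, j) is comparable when patient i has the shorter follow-up
    time and an observed event. The pair is concordant when patient i also
    has the higher risk score; tied scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time, event indicator (1 = event, 0 = censored), score.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risk = [4.0, 3.0, 2.0, 1.0]   # higher score for patients with earlier events

print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to a ranking no better than chance, and a value of 1 means that every comparable pair is ordered correctly.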
Figure 5. Transcriptomic analysis of the MDA PCa PDX series. (A) Schematic representation of the MDA PCa PDX series establishment and transcriptome analysis (n = 44) (created with BioRender.com). (B) (i) Heatmap depicting unsupervised clustering analysis of RNA-seq data from the 44 MDA PCa PDX series considering the expression of the seven-gene signature (KMT5C, MEN1, TYMS, IRF5, DNMT3B, CDC25B, and DPP4). Red, white, and blue represent greater, intermediate, and lower gene expression levels, respectively. (ii) Violin plot showing the seven-gene score levels in no-NEPC and NEPC samples from the MDA PCa PDX series. (iii) Violin plots showing the expression levels (FPKM) of the genes included in the seven-gene score in no-NEPC and NEPC samples from the MDA PCa PDX series. (C) (i) PCA biplot considering the expression of the seven-gene signature using the MDA PCa PDX data assessed by RNA-seq. Each point represents one PDX. Samples are colored according to the histopathological classification: adenocarcinoma (red), sarcomatoid (beige), and neuroendocrine (purple). (ii) Bar plot showing the contribution (%) of each gene in the signature to the variance in PC1 from the PCA. The red dashed line depicts the expected average contribution if all genes contributed equally (value = 14.29%). (D) ROC curve showing the performance of the seven-gene score in classifying MDA PCa PDX samples as NEPC. 95% CI = 95% Confidence Interval. Statistical significance was calculated using the Mann–Whitney test and was set at p < 0.05.
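Two quantities in the Figure 5 caption have simple arithmetic behind them. The 14.29% reference line is 100% divided equally among the seven genes, and the area under a ROC curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which is the normalized Mann–Whitney U statistic. The Python sketch below illustrates both. The function name and the score values are made up for illustration and are not data from the study.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Expected contribution to PC1 if all seven genes contributed equally.
expected_contribution = 100 / 7
print(round(expected_contribution, 2))

# Hypothetical seven-gene scores for NEPC and non-NEPC samples.
nepc_scores = [3.1, 2.7, 2.9]
other_scores = [1.2, 2.8, 0.9, 1.5]
print(roc_auc(nepc_scores, other_scores))
```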
Figure 6. Clinical validation in NEPC samples. (A) (i) Heatmap depicting an unsupervised clustering analysis of RNA-seq data from the MDA PCa PDX series considering the expression of the 70-gene signature proposed by Beltran et al. [12]. (ii) Heatmap depicting an unsupervised clustering analysis of RNA-seq data from the MDA PCa PDX series considering the expression of the 70-gene signature proposed by Beltran et al. plus the seven genes (KMT5C, MEN1, TYMS, IRF5, DNMT3B, CDC25B, and DPP4) from the risk score model proposed in our work. (B) (i) Heatmap depicting an unsupervised clustering analysis of RNA-seq data from human patients in the Beltran et al. dataset (n = 49) [12] considering the expression of the seven-gene signature. Red, white, and blue represent greater, intermediate, and lower gene expression levels, respectively. Expression values are presented as z-scores. (ii) Violin plot showing seven-gene score levels in Castration-Resistant Prostate Cancer-Adenocarcinoma (CRPC-Adeno) and CRPC-Neuroendocrine (NE) samples from the Beltran et al. dataset. (C) (i) Violin plot showing risk score levels in samples from the Beltran et al. dataset according to the histological classification: prostate adenocarcinoma without NE differentiation, prostate adenocarcinoma with NE differentiation >20%, small-cell carcinoma, large-cell NE carcinoma, and mixed small-cell carcinoma–adenocarcinoma. (ii) ROC curve showing the performance of the seven-gene score in classifying PCa patient samples from the Beltran et al. dataset as large-cell NEPC. 95% CI = 95% Confidence Interval. Statistical significance was calculated using the Mann–Whitney test or ANOVA followed by Tukey’s test and was set at p < 0.05.
Table 1. PCa transcriptomics datasets for differential expression analysis.

| Dataset | Samples |
|---|---|
| GSE35988 [46] | Localized PCa (n = 59), matched benign prostate tissues (n = 28), and metastatic CRPC (n = 35). |
| GSE3933 [47,48] | Localized PCa (n = 62) and normal prostate (n = 41). |
| GSE46602 [49] | PCa (n = 36) and benign tissue (n = 14). |
| GSE6956 [50] | Primary PCa (n = 69) and normal adjacent prostate (n = 18). |
| GSE70768 [51] | Primary PCa (n = 112), benign tissue (n = 74) and CRPC (n = 13). |
| TCGA-PRAD [52] | Primary PCa (n = 497) and normal adjacent tissue samples (n = 51). |
| GSE21034 [53] | Primary PCa (n = 131) and metastatic tissue samples (n = 19). |
Table 2. PCa transcriptomics datasets for survival analyses.

| Dataset | Samples | Survival Endpoint | Covariates | Cohort |
|---|---|---|---|---|
| TCGA-PRAD [52] | 497 PCa (RNA-seq) | Disease Progression; Disease-Free Time (n = 337) | Gleason Group, PSA levels, Clinical T Stage, Targeted Molecular/Radiation Therapy | Training |
| GSE70768 [51] | 111 PCa (Microarray) | Biochemical Relapse | Age, Gleason Group, PSA levels, T Stage | Training |
| GSE70769 [51] | 92 PCa (Microarray) | Biochemical Relapse | Gleason Group, PSA levels, T Stage | Training |
| GSE116918 [56] | 248 PCa (Microarray) | Metastasis Development; Relapse | Age, Gleason Score, PSA levels, T Stage | Training |
| GSE16560 [57] | 281 PCa (Microarray) | Death | Age, Gleason Group | Training |
| GSE54460 [58] | 106 PCa (RNA-seq) | Biochemical Relapse | Age, Gleason Score, PSA levels, T Stage | Validation |
| GSE94767 [59] | 233 PCa (Microarray) | Biochemical Relapse | Gleason Group, PSA levels, T Stage | Validation |
| DKFZ [60] | 81 PCa (RNA-seq) | Biochemical Relapse | Age, Gleason Score, PSA levels, T Stage | Validation |
| SU2C-PCF [61] | 81 metastatic CRPC (RNA-seq) | Death | Age, Gleason Score, PSA levels | Validation |

APA Style

Sabater, A., Sanchis, P., Seniuk, R., Pascual, G., Anselmino, N., Alonso, D. F., Cayol, F., Vazquez, E., Marti, M., Cotignola, J., Toro, A., Labanca, E., Bizzotto, J., & Gueron, G. (2024). Unmasking Neuroendocrine Prostate Cancer with a Machine Learning-Driven Seven-Gene Stemness Signature That Predicts Progression. International Journal of Molecular Sciences, 25(21), 11356. https://doi.org/10.3390/ijms252111356