1. Introduction
Prostate cancer (PrCa) is a tumor formed when cells grow and multiply abnormally in and around the prostate gland. When it metastasizes (spreads to other parts of the body), it can progress to aggressive, terminal forms with poor prognosis [
1,
2]. Globally, PrCa is the second-most commonly diagnosed cancer in men; for more than half of the global population, it is the most common cancer in men, and it is the leading cause of cancer death among men in Central America and Sub-Saharan Africa [
3,
4]. The American Cancer Society’s Cancer Facts and Figures (2025) estimates 313,780 new cases of PrCa and 35,770 deaths from the disease in the United States [
3]. The prostate-specific antigen test is commonly used for screening PrCa, and more recently, some blood-based and urine-based biomarkers like PTEN and PCA3 have been reported to be promising diagnostic tools [
5,
6].
While PrCa can be deadly, it is highly treatable if detected early. However, the five-year survival rate declines significantly from close to 100% for early stages to less than 40% when the cancer has progressed to advanced metastatic stages [
3], underscoring the importance of uncovering and understanding the clinical and biological factors influencing PrCa progression. The clinical course of PrCa varies widely: some patients exhibit dormant forms in which cancer cells persist without significant growth for years, whereas others experience aggressive metastasis despite early treatment [
7,
8,
9].
A clinically relevant endpoint in many cancer studies is the progression-free survival (PFS), which refers to the time from initiation of treatment until disease progression or death, whichever comes first [
10]. PFS serves as an indicator of cancer stabilization and has been used as a surrogate endpoint for overall survival (OS), which measures the time from treatment initiation until death. Compared to OS, PFS can be measured sooner and at lower cost, provides earlier insight into treatment effect, and can speed up drug development and approval [
11]. Thus, accurate risk stratification models for PFS are crucial for tailoring intervention intensity, monitoring clinical procedures, and informing patient counseling.
Recently, PFS predictions for PrCa based on certain clinical factors have been discussed (see [
10]). At the same time, high-throughput sequencing technologies have made large-scale genomic data available, opening the opportunity to integrate the genomic and clinical profiles of patients and giving rise to the field of clinicogenomics, which analyzes clinical and genomic features together for improved prognostic and predictive modeling. As reviewed in [
12], many studies have successfully utilized genomics profiles to identify PrCa-associated genes. Recent works [
13,
14,
15] provided further examples demonstrating the usefulness of genomic information to shed light on various biomedical aspects of PrCa. Combined clinical and omics data from the TCGA PrCa dataset have been used to construct prognostic models to help predict biochemical recurrence and postsurgical PFS [
16,
17]. However, the joint use of clinical and genomic single-nucleotide variant (SNV) information for modeling PrCa PFS has not been reported to date.
In this paper, we present our exploration into possible associative predictors of PrCa PFS using multiple survival modeling frameworks applied to clinical and genomic information for a cohort of 494 patients with PrCa. Our objective is to identify key clinical and genomics factors associated with PrCa PFS that can be used to generate hypotheses for further in-depth investigation. The clinical data contains information on patients’ demographics, treatment history, disease status, and survival times. The genomics data comprises SNVs extracted from the corresponding patients’ whole exome sequencing results. The SNV information allowed us to identify 27 likely PrCa-related (LPC) protein-coding genes based on the variants’ occurrence frequencies and functional effects, as well as downstream bioinformatics analyses. Combining such clinical and genomics information, we compiled a clinicogenomics dataset for this cohort. The survival modeling methods used to predict PrCa PFS include a traditional penalized Cox model (PCM), a well-established random survival forest (RSF) method, and a deep learning survival model (DeepSurv). While PCM captures linear relationships between variables and also provides interpretable estimates of covariate effects, RSF and DeepSurv capture non-linear, latent relationships and also rank covariates based on their relative contributions to PFS.
In this study, we aim to harness these modeling strategies to offer insights into PrCa progression by systematically analyzing the patients’ PFS using the combined clinicogenomics dataset to uncover factors associated with PFS. We also compared the effectiveness of the three statistical and machine learning models in PrCa PFS prediction. It is anticipated that clinicogenomics data modeling with multiple approaches will help reveal the most important covariates associated with PrCa PFS, which may not be apparent when clinical data is analyzed alone. As such, understanding how specific genomic alterations, treatment exposures, and clinical features interact to influence PrCa PFS is a crucial preliminary step in advancing personalized treatment guidelines for oncology. Ultimately, this integrative clinicogenomics approach has the potential to refine PFS risk stratification modeling to inform more precise and personalized therapeutic interventions.
2. Materials and Methods
The methodological procedure developed for this work is summarized in
Figure 1. This flowchart shows the main steps of data collection, integration, and analysis. Clinicogenomics features were constructed by integrating curated clinical variables with selected SNV-based genomic features. Model development followed a consistent workflow across all modeling approaches, including data preprocessing, partitioning, parameter tuning, and performance assessment. Extended tuning, diagnostic plots, and all workflow implementation details are available as
Supplementary Materials (SF01_pipeline_modules_data and SF02_data_processing) and are also publicly accessible through the project’s GitHub repository—
https://github.com/kelvin-meyet/ClinicoGenomicInsights (accessed on 5 February 2026).
2.1. Clinical Data Preparation
The clinical data source for this work is the cBioPortal for Cancer Genomics, an interactive open-source platform that makes cancer omics profiles accessible to researchers and physicians [
18,
19]. Using its representational state transfer application programming interface (REST API) and local storage, we accessed the clinical data of The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) project. We initialized the API client in R (version 4.4.1) and used it to query and extract data encompassing 55 clinical features from 494 patients. From these, 22 clinically relevant and informative features (see
Table 1), excluding identifier columns, were selected for downstream analyses with PFS Status and PFS Months as target variables and the remaining 20 features as clinical predictors. Description of clinical variables can be found in
Supplementary Materials (SF03_clinical_variables).
Several clinical covariates contained missing values and were imputed to enable the construction of a complete clinical feature matrix (see
Table 1). Missing data were addressed using Multivariate Imputation by Chained Equations (MICE) with Predictive Mean-Matching (PMM) and Random Forest (RF) methods [
20]. Imputation models were fitted on the full clinical dataset prior to model development to obtain a complete set of clinical data that could be harmonized with the genomic features. PFS months and event status were included as auxiliary variables in the imputation model to preserve associations among observed variables; however, imputed outcomes were not used as prediction targets. A single completed dataset was obtained from the imputation cycles for downstream modeling. Distributional comparisons and visual diagnostics were used to assess the plausibility of imputed values. Subsequent data partitioning, model training, hyperparameter tuning, and evaluation were performed strictly after imputation, with the held-out test set excluded from all model-fitting procedures. The complete set of plots showing the imputation results for all imputed features before and after imputation is in the
Supplementary Materials (SF04_imputation). Because imputation was fitted on the full dataset before partitioning, results are interpreted as internally validated and exploratory rather than inferential.
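As an illustration only (the study's pipeline is in R), the donor-matching idea behind PMM can be sketched in Python. The helper names `fit_line` and `pmm_impute` are hypothetical, and the sketch makes simplifying assumptions: a single predictor and a plain least-squares fit, whereas MICE draws regression parameters and cycles over all incomplete variables.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (single predictor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def pmm_impute(xs, ys, k=3, seed=0):
    """Fill None entries of ys by predictive mean matching on predictor xs."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)  # observed values are never altered
            continue
        target = a + b * x
        # Donors: the k observed cases whose predicted values are closest to the target
        donors = sorted(observed, key=lambda d: abs((a + b * d[0]) - target))[:k]
        completed.append(rng.choice(donors)[1])  # borrow one donor's observed value
    return completed
```

Because every imputed value is borrowed from an observed case, the completed variable stays within the observed range, which is one reason PMM is often preferred for skewed clinical measurements.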
2.2. Genomics Data Preparation
We downloaded 503 variant call format (VCF) files containing genomic SNV information from TCGA. Each file corresponded to a single patient, but we found five pairs of files with the same patient ID. The SNV information for each of these duplicate pairs was combined, resulting in 498 files with genomics data. However, only 494 of these had patient identifiers matching those in our clinical dataset. Each VCF file contained SNV information from a patient’s tumor and normal samples. This procedure is described in [
21], where these data were analyzed, and the nonsynonymous SNVs (i.e., those that can cause changes in the encoded amino acids) on protein-coding regions were identified. A scoring function based on two popular functional effect analysis tools, FATHMM [
22] and PROVEAN [
23], was then employed to calculate the cumulative deleterious effects of these SNVs on protein-coding genes as follows.
For any protein-coding gene $g$, we calculated a cumulative pathogenicity score $S_g$ from the following quantities: $L_g$, the total coding sequence length of gene $g$; $r_v$, the average rank of the deleterious functional effect of variant $v$ as assessed by FATHMM and PROVEAN; and $n_v^{T}$ and $n_v^{N}$, the numbers of subjects with the pathogenic variant $v$ in the tumor and normal samples, respectively.
Subsequent bioinformatics analysis was conducted on the top 1% of genes with the highest cumulative pathogenicity scores. Analyzing the protein–protein interactions of these top genes with a compiled list of known PrCa-related genes (based on published literature and databases) and selecting the genes with above-average interactions led to 27 likely PrCa-associated (LPC) genes: NKX3.1, CSMD3, TRRAP, CHD4, VWF, EPHB1, HERC2, MCM3, SPTA1, SALL1, HERC1, RYBP, TTN, CHD5, MYH6, FAT3, ATM, KMT2D, FOXA, TP53, SPOP, SMAD4, LRP1B, IDH1, CTNNB1, BRAF, and KMT2C [
24]. The SNV data for each patient in the cohort were examined to obtain the counts of deleterious SNVs in these 27 genes. For each gene, the difference in deleterious SNV counts between the patient’s tumor and normal tissue was then used as a genomic feature. Essentially, the 27 selected genomic features reflect the cumulative deleterious effects of the SNVs on these genes.
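The tumor-minus-normal count feature described above can be illustrated with a short Python sketch. The function name `gene_features` and the toy variant lists are hypothetical; the sketch assumes each list entry names the gene hit by one deleterious SNV, which is a simplification of the VCF-based processing.

```python
from collections import Counter

def gene_features(tumor_calls, normal_calls, genes):
    """Per-gene difference in deleterious SNV counts (tumor minus normal)."""
    tumor_counts = Counter(tumor_calls)
    normal_counts = Counter(normal_calls)
    # Counter returns 0 for genes with no deleterious SNV in a sample
    return {g: tumor_counts[g] - normal_counts[g] for g in genes}

# Hypothetical patient: each entry is the gene affected by one deleterious SNV
tumor = ["TP53", "TP53", "SPOP", "TTN"]
normal = ["TTN"]
features = gene_features(tumor, normal, ["TP53", "SPOP", "TTN", "ATM"])
```

A variant present in both samples cancels out (TTN above), so the feature emphasizes tumor-specific deleterious burden.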
2.3. The Clinicogenomics Dataset
Merging the unique patient clinical identifiers (Patient ID) with their corresponding genomic case identifiers (Case ID) produced an integrated data frame of clinical and genomic data with 494 patient rows and 49 columns (22 clinical plus 27 genomic features). The code used for this task can be found in
Supplementary Materials (SF02_data_processing).
In this study, the dataset was partitioned into training and testing subsets (70:30) using the sample.split function from the caTools package in R, ensuring proportional representation of survival outcome (PFS) status across both sets prior to training. For the DeepSurv model, cross-validation folds within the training set were explicitly stratified by PFS status to mitigate event imbalance. For PCM and RSF, internal resampling procedures inherent to each method were used, which accommodated right-censored data but did not explicitly enforce stratified cross-validation folds. The training set (70%) was used for model training and hyperparameter tuning via cross-validation, while the held-out test set (30%) was reserved exclusively for model evaluation. All predictors and outcome variables were preprocessed for compatibility with survival modeling techniques. The distribution of PFS time and status was first explored separately within the training and test sets to check their consistency and to assess the class imbalance of our survival endpoint. We also explored PFS status for all patients during the study period, as well as the follow-up times within the censored and non-censored groups. Summary statistics of PFS time were computed separately for the censored and non-censored groups, which enabled the detection of early or late censoring patterns within the follow-up times. Univariate Kaplan–Meier (KM) survival analysis was conducted primarily to investigate the influence of selected covariates on PFS [
25]. Specific covariates of interest, such as radiation therapy (RT), history of neoadjuvant treatment (HNT), neoplasm cancer status (NCS), and new tumor after initial treatment (NTAIT), were individually stratified into categorical levels. KM survival curves were estimated for each covariate, and the log-rank test was used to assess the statistical significance of survival differences across strata [26]. See the exploratory_analysis.R file in
Supplementary Materials (SF01_pipeline_modules_data).
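For readers less familiar with the KM estimator, its product-limit calculation can be sketched in a few lines of Python. This is an illustration rather than the study's R code; the function name `kaplan_meier` is hypothetical, and subjects censored at an event time are assumed to still be at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = progression observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # progressions observed exactly at t
        n_leaving = 0  # everyone (events and censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

Censored subjects shrink the risk set without lowering the curve, which is how the estimator uses partial follow-up information.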
2.4. Survival Modeling Methods
The three survival modeling approaches employed in this study were selected to reflect complementary methodological strengths. The PCM was included for its interpretability and ability to perform variable selection in the presence of correlated predictors. The RSF was chosen to capture non-linear relationships and higher-order interactions without requiring parametric assumptions. DeepSurv was included as a complementary deep-learning-based extension of the Cox model to explore complex, potentially high-dimensional clinicogenomics interactions. These models provide a synergistic balance between interpretability, flexibility, and expressive capacity.
All multivariate survival models used in this study are designed to accommodate right-censored survival data. PCM and DeepSurv optimize a partial-likelihood-based objective function, which directly incorporates censoring, while RSF employs log-rank-based splitting rules that also incorporate censored observations during tree construction. No additional re-weighting was applied to censored observations, as our primary objective was comparative risk stratification in this single cohort rather than unbiased estimation of absolute PrCa PFS risk. As such, the models employed are censoring-aware but not censoring-adjusted. We applied these models to the integrated clinicogenomics dataset, with PFS months (time to PrCa progression) and PFS status (disease progression observed or not) as the target survival object; PFS was modeled for risk stratification in its own right rather than as a surrogate for overall survival. Each modeling approach involved model-specific hyperparameters that were optimized within the training set using internal cross-validation and resampling procedures.
Model performance across all models was evaluated using Harrell’s concordance index (C-index) [
27,
28], which measures the ability of a model to correctly rank patients according to their relative risk of progression while accounting for right-censored observations by restricting the comparison to comparable pairs. The C-index assesses discriminatory performance but does not evaluate calibration or the accuracy of absolute risk estimates. Thus, results were interpreted in terms of relative risk stratification rather than precise probability prediction. Formal calibration analyses, such as time-dependent Brier scores or calibration curves [
29], were not conducted but would be an important direction for future validation.
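The pairwise logic of Harrell’s C-index can be made concrete with a minimal Python sketch (not the study's R implementation). The function name `harrell_c` is hypothetical, and the sketch assumes no tied event times; tied risk scores are given half credit.

```python
def harrell_c(times, events, risks):
    """Fraction of comparable pairs in which the subject who progressed
    earlier was assigned the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:  # j was still progression-free when i progressed
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; pairs in which the earlier time is censored contribute nothing, which is how censoring enters the metric.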
The C-index was computed separately for the training and held-out test sets and interpreted as a measure of discrimination rather than calibration or absolute risk accuracy. All models were implemented using survival-specific objective functions or splitting rules that inherently accounted for right-censored data. The PCM employed partial likelihood-based estimation with regularization, RSF utilized log-rank splitting rules with a bootstrapped aggregation framework, and DeepSurv optimized a Cox partial likelihood loss function using a neural network. Model implementations, hyperparameter optimization procedures, and performance evaluations are provided in the multivariate-analysis.R file in
Supplementary Materials (SF01_pipeline_modules_data).
2.4.1. PCM: A Penalized Survival Model
PCM was employed using the glmnet package in R [
30] with elastic net penalization to balance variable selection and model shrinkage via a 5-fold cross-validation (CV) optimization and parameter tuning framework within the training dataset only, based on Harrell’s C-index. Predictors with non-zero coefficients under the optimal penalty were retained and subsequently used to refit a final standard Cox proportional hazards model to facilitate descriptive interpretation of relative associations and survival probabilities.
To assess the stability of model evaluation with respect to training data resampling, the model fitting and tuning procedure was repeated across multiple random refits within the training set while preserving the fixed held-out test set. In each iteration, optimal parameters were selected independently, and model discrimination was evaluated on the held-out test set. Reported mean and standard deviation values reflect between-run variability arising from internal resampling rather than uncertainty due to test set resampling. Model discrimination was assessed using Harrell’s C-Index on both training and test sets.
Because variable selection was performed prior to refitting the unpenalized Cox model, statistical inference from the refitted model should be interpreted cautiously: standard errors and p-values do not account for uncertainty introduced by the selection process and may be optimistically biased. Accordingly, reported hazard ratios (HRs) are interpreted as measures of association rather than causal effects, with emphasis on direction and relative importance over precise effect estimation. Survival probabilities and relative risk estimates were derived using the survex package [
31].
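To make the penalized objective concrete, the following Python sketch evaluates a Cox negative log partial likelihood with an elastic-net penalty. It is an illustration under stated assumptions, not the glmnet implementation: the function names are hypothetical, ties are handled in the Breslow form, and the 1/n scaling of the loss is assumed for illustration. The penalty uses the standard elastic-net form, λ(α‖β‖₁ + (1 − α)‖β‖²/2).

```python
import math

def neg_log_partial_likelihood(beta, X, times, events):
    """Cox negative log partial likelihood (Breslow form for ties)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects enter only through the risk sets
        risk_set = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_set)
    return nll

def elastic_net_objective(beta, X, times, events, lam, alpha):
    """Scaled loss plus a mixed L1/L2 penalty (alpha = 1 lasso, alpha = 0 ridge)."""
    n = len(times)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    penalty = lam * (alpha * l1 + 0.5 * (1 - alpha) * l2)
    return neg_log_partial_likelihood(beta, X, times, events) / n + penalty
```

The L1 part of the penalty is what drives some coefficients exactly to zero, while the L2 part stabilizes estimates for correlated predictors.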
2.4.2. RSF: A Tree-Based Machine Learning Method for Survival Analysis
The random survival forest (RSF) model was implemented using the randomForestSRC package [
32,
33] on the same processed training dataset. Model development employed bootstrapped aggregation with log-rank splitting rules to accommodate right-censored observations. Hyperparameters, including the number of trees (ntree), the number of variables randomly selected at each split (mtry), and the minimum node size, were tuned using internal resampling procedures within the training set. Following hyperparameter selection, a final RSF model was trained on the full training dataset and evaluated on the held-out test set. To assess the variability in model performance arising from resampling and tuning, the RSF fitting and evaluation procedure was repeated across multiple refits using different random seeds, while maintaining a fixed test set. Model discrimination was evaluated using Harrell’s C-index on both training and test sets, with the reported mean and standard deviation reflecting between-run variability. Feature importance was assessed using the VIMP metric from the minimal depth criterion [
34,
35]. Survival probabilities and relative risk estimates were obtained in the same way as the PCM.
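The log-rank splitting rule can be illustrated with a Python sketch that scores one candidate split of a node. This is a simplified stand-in for what a survival forest evaluates at each node, not the randomForestSRC code; the function name `logrank_statistic` and the boolean `in_left` encoding are hypothetical.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for a candidate split.

    in_left[i] is True when subject i falls in the left daughter node.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for u in times if u >= t)                           # at risk overall
        n1 = sum(1 for u, g in zip(times, in_left) if u >= t and g)   # at risk, left node
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, in_left)
                 if u == t and e == 1 and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0
```

A tree would compute this statistic for many candidate splits and keep the one with the largest value, that is, the split whose daughter nodes differ most in survival.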
2.4.3. DeepSurv: A Deep Learning Neural Network Model for Survival Analysis
This model was included as a nonlinear comparator to explore potentially complex interactions between clinical and genomic features in PFS risk stratification. Model training and evaluation followed the stratified training and held-out partitions. The DeepSurv [
36] model was implemented using Keras in R [
37]. Numeric variables in the training set were scaled using min-max normalization, and categorical variables were one-hot encoded. Transformation parameters learned from the training dataset were applied to the held-out test set. Model development and hyperparameter tuning were conducted exclusively within the training dataset. Hyperparameters (number of hidden layers, nodes, dropout rate, learning rate, and L2 regularization) were tuned using 5-fold CV within the training set only. The optimal hyperparameter combination was selected based on the highest cross-validated Harrell’s C-index. Using the selected hyperparameters, a final DeepSurv model was trained on the full training dataset using the Scaled Exponential Linear Unit (SELU) activation function. Early stopping and adaptive learning rate adjustments were applied during training to mitigate overfitting, and the model was evaluated on the held-out test set [
38,
39,
40,
41].
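The preprocessing step, in which scaling parameters are learned on the training set and reused on the test set, can be sketched in Python as follows. The helper names and the toy values are hypothetical, and the sketch is not the study's Keras/R preprocessing code.

```python
def fit_min_max(column):
    """Learn the scaling range from training values only."""
    return min(column), max(column)

def apply_min_max(column, lo, hi):
    """Scale with the training range; test values may fall outside [0, 1]."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]

def one_hot(values, levels):
    """Encode categorical values against the levels seen in training."""
    return [[1 if v == level else 0 for level in levels] for v in values]

# Hypothetical numeric covariate: fit on training data, reuse on test data
train_values = [50, 60, 70]
test_values = [80, 55]
lo, hi = fit_min_max(train_values)
train_scaled = apply_min_max(train_values, lo, hi)
test_scaled = apply_min_max(test_values, lo, hi)
```

Reusing the training range keeps information about the test set out of model development, at the cost of test values occasionally falling outside the unit interval.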
For interpretability, a gradient-based variable importance approach was implemented to assess the sensitivity of the model’s predicted risk score to perturbations in individual input features [
42,
43]. Importance scores were computed as the mean absolute gradient across samples. DeepSurv provides log-risk scores, which were exponentiated to obtain relative risk estimates, where higher risk scores imply higher hazards of PrCa progression. Harrell’s C-index was again used to assess model discrimination. Given the stochastic nature of neural network optimization and the modest sample size with limited events, DeepSurv results are interpreted as complementary rather than confirmatory.
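The mean-absolute-gradient idea can be illustrated in Python with a toy risk function. Two assumptions are worth flagging: `risk_score` is a hypothetical stand-in for a trained network's log-risk output, and gradients are approximated by central finite differences, whereas a deep learning framework would normally obtain them by automatic differentiation.

```python
import math

def risk_score(x):
    """Hypothetical stand-in for a trained network's log-risk output."""
    return math.tanh(1.5 * x[0] - 0.2 * x[1]) + 0.0 * x[2]

def gradient_importance(f, samples, eps=1e-5):
    """Mean absolute partial derivative of f for each input feature."""
    n_features = len(samples[0])
    totals = [0.0] * n_features
    for x in samples:
        for k in range(n_features):
            up = list(x)
            down = list(x)
            up[k] += eps
            down[k] -= eps
            totals[k] += abs((f(up) - f(down)) / (2 * eps))  # central difference
    return [total / len(samples) for total in totals]

samples = [[0.1, 0.5, 0.9], [0.4, 0.2, 0.3]]
importance = gradient_importance(risk_score, samples)
```

Features to which the output is more sensitive receive larger scores; a feature the model ignores, such as the third input here, scores zero.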
3. Results
We will first describe the compiled clinicogenomics dataset along with summary statistics and the univariate KM analysis results. Then, the PFS risk stratification insights based on PCM, RSF and DeepSurv will be presented.
3.1. Compiled Clinicogenomics Dataset and the PFS Distribution
The final compiled dataset (see prca_clinicogenomics_data in
Supplementary Materials SF01_pipeline_modules_data) combines the clinical and genomics information of the 494 patients. The target response consists of a pair of variables. The first is the binary PFS status, with 1 indicating an observed PrCa progression event and 0 otherwise. The second, “PFS months”, measures the PFS time in months. The remaining columns provide the observed and imputed values of the 20 selected clinical covariates and the SNV count difference (tumor minus normal) in the 27 selected LPC genes, as described earlier.
Figure 2 displays the distribution of PFS status during the study period; the higher proportion of censored observations indicates that most patients were either lost to follow-up or did not experience PrCa progression within the study period. The limited number of patients followed beyond 100 months reflects the scarcity of long-term data.
The summary statistics shown in
Table 2 indicate a clear distinction between the two groups. We have 401 patients (81.2%) censored, while the remaining 93 patients (18.8%) experienced progression. For the censored group (subjects yet to experience progression), the follow-up periods tend to be longer as reflected by the higher means and medians and the wider ranges. In contrast, the observed progression group shows shorter follow-up times on average, suggestive of earlier occurrence of disease progression.
To check the representativeness of the evaluation cohort, we examined the distribution of PFS times in both the training and test sets. The distributions were similar: both displayed a right-skewed distribution of PFS time, in which most events occur early in follow-up and a long tail represents patients who remain progression-free for substantially longer periods (see
Supplementary Materials SF05_kmcurves_and_plots, Figure S1).
3.2. KM Survival Analysis with Binary Clinical Predictors
We estimated survival probabilities over time for the binary clinical variables using KM survival curves. The variables associated with a significant difference in PrCa PFS (p-value < 0.05) were NTAIT, RT, NCS, and HNT. We present the KM curve for NTAIT in
Figure 3, while
Supplementary Materials SF05_kmcurves_and_plots (Figures S2 and S3) contain those for the other clinical variables.
The univariate effect of NTAIT on PFS shown in
Figure 3 indicates that the non-persistent-tumor group maintains a high probability of PFS throughout; the curve flattens early and remains near 0.9, indicating few progression events over time. For the persistent-tumor group, there is a steep decline in PFS, especially within the first 40 months, reflecting a high incidence of PrCa progression; the dotted line at 0.5 marks the median PFS, showing that half of this group had experienced PrCa progression by approximately 24 months. The median was not reached in the non-persistent-tumor group. The difference in PrCa progression between the two subgroups is highly statistically significant (p-value < 0.0001). This preliminary finding emphasizes the prognostic role of new tumor occurrence in predicting early progression and could inform closer surveillance or more aggressive follow-up therapies for patients who develop new tumors after initial treatment.
3.3. Statistical and Machine Learning Models for PFS
In this section, we present results from the optimized multivariate survival analysis models, namely PCM, RSF, and DeepSurv, along with their discriminatory ability as evaluated by their C-indices on both training and held-out test sets. Because variable selection was performed via regularization and model-specific importance criteria without post-selection inference adjustment, reported model estimates are interpreted as associated signals for risk stratification and not as causal or clinically actionable effect sizes. For PCM and RSF, reported discriminatory metrics represent the mean C-index across repeated model refits, with associated standard deviation reflecting between-run variability arising from internal resampling and hyperparameter tuning. Test set performance was evaluated on a fixed held-out dataset and was not resampled. For DeepSurv, model performance is reported from a single prescribed run with a fixed random seed and optimized hyperparameters selected within the training set; therefore, standard deviations are not applicable, and results are interpreted as complementary. C-indices are compared descriptively across models, as the study was not designed for formal hypothesis testing of model superiority. In each method, we only present the results from the final optimized model, while the full tuning procedures are provided in the
Supplementary Materials (SF01_pipeline_modules_data; see the file multivariate-analysis.R).
3.3.1. PCM
Model tuning was performed across a grid of elastic-net mixing parameters (α ∈ [0, 1]) using cross-validated Harrell’s C-index within the training data. An intermediate mixing value (α = 0.5) achieved the highest cross-validated discrimination (C-index = 0.86) while retaining six predictor variables with non-zero coefficients, yielding a predictive formulation for modeling PrCa PFS via the hazard function below:
$$h(t \mid \mathbf{x}) = h_0(t)\,\exp\left(\beta_1\,\mathrm{NTAIT} + \beta_2\,\mathrm{HNT} + \beta_3\,\mathrm{NCS} + \beta_4\,\mathit{MYH6} + \beta_5\,\mathrm{WHS} + \beta_6\,\mathrm{MSISS}\right)$$
Here, $h(t \mid \mathbf{x})$ is the hazard function at time t, while $h_0(t)$ is the baseline hazard, and each coefficient $\beta_k$ represents the log hazard ratio (HR) for the respective covariate. The estimated hazard ratios of each selected predictor are shown in
Table 3. These estimates are deemed as associative signals of relative risk stratification and should not be interpreted as causal or clinically actionable effect sizes.
Table 3 summarizes the coefficients retained under the optimized PCM. Reported HRs are provided to describe the direction and relative magnitude of associations within the training data rather than to support causal or clinically actionable interpretations. Several clinical variables exhibited elevated relative hazard estimates, including new tumor after initial treatment (NTAIT), history of neoadjuvant treatment (HNT), and neoplasm cancer status (NCS), indicating that patients with these characteristics were ranked at higher progression risk. Among the genomic features, the MYH6 gene showed a positive log-hazard coefficient, suggesting higher modeled progression risk with increasing tumor-normal deleterious SNV burden. As genomic predictors are defined on a sequence-derived scale, this association is interpreted as a relative risk contribution rather than a clinically standardized effect size. WHS demonstrated a modest positive association, while MSISS exhibited a negative log-hazard coefficient, although the latter showed limited statistical strength. To assess the stability of discrimination performance, the PCM procedure was repeated across multiple resampling iterations within the training dataset while maintaining a fixed held-out test set. Across repeated model refits, average Harrell’s C-index values were 0.8442 on the training data and 0.8513 on the test data, indicating consistent risk-ranking performance on unseen data within the TCGA-PRAD cohort. Overall, the PCM procedure demonstrated that a small subset of clinical and genomic variables can provide relatively stable risk stratification for PFS within the TCGA-PRAD clinicogenomics cohort.
3.3.2. RSF
RSF fitted on the clinicogenomics training dataset captured potential non-linear effects and interactions among predictors. Model hyperparameters were selected using internal out-of-bag (OOB) error minimization, yielding an optimal configuration with 20 trees, a terminal node size of seven, and eight variables randomly sampled at each split. Under this setting, RSF achieved a low OOB error (0.1181), indicating adequate internal risk ranking capability within the cohort. Across repeated model refits, average Harrell’s C-index values were 0.9080 on the training data and 0.8552 on the test data, indicating satisfactory risk-ranking within the TCGA-PRAD cohort.
The most influential variables selected by the RSF model based on the VIMP scores (see
Table 4) include NTAIT, NCS, HNT, hypoxia-related scores (RHS and WHS) and mutation-burden clinical features (FGA and TMB), as well as the
MYH6, BRAF and
TP53 tumor-normal SNV differences. Some clinical predictors identified by the RSF overlapped with those selected by the PCM, showing convergence across modeling approaches. Additional mutational and genomic features emerged uniquely in RSF; this is consistent with its capacity to capture latent or non-linear relationships. Variable influence measures are interpreted as relative contributions to model-based risk stratification rather than causal effects.
3.3.3. DeepSurv
Using the optimized single-run DeepSurv configuration (one hidden layer with one node, dropout = 0.2, learning rate = 0.001, L2 = 0.2), the model achieved a training C-index of 0.8344 and a test C-Index of 0.8384, indicating satisfactory discriminatory ranking ability on an unseen held-out cohort [
36,
44]. For DeepSurv, we used gradient-based sensitivity scores to rank the importance of the clinicogenomics predictor variables, where higher scores indicate greater influence [
42].
Table 5 below shows the sensitivity scores for all variables. Both clinical and genomic variables contributed to model-based risk stratification, with the rankings of NCS, NTAIT, HNT, and MYH6 consistent with the findings from the PCM and RSF models. However, gradient-based feature importance measures are model-dependent and do not represent causal effects; we therefore interpret them conservatively, emphasizing features with consistent relevance across multiple modeling approaches. Given the modest sample size and event rate, we consider the DeepSurv results complementary and exploratory, offering an additional perspective on clinicogenomics risk patterns rather than definitive clinical prognostic conclusions.
3.4. Predicted Survival Probabilities and Risk Scores
In the context of PFS, survival probabilities for a time
t represent the patients’ chances of cancer stabilization (i.e., no disease progression) for at least
t months. Survival probabilities for PCM were obtained by raising the baseline survival function to the power of the exponentiated linear risk score, while RSF survival probabilities are the exponential of the negative cumulative hazard function (CHF) averaged across the ensemble of survival trees. Summary statistics for the predicted 6-year (72-month) survival probabilities for the patients in our test set are shown in
Table 6. Our implementation of DeepSurv in R with Keras does not provide predicted survival probabilities but can calculate risk scores (see
Table 6).
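The two routes to a predicted survival probability can be sketched as below, assuming the standard proportional hazards relation S(t | x) = S0(t)^exp(eta) for the Cox model and S(t | x) = exp(-H(t | x)) for a cumulative hazard H. The function names and all numeric values are hypothetical and are not taken from the study.

```python
import math

def cox_survival(baseline_survival, linear_predictor):
    """Cox model: S(t | x) = S0(t) ** exp(eta), with eta the linear risk score."""
    return baseline_survival ** math.exp(linear_predictor)

def ensemble_chf(tree_chfs):
    """Average the per-tree cumulative hazards evaluated at a fixed time point."""
    return sum(tree_chfs) / len(tree_chfs)

def survival_from_chf(cumulative_hazard):
    """Convert a cumulative hazard into a survival probability: exp(-H)."""
    return math.exp(-cumulative_hazard)

# Hypothetical values at t = 72 months.
s0_72 = 0.90                                   # baseline survival
s_reference = cox_survival(s0_72, 0.0)         # patient at baseline risk
s_doubled = cox_survival(s0_72, math.log(2))   # patient with twice the hazard

tree_values = [0.05, 0.10, 0.15]               # CHF from three survival trees
s_forest = survival_from_chf(ensemble_chf(tree_values))
```

With these numbers, doubling the hazard lowers the 72-month survival probability from 0.90 to 0.81, which illustrates that the baseline survival is exponentiated by the relative hazard rather than scaled by it.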
Risk scores across the models quantify each patient’s relative hazard of PrCa progression. In PCM, risk scores are calculated as linear combinations of covariates weighted by the optimized regression coefficients, yielding log-relative hazard values and, when exponentiated, the actual risk scores. For RSF, risk scores are derived as the cumulative hazard function aggregated over multiple survival trees, representing the expected risk over time. DeepSurv outputs a non-linear risk function from a neural network trained by optimizing the Cox log-partial likelihood. In all cases, higher risk scores indicate greater susceptibility to disease progression or shorter duration of cancer stabilization. Summary statistics of all patient-level survival probabilities and risk scores are shown in
Table 6, with the upper rows reporting survival probabilities and the last three rows reporting risk scores.
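The Cox log-partial likelihood that links the PCM and DeepSurv risk scores can be sketched as follows. This is a simplified illustration: it omits any penalty or regularization term and any correction for tied event times, and the data are hypothetical. In PCM the log-risk would be the penalized linear predictor, while in DeepSurv it would be the network output.

```python
import math

def neg_log_partial_likelihood(times, events, log_risks):
    """Negative Cox log-partial likelihood, without tie correction.

    Each observed event i contributes
        log_risk_i - log(sum of exp(log_risk_j) over patients j still at risk),
    where "at risk" means time_j >= time_i.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients appear only in risk sets
        risk_set_sum = sum(
            math.exp(log_risks[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        total += log_risks[i] - math.log(risk_set_sum)
    return -total

# Hypothetical data: three patients, the last one censored.
demo_times = [2, 4, 6]
demo_events = [1, 1, 0]
loss_flat = neg_log_partial_likelihood(demo_times, demo_events, [0.0, 0.0, 0.0])
loss_ranked = neg_log_partial_likelihood(demo_times, demo_events, [1.0, 0.0, -1.0])
```

Assigning higher log-risk to patients who progress earlier lowers the loss relative to uninformative, equal scores, which is the sense in which higher risk scores indicate greater susceptibility to progression.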
Across the models that yield survival probabilities (PCM and RSF), the median predicted 6-year PFS probability (disease stabilization) was consistently high (median ≈ 0.90), indicating that the majority of patients are expected to remain progression-free through the 6-year mark. However, the very low minimum survival probabilities (PCM: 0.00, RSF: 0.01) highlight a small but clinically important subgroup of patients at high risk of early progression, underscoring substantial heterogeneity in patient trajectories.
Risk score distributions further clarified this heterogeneity. All models produced positively skewed risk scores (mean > median), consistent with a subset of higher-risk patients. DeepSurv-derived relative risk scores exhibited a median of 0.73 (IQR: 0.60–1.23), indicating that most patients were assigned lower-than-baseline progression risk, with a smaller subset displaying elevated risk. Compared to PCM and RSF, which produced broader and more right-skewed distributions, DeepSurv showed a more compressed risk range, suggesting limited but meaningful risk stratification.
Accordingly, DeepSurv risk estimates are interpreted as complementary to the PCM and RSF models. While many patients exhibit indolent disease courses, those flagged as high-risk at 72 months across models may warrant closer monitoring or intensified treatment intervention. The overall concordance in survival probability estimates and the complementary nature of the risk stratification patterns across modeling approaches support the consistency of findings within this cohort.
4. Discussion
This study investigated the potential prognostic utility of clinical and genomic features for predicting PFS in PrCa using three different modeling approaches: PCM, RSF and DeepSurv. In this section, we discuss the implications of our key findings by examining the contribution of genomics data to PFS prediction and the performance of the three models.
4.1. Contributions of Genomics Data to PFS Prediction
We integrated patient-level SNV information with clinical variables to construct a clinicogenomics dataset for PFS analysis. To assess the added value of genomic information, we applied the same modeling pipeline to both the clinicogenomics dataset and a clinical-only dataset, and summarized the variables identified as important by each modeling approach in
Table 7. A consistent core set of clinical variables, HNT, NCS and NTAIT, which reflect neoadjuvant treatment history, neoplasm cancer status, and tumor recurrence, respectively, was repeatedly identified as influential for PFS risk stratification, regardless of whether genomic features were included. The convergence of these variables across the PCM, RSF and DeepSurv models supports them as candidate markers associated with disease burden and treatment trajectory within this cohort. The incorporation of genomic variables enabled additional SNV-based signals to emerge, as the
MYH6 gene was also selected consistently across all models. Interestingly,
MYH6 is well known for its critical role in cardiac muscle contraction but had not been associated directly with PrCa until Wang et al. in 2024 reported that
MYH6 suppressed tumor progression in PrCa [
45], which corroborated its importance in PFS prediction.
It should be emphasized that, at this stage, we focused only on interpreting features demonstrating cross-model relevance as associated signals influencing PrCa PFS risk stratification. Thus, clinicogenomics predictors that were not consistently selected across models are considered exploratory. Differences in predictor selection across models reflect expected methodological trade-offs, with linear penalized Cox models favoring parsimony and stability, and flexible machine learning models identifying a broader set of potential interacting features. Altogether, these results indicate that clinical variables remain the primary drivers of PFS risk stratification in this cohort, while genomic information provides complementary, hypothesis-generating insights.
4.2. Comparison of Model Performance
According to the C-indices shown in
Table 8, all three models performed well on both training and test data. Any observed differences in C-index across models should be interpreted as indicative of relative discrimination within this cohort rather than as generalizable, statistically significant superiority. PCM showed consistent performance between training and test data, confirming favorable performance in ranking PFS risks. RSF performed well on the training set and declined slightly on the test set. This decline in C-index from training to test data is less than 0.1, which is considered acceptable for risk discrimination tasks [
46,
47]. Compared to PCM, RSF and DeepSurv demonstrated comparable test-set performance but showed greater sensitivity to model configuration. However, it should be noted that RSF and DeepSurv are complex models, which require a larger training dataset to achieve excellent performance.
Table 8 also shows that the inclusion of genomic variables alongside clinical variables introduced greater complexity and, in some cases (e.g., DeepSurv), provided satisfactory risk stratification performance. Although the overall gains in C-index were limited, the integration of genomic data provided complementary value without heavily compromising model performance.
Beyond individual model performance, the multi-model design of this study provides methodological insight into clinicogenomics survival modeling. Predictors that consistently emerge across models, particularly core clinical and select genomic variables, represent signals of PFS risk stratification, while model-specific findings may highlight areas where non-linear or higher-order interactions may exist. This approach demonstrates how heterogeneous modeling paradigms can be leveraged to balance interpretability with exploratory discovery in clinicogenomics analyses within this TCGA-PRAD cohort.
4.3. Multivariate Survival Models Versus Univariate KM Curves
Univariate KM analyses were performed to provide descriptive summaries of unadjusted PFS patterns across selected clinical covariates. One counterintuitive pattern observed was the apparent association between radiation therapy (RT) and poorer PFS, with patients receiving RT exhibiting a steeper decline in unadjusted survival probabilities (see
Figure 4), giving the impression that patients receiving RT experienced disease progression more rapidly (half had progressed by approximately 48 months) than those who did not. This result should not be interpreted as evidence of a detrimental effect of RT. Rather, it is most plausibly explained by confounding by indication [
48], a common source of bias in observational studies where treatment assignment is not random. Patients receiving RT are more likely to have adverse disease characteristics, such as advanced tumor stage or greater clinical severity, which are not accounted for in univariate KM analyses. Thus, the KM curves presented serve only for descriptive, associative exploration and are not intended to estimate causal effects.
Consistent with this interpretation, RT was not retained as an important predictor in any of the multivariable survival models (PCM, RSF and DeepSurv) when evaluated alongside other clinical and genomic covariates. Thus, all inferential interpretations of covariate associations are restricted to multivariate survival modeling, with KM analyses serving a descriptive role only.
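To make the descriptive nature of the KM curves concrete, the product-limit estimator can be sketched as below. The function name and the follow-up data are hypothetical and unrelated to the study cohort. Applying the estimator separately to patients with and without RT gives unadjusted curves of the kind discussed above, which by construction take no account of stage or other covariates.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, S(time)) at each distinct event time.

    At each event time t, the running survival is multiplied by
    (1 - events at t / patients still at risk at t).
    """
    curve = []
    surv = 1.0
    event_times = sorted({t for t, d in zip(times, events) if d})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, d in zip(times, events) if u == t and d)
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical follow-up in months (1 = progression observed, 0 = censored).
km_times = [6, 12, 12, 24, 48, 60]
km_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(km_times, km_events)
```

Because each step depends only on follow-up times and event indicators within the group, any imbalance in baseline severity between groups is carried directly into the difference between their curves.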
4.4. Limitations of the Study
This study is based on a single TCGA-PRAD cohort and therefore provides evidence of internal validity for PFS risk stratification rather than external generalizability; independent validation in external cohorts is needed prior to clinical translation. Because fully nested cross-validation was not pursued, and given the modest number of progression events, substantial right-censoring, and the use of machine learning modeling approaches, there is an inherent risk of overfitting despite the use of stratified train-test evaluation in DeepSurv and internal resampling in PCM and RSF. Thus, model performance evaluations are interpreted in terms of internally validated relative risk stratification rather than generalized, precise prediction of progression events.
Clinical variables were imputed prior to data partitioning using a single completed dataset, and uncertainty due to imputation was not formally propagated, so imputed patterns should be regarded as exploratory. In addition, clinical and gene-level feature importance was assessed across multiple survival models without formal resampling or stability selection, and nonlinear models such as RSF and DeepSurv are sensitive to sample size and event rates; therefore, genomic findings and model-specific results are interpreted conservatively and warrant confirmation in larger, externally validated datasets. In particular, emphasis was placed only on genomic features that exhibited consistent importance across all models.
5. Conclusions
We have explored the utility of combined clinical and genomic features in modeling PrCa PFS within a patient cohort using different statistical and machine learning models. The models consistently identified a core set of influential variables associated with PrCa progression, including the clinical variables HNT, NTAIT, and NCS, as well as the MYH6 gene, which is well known for its role in cardiac function but was only recently reported to suppress tumor progression in PrCa. These results suggest that integration of genomics with clinical data can help provide insights into PFS for patients with cancer.
It is noted that the modest cohort size, along with the lack of independent sets of genomics data and features for model assessment and validation, posed considerable limitations on the current study. However, the clinical and genomics variables consistently identified by multiple survival models to be associated with PrCa PFS can be useful for generating hypotheses for future experiments to uncover driving factors for cancer progression.
Future work will focus on extending validation in larger diverse cohorts with confounding analyses to help clarify counterintuitive treatment effects of RT, functional and biochemical pathway analyses of the MYH6 gene in relation to PrCa progression, and the development of survival models to capture additional types of molecular data such as RNA and protein expression profiles from transcriptomics and proteomics data for the same TCGA cohort of patients with PrCa.