1. Introduction
Prostate cancer (PCa) ranks as the second most common cancer and the fifth leading cause of cancer mortality among men worldwide, with an estimated 1.4 million new cases and 375,000 deaths in 2022 [1]. While localized PCa is often curable, advanced PCa poses a significant challenge due to its high risk of biochemical recurrence (BCR), a critical precursor to clinical progression, local recurrence, and distant metastasis [2,3]. Current clinical models, reliant on serum prostate-specific antigen (PSA) levels, Gleason scores, and TNM staging, suffer from limited predictive accuracy, failing to capture the complex tumor biology of advanced PCa [2,3,4,5]. This unmet need for precise, individualized prognostic tools underscores the urgency of developing advanced methods to enhance cancer progression prediction and optimize patient outcomes.
Multiparametric MRI (mpMRI) has become a cornerstone for identifying PCa patients at risk of recurrence, leveraging its ability to visualize tumor characteristics non-invasively [6,7,8,9,10,11,12,13,14]. However, its application in advanced PCa is underexplored, and current MRI-based workflows face significant limitations: (1) qualitative assessments are prone to interobserver variability, compromising reproducibility [7,8], and (2) manual measurements of tumor volume, apparent diffusion coefficient (ADC) values, and lesion boundaries are time-consuming and lack standardization [9]. These challenges hinder the integration of imaging biomarkers into robust predictive models, limiting their clinical utility in guiding personalized treatment for advanced PCa.
Recent advancements have extended the application of deep learning beyond radiological imaging to include high-precision pathological assessment. For instance, Devnath et al. [15] recently developed an integrated machine learning network to accurately recognize epithelial cells within prostatic glands, demonstrating the potential of artificial intelligence (AI) to mitigate inter-observer variability in histopathological grading. These multi-scale AI-driven approaches collectively aim to standardize clinical workflows and provide more objective prognostic insights.
Deep learning-based radiomics has shown promise in predicting 24-month progression in advanced PCa using pretreatment mpMRI-derived features, surpassing traditional radiomics and clinical models in accuracy [16,17]. By extracting quantitative imaging features, radiomics captures subtle tumor characteristics missed by conventional methods. However, two critical gaps remain. First, the reliability of radiomic features depends heavily on region-of-interest (ROI) annotation, yet the impact of manual (ROIref) versus AI-driven (ROIai) tumor segmentation on feature stability and predictive performance remains unquantified [9]. Second, while radiomics excels at binary progression prediction, its potential to estimate time-to-progression—a key factor in determining optimal timing for salvage therapies—has not been explored. Addressing these gaps is essential to developing scalable, observer-independent tools for precision oncology.
To bridge these gaps, we developed and validated a deep learning-based radiomics model trained on manually annotated ROIs (ROIref) and tested on AI-segmented ROIs (ROIai) to predict progression in advanced PCa. We also assessed the prognostic value of radiomics-derived risk scores for estimating time-to-progression through survival analysis. By demonstrating the equivalence of the ROIai-based approach to the ROIref-based approach and extending radiomics to temporal risk stratification, this study establishes a robust, automated framework to enhance progression prediction, reduce reliance on labor-intensive manual annotations, and guide risk-adapted therapeutic strategies for improved patient outcomes.
2. Materials and Methods
2.1. Data Enrollment
This retrospective study was approved by the local Institutional Review Board (IRB number: 2021-342) with a waiver for written informed consent. Patients were enrolled from October 2017 to March 2024. All patients were diagnosed with PCa through ultrasound-guided systematic prostate biopsy. Inclusion criteria required pretreatment MRI scans in the Picture Archiving and Communication System (PACS), initial treatment with radiation therapy (RT), hormone therapy (HT), chemotherapy, or a combination thereof, regular follow-up (every 3 months in the first year and every 6 months thereafter), complete clinical records, and a minimum follow-up of 24 months with documented progression or non-progression. Of the initial 2232 cases in our previous studies [16,17], 182 cases were included in the current study and stratified into Cohort 1 (follow-up at 24 months; n = 139) and Cohort 2 (follow-up > 24 months; n = 43). Treatment modalities were as follows: 94 cases received RT alone or RT with HT; 74 received HT alone; 10 received HT combined with chemotherapy; and 4 received multiple-line sequential therapy.
2.2. Image Scanning Protocols
All MR data were acquired using 11 scanners from four different vendors. No statistically significant differences were observed between Cohort 1 and Cohort 2 in scanner manufacturers, field strengths, or most acquisition parameters (p > 0.05). Detailed scanning parameters are provided in Table 1.
2.3. Clinical Information and Reference Standard for Cancer Progression
Time to cancer progression was calculated as the months elapsed from treatment to the occurrence of progression or the last follow-up. At the end of the follow-up period, the treatment method was recorded as either single (HT only or RT only) or multiple (combination of two or more methods).
The primary endpoint of this study was progression-free survival, defined as the time from treatment to the first documented progression event. Progression was operationally defined as a composite endpoint encompassing biochemical, radiologic, or clinical progression, whichever occurred first. Biochemical progression was defined according to established criteria based on the initial treatment modality. For patients receiving RT, biochemical progression was defined using the Phoenix definition (PSA nadir + 2 ng/mL) [2]. For patients treated with androgen deprivation therapy (ADT), biochemical progression was defined according to the American Urological Association (AUA) criteria. Among patients receiving HT alone, progression was further characterized by the development of castration-resistant prostate cancer (CRPC). CRPC was defined as castrate serum testosterone levels (<1.7 nmol/L) in combination with either (1) three consecutive rises in PSA (with intervals of ≥1 week), resulting in a >50% increase over the PSA nadir and a PSA level exceeding 2 ng/mL [18], or (2) radiologic evidence of new metastatic lesions [16,17,19,20]. Radiologic progression was defined as the appearance of new metastatic lesions on follow-up imaging. Clinical progression was defined as the development of disease-related symptoms or complications attributable to disease progression, as documented by the treating physician.
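The PSA arm of the CRPC definition above is a checkable rule. The following minimal sketch (our own hypothetical helper, not part of the study's actual pipeline) encodes it on a series of PSA measurements taken after the nadir; the ≥1-week spacing between measurements is assumed to be enforced upstream:

```python
def meets_crpc_psa_criterion(psa_values, nadir):
    """Sketch of the PSA arm of the CRPC definition used here: three
    consecutive rises in PSA, resulting in a >50% increase over the
    nadir and a final PSA exceeding 2 ng/mL. Measurement intervals
    (>=1 week) are assumed to be checked separately."""
    # Three consecutive rises require at least four measurements.
    if len(psa_values) < 4:
        return False
    # Any run of three strictly increasing steps counts.
    has_three_rises = any(
        psa_values[i] < psa_values[i + 1] < psa_values[i + 2] < psa_values[i + 3]
        for i in range(len(psa_values) - 3)
    )
    final = psa_values[-1]
    return has_three_rises and final > 1.5 * nadir and final > 2.0
```

In practice such criteria are adjudicated from the clinical record rather than computed, so this serves only to make the composite rule explicit.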
To ensure comparability across treatment subgroups, all progression events were harmonized into a single composite endpoint, and time to progression was calculated uniformly regardless of the type of progression event.
2.4. ROI Annotation
Based on findings from prior research [16], we extracted imaging features from ROIs corresponding to PCa areas visible on mpMRI images. Two distinct ROI annotation methods were employed. Method 1 (ROIref): Two genitourinary radiologists (H.W., 15 years of experience; X.W., 30 years of experience) annotated ROIs by synthesizing information from diffusion-weighted imaging (DWI), ADC maps, T2-weighted imaging (T2WI), and dynamic contrast-enhanced (DCE) sequences (when available). The lesion with the highest Prostate Imaging Reporting and Data System (PI-RADS) score [21] was selected. Discrepancies between radiologists were resolved through consensus, establishing ROIref as the reference standard. Method 2 (ROIai): A pretrained deep learning model for PCa segmentation [22] was used to automatically identify ROIai. This model, based on a cascade 3D U-Net architecture, was trained on a large dataset (n = 1428 patients from 7 MRI scanners across 4 vendors at both 1.5T and 3.0T field strengths) and validated for detecting clinically significant PCa on mpMRI, achieving a Dice similarity coefficient of 0.69 ± 0.28 and a patient-level sensitivity of 90.0% in patients with PSA levels of 4–10 ng/mL. For each patient in the current study, the model automatically segmented all suspected lesions on ADC maps, and the largest predicted lesion within the prostate was selected as ROIai, consistent with our selection of the highest PI-RADS lesion for ROIref. Notably, the ROIai were generated in a fully automated manner without any manual correction to rigorously assess the model’s tolerance to segmentation variability. No complete segmentation failures (i.e., failure to detect a lesion) occurred in the final 182-patient cohort.
For patients with multifocal disease, only the index lesion was analyzed. For ROIref, the lesion with the highest PI-RADS score was selected by consensus of the two radiologists after reviewing all available sequences; when multiple lesions shared the same highest PI-RADS score, the one with the largest volume was selected. For ROIai, the pretrained model automatically identified the lesion with the largest predicted volume as the target ROI.
The progression prediction radiomics model was trained using ROIref-derived features. Once established, the radiomics model was applied to ROIai to predict progression probabilities. Predictive outcomes from the ROIref-based and ROIai-based approaches were systematically compared.
To compare measurements derived from ROIref and ROIai, volumetric measurements (volume), dimensional parameters (RL, AP, and SI diameters), and ADC values were quantified and compared for each case. Spatial overlap between ROIref and ROIai was assessed using the Dice similarity coefficient (DSC), volume similarity (VS), and average Hausdorff distance (HD).
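For illustration, the three overlap metrics named above can be computed from two binary masks as in the following minimal NumPy sketch (function names are ours; production pipelines would typically rely on dedicated libraries such as SimpleITK or MedPy):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def volume_similarity(a, b):
    """VS = 1 - |Va - Vb| / (Va + Vb); equals 1.0 for equal volumes."""
    va, vb = int(a.sum()), int(b.sum())
    return 1.0 - abs(va - vb) / (va + vb)

def average_hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric average Hausdorff distance over all foreground voxels,
    scaled by the voxel spacing (mm)."""
    pa = np.argwhere(a) * np.asarray(spacing)
    pb = np.argwhere(b) * np.asarray(spacing)
    # Pairwise Euclidean distances between the two point sets.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Note that VS captures agreement in lesion size only, whereas DSC and HD are sensitive to spatial misalignment, which is why all three are reported together.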
2.5. Progression Prediction Model Development
The data in Cohort 1 (n = 139) were randomly divided into a training set (n = 98) and an independent test set (n = 41) at a ratio of 7:3. We developed a deep learning-based radiomics model from pretreatment MRI ADC maps with ROIref annotations to predict progression within 24 months in advanced PCa patients using the training set. The data split method is illustrated in Figure 1.
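A 7:3 split of this kind is typically stratified by outcome so that the progression rate is preserved in both subsets. A minimal NumPy sketch of such a split (our own illustrative helper; `scikit-learn`'s `train_test_split` with `stratify=` is the usual choice in practice):

```python
import numpy as np

def stratified_split(labels, test_frac=0.3, seed=0):
    """Per-class random split so each class keeps roughly the same
    prevalence in the training and test subsets."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)
```

Because rounding is applied per class, the realized split (here 98/41 from 139 cases) can deviate slightly from an exact 70/30 ratio.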
To mitigate scanner variability, ADC maps underwent intensity normalization during preprocessing. Three distinct image sets were analyzed (Supplementary Figure S1): (1) Original Images (unprocessed ADC maps), (2) LoG Images (processed with a Laplacian of Gaussian filter to enhance edge details), and (3) Wavelet Images (generated via 3D wavelet decomposition using the PyWavelets package across the x, y, and z axes).
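To make the wavelet image set concrete: decomposing along each of the three axes with a low-pass (L) and high-pass (H) filter yields eight sub-band volumes (LLL through HHH). The sketch below uses a single-level Haar transform in plain NumPy purely for illustration; the actual pipeline uses the PyWavelets package (e.g., `pywt.wavedecn`), and the wavelet family there may differ:

```python
import numpy as np

def haar_1d(x, axis):
    """One orthonormal Haar analysis step along one axis:
    low-pass = pairwise sums, high-pass = pairwise differences."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def wavelet_bands_3d(vol):
    """Single-level 3D decomposition into 8 sub-bands (LLL ... HHH),
    mirroring the filtered image sets used for feature extraction."""
    bands = {"": vol}
    for axis in range(3):  # x, y, z in turn
        bands = {
            name + tag: band
            for name, v in bands.items()
            for tag, band in zip("LH", haar_1d(v, axis))
        }
    return bands
```

Each sub-band emphasizes texture at a different orientation and frequency, which is the rationale for extracting features from all eight in addition to the original and LoG-filtered images.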
ROIs were resampled to a uniform spatial resolution to standardize input dimensions. A MedicalNet architecture [23], initialized with weights pretrained on large-scale medical imaging datasets, was employed to extract deep features. MedicalNet is a 3D extension of the ResNet family specifically designed for medical imaging applications. It was pretrained on the 3DSeg-8 dataset (1638 3D medical volumes from 8 segmentation tasks covering multiple organs and imaging modalities, including CT and MRI). Using features learned from diverse medical imaging data, MedicalNet provides better initialization for medical imaging tasks than natural-image pretrained models or training from scratch. Transfer learning from MedicalNet has been shown to accelerate training convergence by 2–10 times and improve accuracy by 3–20% across various 3D medical imaging applications. The model’s convolutional layers processed ROIs to generate channel-wise feature maps, which underwent global max pooling to reduce dimensionality, yielding 2048 one-dimensional features per ROI.
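The global max pooling step can be stated compactly: each of the 2048 channels of the final convolutional feature maps (the width of the last ResNet stage) is collapsed to its single maximum activation. An illustrative NumPy sketch (in the actual pipeline this is a pooling layer applied to MedicalNet's output tensor):

```python
import numpy as np

def global_max_pool(feature_maps):
    """Collapse each channel's 3D feature map to its maximum activation,
    turning a (C, D, H, W) tensor into a C-dimensional feature vector."""
    c = feature_maps.shape[0]
    return feature_maps.reshape(c, -1).max(axis=1)
```

Max pooling (rather than average pooling) keeps the strongest response per channel, so the resulting vector is insensitive to where within the resampled ROI a pattern occurs.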
Subsequent feature engineering included z-score normalization followed by principal component analysis (PCA), retaining 95% of the cumulative variance to reduce dimensionality and minimize redundancy. Given the high dimensionality of the deep learning-derived features (n = 2048), PCA was performed exclusively on the standardized feature matrix of the training cohort, and the resulting transformation was subsequently applied to the test cohort without refitting. The number of retained principal components was determined based on the cumulative explained variance.
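The train-only fitting described above can be sketched in a few lines of NumPy (an illustrative re-implementation; in practice `scikit-learn`'s `StandardScaler` and `PCA(n_components=0.95)` do the same). The key point the sketch encodes is that the mean, standard deviation, and principal axes are estimated on the training cohort and then frozen:

```python
import numpy as np

def fit_standardized_pca(X_train, variance=0.95):
    """Fit z-score scaling and PCA on the training matrix only, keeping
    the fewest components reaching the requested cumulative variance."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard constant features
    Z = (X_train - mu) / sd
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)  # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(ratio), variance)) + 1
    return {"mu": mu, "sd": sd, "components": vt[:k]}

def apply_pca(X, model):
    """Project new data with the frozen training-set transformation."""
    return ((X - model["mu"]) / model["sd"]) @ model["components"].T
```

Refitting PCA on the test cohort would leak information and change the feature space, which is why the transformation is applied without refitting.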
From the PCA-reduced feature space, the eight most discriminative features were selected using a combination of statistical significance testing (comparing progression versus non-progression groups) and correlation analysis, with highly correlated redundant features removed (|r| > 0.9). These final eight deep learning-derived features were then used as inputs to a logistic regression classifier optimized with L2 regularization. Unlike traditional radiomics approaches that rely on handcrafted features (e.g., texture, shape, and intensity statistics), the proposed deep learning-based radiomics framework automatically learns hierarchical feature representations directly from ADC maps through a convolutional neural network.
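The redundancy-removal step can be illustrated with a simple greedy filter (a sketch of one common way to implement the |r| > 0.9 rule; the study's exact selection procedure combines this with group-wise significance testing, which is omitted here):

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedy redundancy filter: keep a feature (column) only if its
    absolute Pearson correlation with every already-kept feature is
    at or below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept
```

After this filter and the significance test, the eight surviving features feed an L2-regularized logistic regression, whose penalty further shrinks any residual collinearity among them.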
Model performance was evaluated in the training set using stratified 5-fold cross-validation, ensuring each fold preserved the progression class distribution.
2.6. Progression Prediction Model Evaluation
The progression prediction radiomics model was trained using deep learning-derived features extracted exclusively from ROIref in the training subset. Once established, the same trained model was applied to features extracted from both ROIref and ROIai in the training and independent test subsets to predict progression probabilities, and the predictive outcomes obtained with the two ROI annotation methods were systematically compared. Importantly, the feature extraction approach (deep learning-based) and the trained model were identical for both ROI types; only the source of tumor delineation differed (Figure 1 and Supplementary Figure S2).
The discrimination ability of the ROIref- and ROIai-derived probability scores was quantified using receiver operating characteristic (ROC) analysis, with the area under the curve (AUC) calculated for both ROI types. Statistical differences between the AUC values were assessed using the DeLong test.
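The AUC compared here has a direct probabilistic reading, which is also what the DeLong test builds on: it is the probability that a randomly chosen progressor receives a higher score than a randomly chosen non-progressor. A minimal NumPy sketch of that Mann–Whitney form (illustrative only; the study's ROC analysis and DeLong test were computed with standard statistical software):

```python
import numpy as np

def auc(y_true, scores):
    """AUC as P(score_progressor > score_non-progressor),
    with ties counting one half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties
```

The DeLong test then estimates the covariance of two such AUCs computed on the same patients, which is what makes the paired ROIref-versus-ROIai comparison valid.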
2.7. Survival Analysis
Survival analysis was conducted to evaluate the time to progression in the entire enrolled dataset using both ROIref and ROIai. Covariates were selected based on prior evidence linking them to adverse prognosis and included age, baseline PSA, biopsy Gleason grade, clinical TNM stage, and mpMRI findings [4,16,17,24]. Additionally, the radiomics model’s predicted risk probabilities were evaluated as covariates.
For risk stratification, ROIref-based prediction probabilities from the training subset (n = 98, 70% of Cohort 1) were used to determine an optimal cutoff for high- versus low-risk classification via the Youden index. This cutoff was subsequently applied to categorize all 182 patients for ROIref-based and ROIai-based approaches. Survival differences between risk groups were visualized using Kaplan–Meier curves.
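The Youden index picks the probability threshold maximizing sensitivity + specificity − 1 on the training subset; that single frozen cutoff is then reused for all 182 patients. An illustrative NumPy sketch of the selection:

```python
import numpy as np

def youden_cutoff(y_true, probs):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1,
    determined once on training data and reused unchanged elsewhere."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    best_t, best_j = None, -np.inf
    for t in np.unique(probs):  # candidate thresholds, ascending
        pred = probs >= t
        sens = np.mean(pred[y_true == 1])
        spec = np.mean(~pred[y_true == 0])
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```

Determining the cutoff on the training subset only avoids the optimism that would result from re-optimizing it on the patients being stratified.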
Univariable Cox regression analysis was conducted to assess associations between covariates (clinical variables and radiomics-derived probabilities from ROIref/ROIai) and progression risk. Variables showing a trend toward significance (p < 0.10) in univariable analysis were retained for multivariable modeling. Two multivariable Cox proportional hazards models were then constructed: Model 1 combined ROIref-derived probabilities with significant clinical variables, while Model 2 used ROIai-derived probabilities with the same clinical covariates. Both models reported hazard ratios (HRs) with 95% confidence intervals (CIs).
Prognostic performance was evaluated using three metrics: discrimination, calibration, and clinical utility. Discrimination was quantified via the concordance index (C-index), which measures the agreement between predicted probabilities and observed progression events. Calibration was assessed by plotting observed versus predicted progression incidences at 12, 24, 36, and 48 months, with Brier scores calculated using bootstrap cross-validation (1000 iterations) to evaluate prediction accuracy. Finally, clinical utility was appraised using decision curve analysis (DCA) [25], which quantified the net benefit of the ROIref-based and ROIai-based methods across risk thresholds at time horizons from 12 to 48 months, thereby informing their applicability in clinical decision-making.
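The C-index generalizes the AUC to censored time-to-event data: over all usable patient pairs, it is the fraction in which the patient who progressed earlier also carried the higher predicted risk. A minimal sketch of Harrell's version (illustrative; the analyses here were performed in R, e.g., with the survival or rms packages):

```python
import numpy as np

def harrell_c_index(times, events, risks):
    """Harrell's concordance index. A pair (i, j) is usable when patient
    i has an observed progression (event = 1) earlier than patient j's
    time; it is concordant when i also has the higher predicted risk
    (tied risks score 0.5)."""
    times, events, risks = map(np.asarray, (times, events, risks))
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable
```

Censored patients contribute only as the later member of a pair, which is how the C-index remains valid despite incomplete follow-up.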
To assess the robustness of the ROIai-predicted probabilities across different clinical subgroups, subgroup analyses based on International Society of Urological Pathology (ISUP) grade, cT stage, cN stage, cM stage, and treatment category were conducted with interaction testing.
2.8. Statistical Analysis
All statistical analyses were performed using R software (version 4.3.1; http://www.r-project.org). Continuous variables are reported as medians with interquartile ranges (IQR), and categorical variables are summarized as frequencies with percentages [n (%)]. Differences in continuous variables between groups were assessed using the Mann–Whitney U test, while categorical variables were compared using the chi-square test or Fisher’s exact test, as appropriate. A two-tailed p value < 0.05 was considered statistically significant.
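For reference, the Mann–Whitney U statistic underlying the group comparisons has a simple pairwise definition, sketched below (illustrative only; the p values here come from R's wilcox.test, which additionally handles tie corrections and the normal approximation):

```python
import numpy as np

def mann_whitney_u(x, y):
    """U statistic via pairwise comparisons, with ties counting one
    half. Note U / (len(x) * len(y)) equals the AUC of x-scores
    against y-scores."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float((x > y).sum() + 0.5 * (x == y).sum())
```

Because it depends only on the ordering of values, the test is appropriate for the skewed, non-normal distributions typical of clinical variables such as PSA.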
4. Discussion
Our study demonstrated that AI-derived tumor segmentation achieves diagnostic and prognostic parity with manual expert annotations in advanced PCa. As a scalable, observer-independent tool for precision risk assessment, AI-driven radiomics could reduce reliance on labor-intensive manual workflows without compromising diagnostic or prognostic accuracy.
Our study addresses critical gaps in the progression prediction literature through two key distinctions. First, unlike prior investigations focusing on localized PCa [11,12,13,14,26], our study specifically targeted advanced PCa patients undergoing non-surgical therapies. This distinction is biologically significant, as recurrence mechanisms in advanced PCa—shaped by treatment-induced microenvironmental changes (e.g., hypoxia, androgen receptor alterations)—differ fundamentally from postoperative recurrence driven by residual tumor burden. Second, while existing models rely on labor-intensive manual evaluations of preoperative MRI that are prone to interobserver variability and inefficiency, our AI-driven radiomics framework automates feature extraction, reducing subjectivity (Dice = 0.901 for AI vs. manual ROIs) and analysis time.
Building on our team’s earlier work [16,17], this study introduces three methodological advancements. (1) Enhanced Validation: validation in an expanded cohort (n = 182 vs. prior n = 131) with extended follow-up (median 34 months vs. 24 months). (2) ROI Robustness Analysis: the first direct comparison of manual and AI-generated segmentation impacts, demonstrating equivalent prognostic value and resolving prior concerns about annotation dependency. These findings support the feasibility of replacing manual tumor delineation with automated AI-based segmentation in radiomics-driven progression prediction, without compromising predictive accuracy. (3) Temporal Prognostication: beyond binary progression prediction, we established radiomics as a time-to-event predictor, with stable discrimination (48-month AUC > 0.75) and calibration (Brier score < 0.15), enabling risk-adapted surveillance intervals—a capability absent in earlier models. These innovations collectively advance radiomics toward clinically actionable, observer-agnostic tools for precision oncology.
The equivalent prognostic performance of AI-based and manual segmentation represents a significant step toward observer-agnostic precision oncology. By resolving the dependency on manual annotation—a primary source of variability and a major hurdle in clinical workflows—our findings demonstrate that deep learning radiomics can be transitioned from a research tool into a scalable, automated clinical decision support system. This ensures that the high predictive accuracy is not only achievable in a controlled study environment but also reproducible in real-world clinical practice where time and expert resources are limited.
Radiomics studies should adhere to technical guidelines to enhance research quality [27,28,29,30,31]. These guidelines emphasize the formal evaluation of fully automated segmentation [9,27]. In this study, our segmentation model demonstrated high performance, validated across multiple studies [22,32]. Beyond this prior validation, we directly compared AI-segmented ROIs with expert-annotated ROIs. The results showed good consistency between the two, likely due to the well-defined, larger-volume lesions in our cohort. Previous research has shown that AI can more accurately detect lesions with lower ADC values and larger volumes [33]. Given the relatively simple segmentation task in this study, these results are promising for future applications. Automated AI segmentation not only eliminates the need to coordinate expert readers but also significantly reduces waiting time and improves clinical efficiency.
In this study, logistic regression was utilized as the final classifier to integrate the selected deep learning features. This choice was motivated by its robustness and resistance to overfitting, particularly in radiomics studies where the feature-to-sample ratio must be carefully managed. Unlike ‘black-box’ machine learning algorithms, logistic regression offers high clinical interpretability, providing a transparent relationship between imaging phenotypes and the probability of cancer progression [34]. Furthermore, its performance was equivalent to that of more complex classifiers in our pilot study, consistent with previous reports suggesting that model simplicity often enhances reproducibility in medical imaging AI [35]. In the model-selection phase, various machine learning classifiers, including support vector machine and random forest, were evaluated; since these more complex algorithms did not significantly outperform the linear model, logistic regression was finalized as the classifier to ensure the highest degree of model stability and interpretability, consistent with the preference for simpler models in clinical prognostic research [36].
The survival analysis further supports the clinical relevance of the proposed radiomics-based progression prediction model. Specifically, the HRs derived from the Cox regression analyses indicate that increasing radiomics-predicted probabilities are associated with a substantially higher risk of disease progression over time, providing an interpretable measure of relative risk rather than a simple binary classification. Based on the predefined risk stratification threshold, patients could be separated into high-risk and low-risk groups with clearly distinct progression-free survival curves, suggesting potential utility for individualized surveillance strategies and risk-adapted clinical management.
Importantly, the model demonstrated stable time-dependent performance throughout follow-up, with sustained discrimination (time-dependent AUC values exceeding 0.75 up to 48 months) and good calibration, indicating consistent prognostic value across clinically relevant time horizons. This temporal robustness suggests that the radiomics signature captures biologically meaningful imaging features associated with disease progression, rather than reflecting short-term or time-specific effects. Together, these findings highlight the potential of radiomics-based survival modeling to support longitudinal risk assessment in advanced PCa.
This study has several limitations. First, its retrospective, single-center design and modest sample size risk selection bias and may limit generalizability, particularly in underrepresented subgroups. While the model showed high performance in our independent test set, variations in MRI scanners and imaging protocols across different centers could impact the stability of radiomic features. Future studies involving multi-center cohorts are warranted to evaluate the robustness and clinical utility of our model in more diverse clinical settings. Advanced harmonization methods, such as ComBat, could also be applied to improve feature stability in future multi-center studies. Second, heterogeneous treatment protocols were not rigorously controlled, potentially confounding progression risk estimates. Although treatment-stratified subgroup and interaction analyses demonstrated consistent prognostic effects of the radiomics-based model, residual confounding related to treatment heterogeneity and unmeasured treatment intensity (e.g., duration of ADT) cannot be completely excluded in this retrospective cohort. Future prospective studies with standardized treatment protocols and detailed treatment exposure data are therefore necessary to further optimize and validate radiomics-based progression prediction models. Third, while AI-derived radiomics demonstrated parity with manual annotations, the model’s generalizability may be constrained by scanner variability and vendor-specific ADC quantification biases. Additionally, the lack of integration with multimodal biomarkers (e.g., genomics, PSMA-PET) and real-world validation of clinical workflow integration represent critical gaps. Future multicenter studies with standardized imaging protocols, treatment-stratified analyses, and prospective validation are needed to translate these findings into robust, clinically actionable tools.