Novel Multiparametric Magnetic Resonance Imaging-Based Deep Learning and Clinical Parameter Integration for the Prediction of Long-Term Biochemical Recurrence-Free Survival in Prostate Cancer after Radical Prostatectomy

Simple Summary Existing research on predicting biochemical recurrence after prostate surgery has been insufficient. Here, we aimed to predict biochemical recurrence after radical prostatectomy leveraging recent advances in deep learning. We combined clinical variables with multiparametric magnetic resonance imaging using deep learning methods. Our method performed better than existing methods. Our method could direct patients to individualized care using routine medical imaging and could achieve better patient care. Abstract Radical prostatectomy (RP) is the main treatment of prostate cancer (PCa). Biochemical recurrence (BCR) following RP remains the first sign of aggressive disease; hence, better assessment of potential long-term post-RP BCR-free survival is crucial. Our study aimed to evaluate a combined clinical-deep learning (DL) model using multiparametric magnetic resonance imaging (mpMRI) for predicting long-term post-RP BCR-free survival in PCa. A total of 437 patients with PCa who underwent mpMRI followed by RP between 2008 and 2009 were enrolled; radiomics features were extracted from T2-weighted imaging, apparent diffusion coefficient maps, and contrast-enhanced sequences by manually delineating the index tumors. Deep features from the same set of imaging were extracted using a deep neural network based on pretrained EfficentNet-B0. Here, we present a clinical model (six clinical variables), radiomics model, DL model (DLM-Deep feature), combined clinical–radiomics model (CRM-Multi), and combined clinical–DL model (CDLM-Deep feature) that were built using Cox models regularized with the least absolute shrinkage and selection operator. We compared their prognostic performances using stratified fivefold cross-validation. In a median follow-up of 61 months, 110/437 patients experienced BCR. CDLM-Deep feature achieved the best performance (hazard ratio [HR] = 7.72), followed by DLM-Deep feature (HR = 4.37) or RM-Multi (HR = 2.67). CRM-Multi performed moderately. Our results confirm the superior performance of our mpMRI-derived DL algorithm over conventional radiomics.


Introduction
Radical prostatectomy (RP), the first-line treatment for organ-confined or locally advanced prostate cancer (PCa), yields excellent long-term survival outcomes [1][2][3]. Post-RP biochemical recurrence (BCR), affecting 50% of the patients, is a strong surrogate marker for indicating subsequent distant metastasis, as well as PCa-specific and overall mortality [1,4]. BCR predominantly develops in patients with high-risk features, such as high initial prostate-specific antigen (PSA) levels, adverse RP pathology, including extracapsular extension (ECE) and seminal vesicle invasion (SVI), positive surgical margins (PSM), and surgical Gleason score (GS) ≥8 [1,[4][5][6]. However, the natural history of relapse following RP is heterogeneous even in patients with high-risk features, reflecting high heterogeneity in clinical, tumor pathophysiological, and molecular aspects of PCa [4,7]. Therefore, early identification of patients with PCa at a higher risk for developing post-RP BCR is important for intensive monitoring or providing adjuvant therapies at the proper time for the control of systemic micro-metastasis for achieving long-term survival benefits.
Accurate prediction of post-RP BCR requires long-term follow-up (>10 years) owing to the high variability of BCR events and PCa's slow-growing and often indolent nature [37,38]. More importantly, the characteristics of PCa exhibit regional and ethnic variations, suggesting that race-and nation-specific characteristics of PCa should be considered for developing a more accurate prediction model. Asian men, especially Koreans, are more likely to possess unfavorable risk profiles, such as advanced pathological stage, high GS, high PSA levels, and worse prognoses than Americans and Europeans [39][40][41][42]. The performance of conventional clinical predictive tools in the Asian population, including Koreans, has been reported as suboptimal compared to those of the European and American populations [34,[43][44][45]. These limitations motivated us to develop a new system for predicting long-term post-RP BFS in Korean patients with high-risk PCa according to our own data by completely integrating all the available clinical variables and mpMRI features that could be clinically useful. To date, researchers have not developed clinically relevant mpMRI-based DL risk models for predicting post-RP long-term BCR-free survival because the erstwhile standard protocol of PCa care did not involve preoperative mpMRI before RP in most countries. Notably, in contrast to other hospitals, preoperative prostate mpMRI has routinely been performed before RP in patients with localized PCa at our institution since 2008 [46], enabling the accumulation of large-scale, high-quality mpMRI datasets in addition to long-term follow-up clinical data for developing more dependable mpMRI-based DL risk models.

Patient Selection
This monocentric study was approved by the Institutional Review Board of Samsung Medical Center (SMC) (no. 2022-01-134), which waived the requirement for written informed consent owing to the retrospective study design. The study protocols were conducted in accordance with the tenets of the Declaration of Helsinki. The inclusion criteria were as follows: (1) primary prostate acinar adenocarcinoma confirmed by RP; (2) undergoing RP after 3 T prostate mpMRI; (3) mpMRI images of adequate quality available in the Picture Archiving Communication System (Centricity; GE Healthcare, Chicago, IL, USA); (4) no history of neoadjuvant or adjuvant therapies for PCa; (5) available complete pathological and follow-up information. The exclusion criteria were as follows: (1) incomplete mpMRI or poor-quality MRI scans; (2) PSA persistence characterized by a postoperative PSA nadir >0.04 ng/mL 3 months post RP; (3) missing clinical variables; (4) tumors of other pathological types or mixed pathology. The primary endpoint was the prediction of post-RP BCR-free survival. Post-RP BCR was defined as an initial PSA value ≥ 0.2 ng/mL, confirmed by a subsequent PSA value of ≥0.2 ng/mL [1,[3][4][5]47]. The time of BCR incidence was defined as the midpoint between the nadir and the first of three consecutive elevations [47]. BCR-free survival was defined as the interval between the date of RP and that of BCR [47]. We censored the data of patients who were still alive but without BCR at the last reported follow-up.
Initially, we retrospectively reviewed the baseline clinicopathological data of 512 patients with pathologically confirmed PCa who underwent RP, with or without lymphadenectomy, at the SMC between 2008 and 2009. RP was performed by five surgeons with varying surgical experience at our institution. The decision to perform lymphadenectomy was based on the preoperative lymph node involvement risk assessment and surgeon's discretion. Participants were followed up every 3 months during the first 2 years, every 6 months between the second and fifth years, and annually thereafter. Finally, a total of 437 patients were analyzed. The study flowchart is depicted in Figure 1.

mpMRI Acquisition Protocol and Interpretation
All the patients underwent prostate mpMRI using a 3 T MRI scanner (Achieva TX; Philips Healthcare, Best, The Netherlands) equipped with a phased-array coil. Routine mpMRI protocols included T1WI, T2WI, DWI, ADC maps, and DCE sequences, acquired following intravenous injection of gadolinium diethylenetriamine penta-acetic acid (Gadovist; Schering, Berlin, Germany) [46,48]. Table 1 summarizes additional details regarding image acquisition. All the mpMRI scans were reviewed by an experienced uroradiologist (C.K.K.) with 14 years of experience in prostate MRI interpretation.   Figure 1 shows the study's conceptual workflow. First, we built a clinical variable risk model (denoted as CM) comprising six clinically important variables for post-RP BCR, namely, the patients' age during RP, preoperative PSA level, ECE, SVI, PSM, and surgical GS International Society of Urological Pathology (ISUP) grade, to predict the BCR-free survival using a Cox proportional hazards model. We assigned surgical GS values to the corresponding ISUP grade to create an ordinal classifier ranging from 1 to 5 instead of the grade group as follows: grade 1 = GS 3 + 3, grade 2 = GS 3 + 4, grade 3 = GS 4 + 3, grade 4 = GS 8 (4 + 4, 3 + 5, or 5 + 3), and grade 5 = GS 9 (4 + 5, 5 + 4) or GS 10 [49].

Tumor ROI Delineation, Radiomics Feature Extraction, Selection, and Risk Model Generation
For each patient, tumor delineation was performed by manually placing the ROI on T2WI, ADC map, and DCE images, according to the histopathological results ( Figure 2). Tumor ROI was depicted for each patient's index lesion using a three-dimensional (3-D) slicer. The lesion with the largest diameter or that demonstrating ECE was considered the index lesion. Tumor ROI measurement encompassed the maximum possible lesion extent in the image with the greatest visibility. We extracted radiomics features from the ROIs drawn on the T2WI, ADC, and DCE sequences of each patient using the open-source software, Python package "Pyradiomics" (version 3.0.1) [50]. Seventy-two radiomics features were extracted and divided into three categories: (1) 14 shape-based features, including the ROI's 3D size and shape descriptors; (2) 18 first-order features describing the voxel intensity distribution within the ROI; (3) 40 textural features, including 24 gray-level co-occurrence matrix and 16 gray-level size zone matrix features ( Figure 1). The radiomics features from the test set of each fold were z-score-normalized by applying the mean and standard deviation values of the training set. Details of the training and test set splits are described in Section 2.6. We constructed a radiomics risk model based on the radiomics features extracted from mpMRI for BCR-free survival prediction (RM-Multi) ( Figure 1). We adopted a Cox model regularized with the least absolute shrinkage and selection operator (Cox-LASSO) logistic regression analysis using fivefold cross-validation to select an optimized subset of radiomics features with nonzero coefficients and reduce overfitting and selection bias while constructing the final model ( Figure 1 and Figure S1). We also constructed a combined risk model by adding six clinical variables to the RM-Multi model using a similar Cox-LASSO approach to assess the impact of the clinical parameters (CRM-Multi) ( Figure 1).

DL Procedures and Construction of the Related Risk Models
In this novel study, we developed a deep neural network for predicting post-RP BCRfree survival following the process outlined in Figures 1 and 3. First, we extracted deep features by cropping the raw mpMRI images of T2WI, DWI with ADC, and DCE sequences of each patient into a dataset of 2.5D images as input for the DL model. The axial slice depicting the largest tumor and two additional axial slices above and below the center slice were used to sample the tumor characteristics with varying z-dimensions for each MRI modality. We normalized the intensity for each slice to 0-1. We extracted an image patch of 64 × 64 pixels centered on the tumor centroid from the center slice, since a 64 × 64 patch size was determined to contain the largest tumor. Two additional 64 × 64 patches were extracted from the upper and lower slices. The regular neural network of the natural images procured a three-channel (i.e., red, green, and blue channels) input as default. Our input consisted of three stacked axial patches, which were matched to the three-channel input of the natural image (Figures 1 and 3). Thereafter, we constructed a neural network to predict the binary BCR status, not BCRfree survival. The CNN architecture was characterized by connected nonlinear functions that learn multiple levels of representations of the input data, thereby yielding millions of possible features using a raw image [23,25,27,29,31,32,51]. In general, the performance of the DL models relies on the vast number of sample images because a large dataset allows improvement of generalizability and performance by training deeper networks [29,31,32,51]. In contrast, a key limitation is the massive number of sample images required to robustly train a model. EfficientNet, one of the most potent CNN architectures, uses a compound scaling method to increase the network depth, width, and resolution, while requiring fewer computational resources than other models [52,53]. Hence, we used an EfficientNet-B0 network pretrained from the vast ImageNet natural image database as a backbone network to handle the raw mpMRI input and partially mitigate the sample size issue of DL. The DL deep features of the intermediate layer were used to build a risk model (Figures 1 and 3). The first 45 layers were frozen, and the subsequent 158 layers were fine-tuned using local data, resulting in 320 features for each modality (Figures 1 and 3). The amalgamation of the three modalities yielded 960 deep features, which were used to predict the binary BCR status from two fully connected layers (Figures 1 and 3). Second, the deep features from the intermediate layer contained useful information for predicting BCR, which could be beneficial for BCR-free survival analysis. Consequently, these deep features were subjected to Cox-LASSO, yielding a DL deep feature risk model (DLM-Deep feature). Collating DL's deep features with six clinical variables resulted in a combined DL risk model (CDLM-deep feature) using the previously described approach. Furthermore, we applied gradient-weighted class activation mapping (Grad-CAM) to interpret the DL model for BCR prediction, which was consequently applied to the last convolutional layer.

Statistical Analysis
We performed Student's t-test and the chi-square or Fisher's exact test for the numerical and categorical variables, respectively, to compare the clinical characteristics of the patients with and without BCR. Quantitative variables are presented as the median (interquartile range [IQR]) or mean (standard deviation [SD]), whereas qualitative variables are presented as absolute values (percentages). We performed a univariate analysis to confirm the relevance of each clinical variable for BCR. Furthermore, Kaplan-Meier (K-M) curves were utilized to compare the ability of the risk models to assign a high and low risk of 10-year BCR-free survival using the median risk score as threshold. We used the hazard ratio (HR) with 95% confidence interval (CI) and log-rank tests to compare the two risk groups. The C-index measured the rank correlation between the risk score and BCR-free survival. The C-index value was higher if the time to BCR was shorter in patients at a higher risk than in patients at lower risk. Lastly, we calculated the integrated time-dependent area under the curve (iAUC), a time-wise AUC average for predicting events.
We split our data into two groups, the training and test sets, in a fivefold crossvalidation manner. Thus, our data were split into five parts, where four parts were used for training, while the remaining part was used for testing. By repeating the procedure for the five different test sets and retaining the same ratio of BCR to non-BCR cases (1:3) across all test sets, potential overfitting caused by the relatively small cohort was reduced (Table S1). We computed the average and SD of HRs with 95% CIs, C-indices, and iAUCs across all repetitions to obtain the final scores. The performance of each risk model for predicting post-RP BCR-free survival was compared to the average HRs (95% CI), C-indices, and iAUCs in a fivefold cross-validation. Two-sided p-values <0.05 were considered statistically significant. All statistical analyses were performed using Python (version 3.6.8; Python Software Foundation, Wilmington, DE, USA). Table 2 summarizes the major clinicopathological characteristics of all the patients (median age, 66 years; mean preoperative PSA level, 7.61 mg/dL). BCR occurred in 110 (25.2%) of the 437 patients following a median BCR-free survival period of 23 months (IQR: 9-52 months) during a median follow-up period of 61 months (IQR: 24-110 months). Significant differences in the clinicopathological risk factors associated with BCR (preoperative PSA, ECE, SVI, SM, and pathologic GS ISUP grade; all p < 0.001) were observed between the BCR and non-BCR groups, which was consistent with previous research [1][2][3][4]6,54]. Table 3 summarizes the performances of multiple risk models for BCR-free survival using fivefold cross-validation in our single-center dataset for both the training and the test sets. Figure 4 depicts the representative K-M analysis of the five models for the test sets described in Table 3 (Table 3 and Figure 4) in the test set.   Grad-CAM interpretation of our DL model confirmed its high degree of attention over regions close to the primary cancer for predicting the BCR status ( Figure 5). Notably, the DL deep feature risk model (denoted as DLM-Deep feature) exhibited better predictive performance than the RM-Multi, according to the cross-validation analysis; however, it fared worse than the CM, exhibiting an HR
Our DL model included 17 layers of convolution and, thus, could effectively model a wide spectrum of features starting from fine-grained (found in early layers) to abstract (found in late layers) features compared to fixed scale features of radiomics. Moreover, our DL model included nonlinear activation functions in the intermediate layers and, thus, could model possibly nonlinear interactions among the features. Another possibility could be that the DL model adopted 320 deep features in the final layer compared to 72 radiomics features per modality, and the additional degree of freedom in the DL model might have resulted in better handling of the multimodal MRI data. Lastly, our DL model adopted the pretrained EfficientNetB0 network and, thus, required fewer samples than the standard CNN training approach. Yan et al. adopted a similar approach to ours, using DL with mpMRI features to develop a deep survival neural network for predicting BCR-free survival following RP [34]. They retrospectively enrolled 485 patients who underwent RP between 2010 and 2017 from three institutions [34]. However, the performance of their deep radiomics signature was worse (C-index: 0.80) than ours (C-index: 0.89), owing to limitations in their DL approach [34], such as not directly extracting features from raw MR images but rather using previously extracted radiomics features [34]. Similarly, Hiremath et al. constructed an integrated nomogram (ClaD) using PSA levels, prostate volume, lesion volume, prostate imaging reporting and data system score, and DL predictions from biparametric MRI to identify clinically significant PCa [35]. The ClaD nomogram resulted in a significant separation in K-M survival curves between the patients with and without BCR (HR: 5.92, p = 0.044) [35]; however, its performance was inferior to that of our model owing to the lack of DCE image data and its small sample size (n = 81).
Moreover, in principle, DL models may not require tumor segmentation if the input of the model is a full slice [71]. However, in our case, we adopted an image patch centered on the tumor centroid partly because using the patch requires less memory compared to using the full slice. Thus, we required tumor centroid information and not the full tumor segmentation information. Specifying the patch is easier to perform than providing voxel-level tumor segmentation. Hence, we reduced the expert burden and associated variability compared to using tumor segmentation in the radiomics models. We sampled three 64 × 64 image patches that enclosed the primary tumor in three axial slices to apply patch-based DL approaches focusing on the tumor ROI. Through this approach, we could easily incorporate the tumor periphery related to the TME (peritumoral region immediately surrounding the visible tumor related to the tumor-adjacent normal tissues and TME), as the DL analysis included a rectangular region comprising both the tumor ROI and the periphery [72]. Similarly, Hiremath et al. explored multiple input configurations of the DL network, by extracting patches at different scales; each subsequent scale incorporated additional information from the peritumoral region on biparametric MRI [35]. The additional information provided by out-of-lesion tissues could have improved the learning of the DL framework, since mpMRI consistently underestimates the PCa pathological tumor size/extent when extending substantially beyond the ROI surface [22,73,74]. Furthermore, peritumoral lesions, forming the TME at the intermediate and preneoplastic states, provide critical molecular information that could serve to differentiate between aggressive and indolent tumors [35,72,75,76]. Dynamic cell-to-cell interactions with the tumor, extracellular matrix, or secreted protumoral factors lead to TME heterogeneity, which plays a key role in PCa progression, distant metastasis, lethal outcomes, and therapeutic failure [75][76][77].
The EfficientNet network series demonstrated a high performance with respect to the efficiency and accuracy of ImageNet tasks [52,78]. Herein, the shortcoming of the relatively small sample size of the mpMRI input was partially mitigated by adopting the EfficientNet-B0 model [10,78]. The multimodality deep survival network based on Effi-cientNetB0 and clinical features demonstrated the best performance for predicting post-RP long-term BCR-free survival. We applied several optimization steps to the learned deep features to improve the prediction performance, and we combined previously known important clinical variables related to BCR to achieve complete integration of the available useful clinical information, concurrent with other studies on BCR-free survival prediction tasks [34,35,79]. Recently, a Korean Prostate Cancer Database registry-based analysis demonstrated that the PSA levels, surgical GS ≥8, adverse pathological features [PSM, SVI, ECE], and adverse laboratory features (detectable PSA levels after 6 weeks) were significant predictors of BCR-free survival and overall survival (OS) following RP in Korea [6]. On the basis of these findings, the clinicopathological characteristics, including preoperative PSA, surgical GS, PSM, ECE, and SVI, rather than the preoperative biopsy finding, were considered for constructing our prediction model. The combined model CDLM-Deep feature achieved stronger discriminatory power than any other model (CM, RM, CRM-Multi, and DLM-Deep feature) as indicated by performance indicators, such as the HR, C-index, and iAUC. The CDLM-Deep feature model performed better than the DLM-Deep feature or CM models, providing further evidence that deep features can adjust to domain-specific data and complementary information beyond clinical variables. This points to further evidence of deep features adjusting to domain-specific data and providing complementary information beyond clinical variables. Compared with the traditional predictive tools, our approach of combining DL deep features with clinical risk factors could facilitate prognostic predictions based on pre-RP mpMRI, further contributing to precision oncology and enhanced quality of care following RP in Korean patients with high-risk PCa. Additional advantages, such as a longer follow-up with larger sample sets compared to previous radiomics approaches that missed out on recent advances in DL [10,15,[17][18][19][20]70,80], could improve the robustness and relevance of our study. Interestingly, the performance of the CRM-Multi model was comparable (or worse) to that of the CM model while several studies demonstrated that the clinical-radiomics combined model showed higher performance than the clinical or radiomics model alone for predicting clinically significant PCa and BCR-free survival [81,82]. Our contradictory findings suggest that radiomics features were predetermined; thus, they could not be adjusted to compensate for the possible overlap in the clinical variables. Further studies are needed to validate the improvement of predictive accuracy by incorporating additional clinical features to DL based on prostate mpMRI for post-RP long-term BCR-free prognosis.
Prostate-specific membrane antigen (PSMA) is an oncogenic transmembrane glycoprotein overexpressed on PCa cells, and higher degrees of PSMA expression are associated with higher aggressive biology associated with PCa progression and recurrence [83,84]. Quantitative PSMA positron emission tomography (PET) analysis has shown high performance in identifying clinically significant intraprostatic lesions and in providing noninvasive and objective risk stratification of patients with primary PCa [85][86][87][88][89][90][91][92][93][94][95]. Recent studies regarding radiomics combined with machine learning (ML) and DL applications in PSMA-PET demonstrate the potential feasibility of PSMA-PET as a novel tool for PCa risk stratification and characterization of its biological properties [85][86][87][88]91,93,96,97]. For example, in the prospective study by Cysouw and colleagues, ML-based analysis of pre-operative quantitative [ 18 F]DCFPyL PET metrics could predict LNI and high-risk pathological tumor features in patients with intermediate-and high-risk primary PCa scheduled for robot-assisted RP with extended PLND [85]. Furthermore, combining information from mpMRI and the PSMA-PET might offer complementary information in the diagnosis of intermediate and high-risk PCa patients, as well as detect tumor recurrence, overcoming the limitation of a single technique to the analysis of the tumor and TME within the whole prostate gland [87,91,93,96,97]. For example, a recent study suggested the potential to discriminate between low-and high-risk PCa and to predict the BCR or the overall patient risk in primary PCa patients built on [ 68 Ga]Ga-PSMA-11 PET/MRI radiomics and ML equally compared to preoperative invasive biopsy [91]. Therefore, the added value of ML/DL and radiomics-based PSMA-PET/mpMRI images in predicting BCR and risk stratification in patients with newly diagnosed PCa deserves to be further explored.
This study had several limitations. Firstly, single-center mpMRI imaging data performed 14-15 years ago were retrospectively evaluated. Secondly, tumor ROI delineation was manually performed by one expert. Thirdly, the lack of generalizability of the model predictions prevented adequate external and prospective validation of our prediction models. Thus, future research should focus on multicenter, prospective multifaceted studies combined with multi-omics (e.g., genomics, proteomics, and metabolomics) in large samples to assist the development of reproducible and interpretable DL risk models.

Conclusions
Collectively, the effective integration of deep features from presurgical prostate mpMRI with clinicopathological parameters was the most powerful prognostic tool for long-term post-RP BCR-free survival. Thus, our novel DL risk model could facilitate prognostication based on routine prostate mpMRI, facilitating patient stratification following RP and individualized postoperative management.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/cancers15133416/s1, Figure S1: Details of the RM-multi model in terms of selected features over five folds. Table S1: Data distribution of 5-fold cross-validation. Each fold keeps the ratio of BCR cases (around 75%).  Data Availability Statement: The data are unavailable due to privacy concerns.

Conflicts of Interest:
The authors declare no conflict of interest.