International Multi-Site Initiative to Develop an MRI-Inclusive Nomogram for Side-Specific Prediction of Extraprostatic Extension of Prostate Cancer

Simple Summary
For patients with newly diagnosed prostate cancer, it is important to detect tumor growth beyond the prostate, as this can affect a patient’s prognosis, influence management decisions, and alter treatment strategies. It is recognized that some instances of extraprostatic tumor growth can be missed on prostate MRI. In this study, we merged patient data from multiple hospitals in different countries and developed a type of mathematical formula called a “nomogram” that combines MRI findings with other available patient data. The results of our study allow physicians to diagnose extraprostatic tumor growth more accurately by combining clinical, biopsy, and MRI-derived information according to their relative statistical importance.

Abstract
Background: To develop an international, multi-site nomogram for side-specific prediction of extraprostatic extension (EPE) of prostate cancer based on clinical, biopsy, and magnetic resonance imaging (MRI)-derived data.
Methods: Ten institutions from the USA and Europe contributed clinical and side-specific biopsy and MRI variables of consecutive patients who underwent prostatectomy. A logistic regression model was used to develop a nomogram for predicting side-specific EPE on prostatectomy specimens. The performance of the statistical model was evaluated by bootstrap resampling and cross validation and compared with the performance of benchmark models that do not incorporate MRI findings.
Results: Data from 840 patients were analyzed; pathologic EPE was found in 320/840 (38.1%). The nomogram model included patient age, prostate-specific antigen density, side-specific biopsy data (i.e., Gleason grade group, percent positive cores, tumor extent), and side-specific MRI features (i.e., presence of a PI-RADSv2 4 or 5 lesion, level of suspicion for EPE, length of capsular contact). The area under the receiver operating characteristic curve of the new, MRI-inclusive model (0.828, 95% confidence limits: 0.805, 0.852) was significantly higher than that of any of the benchmark models (p < 0.001 for all).
Conclusions: In an international, multi-site study, we developed an MRI-inclusive nomogram for the side-specific prediction of EPE of prostate cancer that demonstrated significantly greater accuracy than clinical benchmark models.


Introduction
The diverse natural history of localized prostate cancer makes accurate risk stratification a challenging but indispensable requirement for selecting the most appropriate management strategy for any individual patient. Along with other clinical, blood- and biopsy-derived biomarkers, clinical cancer stage on digital rectal examination, which takes into account the presence or absence of extraprostatic disease extension, plays an integral part in risk stratification. While multiple prospective studies and meta-analyses have shown that magnetic resonance imaging (MRI) reliably detects clinically significant prostate cancer [1,2], it lacks sensitivity for diagnosing extraprostatic disease extension (EPE) [3]. For example, a recent meta-analysis pooling data from 9796 patients yielded a sensitivity of just 57% [4]. It must be noted, however, that for most patients in this analysis only a limited MRI protocol was acquired (i.e., T2-weighted sequences only) and that more recent 'multiparametric' MRI protocols yielded higher pooled sensitivities for EPE in subgroup analyses [4]. A lack of awareness of this limited diagnostic precision may even result in adverse clinical outcomes, as suggested by studies conducted earlier in the history of clinical prostate MRI, in which the acquisition of pre-operative MRI was associated with a higher rate of positive surgical margins [5,6]. Although subsequent prospective studies [7,8] and a meta-analysis [9] have since contradicted those unfavorable findings, MRI alone cannot be used to reliably exclude or diagnose EPE. Nevertheless, MRI does offer high specificity for EPE (e.g., 91% in the above-mentioned meta-analysis [4]). Integrating MRI-derived information with clinical information might therefore result in more precise clinical staging, as demonstrated in multiple single-institution studies of American [10][11][12][13], European [14][15][16], and Asian [17] populations, as summarized in Table 1.
Cancers 2021, 13, 2627
The single-center methodology of all these prior studies, however, limits their generalizability. This is not only because radiologists at different institutions might recognize and interpret MRI findings differently [18], but also because patient selection and management may differ between institutions. In fact, the encouraging results of the single-institution studies cited above [10][11][12][13][14][15][16][17] were either not reproduced [19,20] or only partially reproduced [21] by other research groups (Table 1). Inspired by discussions held at the Global Summit for Prostate Cancer (organized by the AdMeTech Foundation), we compiled an international, multi-site dataset of patients who had undergone pre-prostatectomy MRI and used this dataset to develop a new nomogram for the side-specific prediction of EPE based on clinical, biopsy-, and MRI-derived information. We then compared the performance of this nomogram with that of established, non-MRI-inclusive models for predicting EPE of prostate cancer.

Materials and Methods
This was a multi-site retrospective study of consecutive patients with biopsy-proven prostate cancer who underwent dedicated multiparametric MRI before radical prostatectomy. Each participating institution was invited to provide anonymized data on up to 100 consecutive patients, going backwards in time from 31 December 2017. The study design and the submission of anonymized patient data were approved by the institutional review boards of all participating institutions. Demographic and clinical variables were retrospectively extracted from the medical records and included patient age, clinical stage on digital rectal examination, serum prostate-specific antigen (PSA) level, and PSA density. Biopsy data were collected separately for the left and right sides and included the number of cores taken, the number of positive cores, the highest Gleason grade group, and the maximum absolute and relative cancer extent in a single core. MRIs were acquired and interpreted at the respective institutions according to the Prostate Imaging Reporting and Data System version 2.0 (PI-RADSv2.0), which has been described in detail previously [22]. The diameter and capsular contact length of the largest and/or most suspicious MRI-visible lesion were measured by the radiologist. The likelihood of EPE was scored by the interpreting radiologist on a 5-point Likert scale, separately for the left and right sides of the gland, according to previously published criteria [23]. To mitigate potential inter-site variability in the assignment of Likert scores for EPE, we reduced the original 5-tier EPE Likert score to a 3-point scale for the statistical analyses: EPE Likert scores of 1 and 2 were classified as "negative" for EPE on MRI, scores of 3 and 4 as "equivocal," and a score of 5 as "positive." The side-specific presence or absence of EPE on prostatectomy specimens, as documented in the pathology reports, served as the reference standard.
After anonymization, all data were submitted to the leading institution; no central MRI or pathology review was performed.
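The side-specific derived variables described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code. The function names are hypothetical, and the sketch assumes PSA in ng/mL and prostate volume in mL. The source does not state how prostate volume was measured for the PSA density calculation.

```python
# Hypothetical helpers illustrating the variable definitions in the Methods;
# they are not taken from the study's own analysis code.

# Collapse of the 5-point EPE Likert score to the 3-tier scale used for analysis:
# 1-2 -> "negative", 3-4 -> "equivocal", 5 -> "positive".
LIKERT_TO_TIER = {
    1: "negative",
    2: "negative",
    3: "equivocal",
    4: "equivocal",
    5: "positive",
}


def collapse_epe_likert(score: int) -> str:
    """Map a side-specific EPE Likert score (1-5) to its 3-tier category."""
    if score not in LIKERT_TO_TIER:
        raise ValueError(f"EPE Likert score must be an integer from 1 to 5, got {score!r}")
    return LIKERT_TO_TIER[score]


def psa_density(psa_ng_ml: float, prostate_volume_ml: float) -> float:
    """PSA density: serum PSA divided by prostate volume (units assumed ng/mL and mL)."""
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return psa_ng_ml / prostate_volume_ml


def percent_positive_cores(positive: int, taken: int) -> float:
    """Percentage of positive systematic biopsy cores on one side of the gland."""
    if taken <= 0 or not 0 <= positive <= taken:
        raise ValueError("expected taken > 0 and 0 <= positive <= taken")
    return 100.0 * positive / taken


# Example: one prostate side with an EPE Likert score of 4 and 3 of 6 positive cores.
print(collapse_epe_likert(4))        # equivocal
print(psa_density(8.0, 40.0))        # 0.2
print(percent_positive_cores(3, 6))  # 50.0
```

Keeping the Likert mapping in one explicit table makes the 5-to-3 collapse easy to audit against the definition in the text.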

Statistical Considerations
Missing data were handled by multivariate imputation by chained equations, and a logistic regression model was used to predict the side-specific presence of EPE on prostatectomy specimens. For the imputation, a series of regression models was run in which each variable with missing data was modeled conditional upon the other variables in the data. This process was repeated for a number of cycles, with the imputations updated at each cycle; at the end, the final imputations were retained, resulting in one imputed dataset [24]. All predictors of interest were entered into the initial full model before model selection. A reduced model was created using a stepdown model reduction technique that identifies the best parsimonious model, with the concordance index as the stopping criterion. Variables for which more than 50% of data points were missing were excluded from the analysis. Restricted cubic splines were used to model PSA. To evaluate the performance of the proposed model, bootstrap resampling with 1000 repetitions was used to estimate the 95% confidence interval (CI) of the area under the receiver operating characteristic curve (AUROC); calibration curves and decision curves [25] were then assessed. For cross validation, each center in turn served as the validation data and the remaining centers as the development data.
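The bootstrap estimate of the AUROC confidence interval can be illustrated with a minimal, dependency-free Python sketch. Several choices here are assumptions: it uses a percentile bootstrap and resamples individual observations with replacement. The source states neither whether prostate sides or patients were resampled nor which interval method was used. The function names are hypothetical. The AUROC is computed through its Mann-Whitney rank-sum formulation, with tied scores receiving average ranks.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney rank-sum formulation (ties get average ranks)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative observation")
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (AUROC, lower, upper) using a percentile bootstrap over observations."""
    point = auroc(y_true, y_score)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(round(alpha / 2 * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return point, lower, upper


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a side-specific dataset, the left and right sides of one patient are correlated. A patient-level (cluster) bootstrap, which resamples patients and keeps both of their sides, may therefore be preferable to the observation-level resampling shown here.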

Benchmark Comparisons
The performance of the MRI-inclusive nomogram developed in the present study was benchmarked against four established models for the prediction of EPE that are based on clinical and biopsy data. The first is the Memorial Sloan Kettering Cancer Center (MSKCC) "Pre-Radical Prostatectomy" nomogram, which is derived from the data of 11,552 patients treated at MSKCC and considers patient age, PSA level, clinical tumor stage, biopsy Gleason grades/scores, and the proportion of positive biopsy cores [26]. The second is the updated Partin tables, which are based on data from 5629 men who underwent surgery at the Johns Hopkins Hospital and integrate PSA level, biopsy Gleason score, and clinical stage [27]. The third is a prospectively developed and validated multi-institutional model based on data from 6823 patients collected by the Belgian Cancer Registry, which uses PSA level, clinical cancer stage, biopsy Gleason score, and the proportion of positive biopsy cores [28]. The fourth is a side-specific nomogram (PSA, clinical stage, biopsy Gleason sum, percent positive cores, percent cancer in biopsy core) developed in Germany with data from 1118 prostatectomy patients [29]. The first three benchmark models were developed for predicting EPE of the whole prostate, and we therefore applied them to our dataset on a whole-gland basis.

Study Population
Data on 848 patients were submitted from 10 institutions (three in the United States of America; two each in France and Germany; one each in Denmark, Italy, and Spain). Eight cases were excluded due to incomplete data regarding EPE on prostatectomy specimens, leaving data from 840 individuals for the final analyses. The median time from prostate biopsy to prostatectomy was 86 days (IQR: 63-118), and the median time from MRI to prostatectomy was 76 days (IQR: 40-113). MRI was performed before biopsy in 393 patients (46.8%; median interval: 26 days, IQR: 10-45) and after biopsy in 340 (40.5%; median interval: 53 days, IQR: 33-77). One hundred and seven patients (12.7%) underwent MRI and biopsy on the same day. Systematic transrectal ultrasound-guided prostate biopsies were performed in 819/840 patients (97.5%), and the median number of systematic biopsy cores was 12 (IQR: 10-12) per patient and 6 (IQR: 5-6) per prostate side. Targeted biopsies were taken from the right side of the prostate in 189/840 (22.5%) individuals, from the left side in 219/840 (26.1%), and from both sides, the midline, or an unspecified location in 98/840 (11.7%). Because the aim of this study was side-specific prediction of EPE and side-specific data completeness for targeted biopsies was less than 50%, these biopsies were not included in the statistical analyses. All MRI scans comprised T1-, T2-, and diffusion-weighted sequences; dynamic contrast-enhanced sequences were additionally acquired in 687/840 cases (81.8%), and MR spectroscopy was performed in 96/840 cases (11.4%). EPE was present in 320/840 prostatectomy specimens (38.1%), and the side-specific prevalence of EPE on histopathology was 365/1680 (21.7%). Detailed descriptive statistics on demographic, clinical, biopsy, and MRI data, as well as the proportion of missing data, are listed in Table 2.

Inter-Site Variabilities
We observed significantly different distributions of demographic, clinical, and biopsy parameters between institutions, including patient age, PSA level, PSA density, cancer stage on digital rectal examination, percentage of positive cores, maximum tumor extent in a single core, and biopsy Gleason grade group (p < 0.001 for all, Table 2). On MRI, institutions reported significantly different distributions of PI-RADSv2 scores, median lesion size, and length of capsular contact (p < 0.001 for all). The proportion of patients classified as "negative for EPE" on MRI ranged from 25.0% to 78.6%; "equivocal" findings for EPE were reported in 15.3% to 60.4%; and EPE was considered "definitely present" in 2.5% to 15.9% of individuals (p < 0.001). These data are detailed for each institution in Table 2. We did not observe significant inter-institutional differences in the prevalence of EPE on prostatectomy specimens (range: 29.3-47.7%) or in the diagnostic accuracy of MRI for EPE (AUROC range: 0.65-0.83).
Table 2. Clinical, biopsy, and MRI-derived data for the entire study cohort and each participating institution. Continuous variables are presented as medians (interquartile range). * Only data on relative core involvement submitted. # p-values for inter-institutional comparisons.

Nomogram and Benchmarks
The initial model included patient age, PSA, PSA density, clinical tumor stage, side-specific biopsy data (i.e., percentage of positive systematic biopsy cores, highest Gleason grade group, largest tumor extent), and side-specific MRI data (i.e., presence of a PI-RADS 4/5 lesion, lesion diameter, level of suspicion for EPE, length of capsular contact). Clinical tumor stage, PSA, and lesion diameter on MRI were dropped through stepwise selection; the resulting nomogram included patient age, PSA density, and side-specific biopsy and MRI data, as detailed in Figure 1. Performance analysis yielded an AUROC of 0.828 for this model (bootstrap-validated 95% confidence limits: 0.805-0.852). Cross validation analyses, in which each center was used as the validation data and the other nine centers as the development data, resulted in an average AUROC of 0.820 (range: 0.735-0.883). In our dataset, this new, MRI-inclusive model predicted EPE significantly more accurately than any of the benchmark statistical models (p < 0.001 for all), as detailed in Table 3. Decision curve analyses of the proposed MRI-inclusive nomogram and the benchmark models are displayed in Figure 2, and a calibration plot for all models is shown in Figure 3.

Table 3. Performance statistics of the MRI-inclusive nomogram and the benchmark models.
Model | AUROC (95% CI)
MRI-Inclusive Nomogram | 0.828 (0.805, 0.852)
MSKCC Pre-Radical Prostatectomy Nomogram [26] | 0.675 (0.638, 0.712) *
Belgian Cancer Registry Nomogram [28] | 0.679 (0.641, 0.716) *
Updated Partin Tables [27] | 0.601 (0.563, 0.640) *
Side-Specific Clinical Nomogram [29] | 0.650 (0.619, 0.681) *
* p-value < 0.001 for comparison with the MRI-inclusive nomogram.

To further validate the diagnostic performance of the proposed nomogram, we performed additional analyses using data from six institutions as the training set and data from the remaining four as the validation set. This process was repeated for all 210 possible combinations and yielded results similar to those of the bootstrap-validated model (AUROC: 0.821 vs. 0.828) (Table S1). When the analyses were repeated without imputation of missing data, the AUROC of the MRI-inclusive nomogram was slightly lower than that of the bootstrap-validated model with imputed data (AUROC 0.799 vs. 0.828) (Table 4).
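The two center-level validation schemes described above can be sketched as follows. The first is leave-one-center-out cross validation; the second is the set of 6-versus-4 institution splits. The center labels and function names are hypothetical. Model fitting and scoring are left as caller-supplied functions because the source does not describe the implementation. The order of the training centers does not matter, so the number of 6-versus-4 splits is a combination count, C(10, 6) = (10 × 9 × 8 × 7)/(4 × 3 × 2 × 1) = 5040/24 = 210.

```python
from itertools import combinations
from math import comb
from statistics import mean
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical labels for the ten contributing institutions.
CENTERS = [f"site_{k:02d}" for k in range(1, 11)]


def leave_one_center_out(
    row_centers: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """For each center, yield (held-out center, development row indices, validation row indices)."""
    for held_out in sorted(set(row_centers)):
        development = [i for i, c in enumerate(row_centers) if c != held_out]
        validation = [i for i, c in enumerate(row_centers) if c == held_out]
        yield held_out, development, validation


def cross_validate(
    row_centers: Sequence[str],
    fit: Callable[[List[int]], object],
    score: Callable[[object, List[int]], float],
) -> Tuple[Dict[str, float], float, float, float]:
    """Fit on development rows, score on the held-out center; return per-center, mean, min, max."""
    per_center: Dict[str, float] = {}
    for held_out, development, validation in leave_one_center_out(row_centers):
        model = fit(development)  # caller maps row indices to the actual data
        per_center[held_out] = score(model, validation)
    values = list(per_center.values())
    return per_center, mean(values), min(values), max(values)


def six_versus_four_splits(
    centers: Sequence[str],
) -> Iterator[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """Yield every (training centers, validation centers) pair with 6 training centers."""
    for training in combinations(centers, 6):
        validation = tuple(c for c in centers if c not in training)
        yield training, validation


splits = list(six_versus_four_splits(CENTERS))
print(len(splits), comb(10, 6))  # 210 210
```

The `cross_validate` summary (mean, minimum, maximum) corresponds to how the leave-one-center-out result is reported in the text, as an average AUROC with a range.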

Figure 2.
Decision curve analyses for the proposed MRI-inclusive nomogram and the benchmark models. The y-axis depicts the benefit of each nomogram to identify EPE correctly, and the x-axis refers to how clinicians appraise different outcomes in a given clinical context. A detailed guide for the interpretation of decision curves can be found in [30].
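The decision curves plot net benefit against the threshold probability p_t at which a clinician would act on a predicted risk of EPE. The odds p_t/(1 - p_t) encode how the harm of an unnecessary intervention is weighed against that of a missed EPE. Below is a minimal sketch of the standard net benefit formula of decision curve analysis. The function names are hypothetical, and the sketch assumes that a predicted probability at or above the threshold counts as a positive call.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting when predicted probability >= threshold: TP/n - FP/n * pt/(1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy that treats every case as EPE-positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# The "treat none" reference strategy has a net benefit of 0 at every threshold.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
print(net_benefit(y, p, 0.5), net_benefit_treat_all(y, 0.5))  # 0.25 0.0
```

Evaluating `net_benefit` over a grid of thresholds for each model traces its decision curve, and the treat-all and treat-none lines serve as references.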
Table 4. Validation of the diagnostic performance of the MRI-inclusive and benchmark nomograms using data from six institutions as the training set and data from the remaining four as the validation set. This process was repeated for all 210 possible combinations, both with and without imputation of missing data.

Figure 3.
Calibration plot for the proposed MRI-inclusive nomogram and the benchmark models illustrating the actual frequency of EPE on the y-axis and the predicted probability on the x-axis (a more in-depth explanation can be found in [31]).
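A calibration plot compares predicted probabilities with observed event frequencies. The sketch below uses simple equal-width probability bins; this binning is an assumption for illustration. Smoothed calibration curves are also common, and the source does not specify which approach was used. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed EPE frequency, count) per non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"predicted probability out of range: {p!r}")
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k].append((y, p))
    out = []
    for members in bins:
        if members:  # empty bins are skipped
            ys, ps = zip(*members)
            out.append((sum(ps) / len(ps), sum(ys) / len(ys), len(members)))
    return out


for predicted, observed, count in calibration_bins(
    [0, 1, 0, 1, 1], [0.05, 0.15, 0.55, 0.95, 1.0], n_bins=2
):
    print(f"{predicted:.3f} {observed:.3f} {count}")
# 0.100 0.500 2
# 0.833 0.667 3
```

Plotting the first element of each tuple on the x-axis against the second on the y-axis gives a binned calibration curve; points on the diagonal indicate perfect calibration.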

Discussion
In this international, multi-site study, we developed an MRI-inclusive nomogram for the side-specific prediction of EPE of prostate cancer. The nomogram integrates demographic, clinical, biopsy-, and MRI-derived variables and offers two advantages over established prediction models. First, it provides side-specific information about EPE, which is particularly useful for surgical or radiation therapy planning. Second, it predicts EPE more accurately than established statistical models and may help clinicians better assess a given patient's risk of disease progression.
Our results corroborate findings from prior single-institution studies in which the addition of MRI-derived information to clinical and biopsy data led to more precise predictions of EPE, both on a side-specific basis [11,14,15,17] and for the whole prostate [10,13,16]. However, clinical side-specific nomograms without MRI data were also found to be highly accurate in external validation cohorts [32], and other single-center studies did not reproduce the promising results of MRI-inclusive models. For example, in a retrospective study of 236 patients, the integration of MRI findings did not significantly increase the precision of the MSKCC pre-radical prostatectomy nomogram [19]. One reason for the inconsistencies in published data might lie in the well-documented variability of radiologists' performance in identifying EPE on MRI [33], a skill that strongly depends on dedicated training [34] and the degree of subspecialization [35]. In fact, a prospective study including three radiologists showed that while two of them added incremental precision to clinical prostate cancer staging with their MRI interpretations, the third failed to do so [21]. Inter-site differences in patient selection and management strategies may also limit the reproducibility of single-center studies. In the above-cited studies [10][11][12][13][14][15][16][17][19][20][21], for example, the proportion of patients with extraprostatic disease extension on digital rectal examination (i.e., cT3) ranged from 0% to 21%, the proportion with extraprostatic extension on prostatectomy specimens (i.e., pT3a) from 16% to 55%, and the percentage of patients with a biopsy Gleason score of 8 or higher from 2.7% to 44.0%. These ranges closely resemble those in our study cohort, where the frequency of cT3 disease ranged from 0% to 10% and the proportion of patients with a biopsy Gleason score ≥8 from 9.0% to 44.8% among institutions.
These figures highlight the substantial inter-site differences in patient selection for prostatectomy. Consequently, single-institution cohorts may lack representativeness, making statistical models derived from them difficult to reproduce. This challenge underscores the importance of pooling international multi-site data, as was done in the present study, to create more comprehensive and representative datasets and thus increase the robustness and external applicability of any derived statistical model.
The performance comparisons between the nomogram developed in this study and the established benchmark models must be interpreted carefully. The proposed MRI-inclusive nomogram might be overfitted, and its performance might be overestimated despite bootstrap and cross validation. The benchmark models performed similarly or worse in our study cohort than in their respective development cohorts. While the MSKCC pre-radical prostatectomy nomogram performed as well in our study cohort as in its development cohort (AUROC 0.675 vs. 0.657), the updated Partin tables model was less accurate in our cohort than in its development cohort (AUROC 0.601 vs. 0.702). This might have been due to differences between the two patient cohorts: the Johns Hopkins prostatectomy cohort had, on average, lower-risk disease, as exemplified by its lower prevalence of biopsy Gleason score ≥8 (8% vs. 19%) and of EPE on prostatectomy specimens (23% vs. 38%) [27]. Similarly, the Belgian Cancer Registry model performed less well in the present cohort than in its development cohort (AUROC 0.679 vs. 0.773); this model was also developed in a population with lower-risk disease than the present cohort (e.g., biopsy Gleason score ≥8: 12% vs. 19%; EPE on prostatectomy: 19% vs. 38%) [28]. The side-specific clinical benchmark model by Steuber et al. performed less accurately in the current cohort than in the original cohort (AUROC: 0.650 vs. 0.840) [29]. Here again, the risk profiles differed between the cohorts, as exemplified by the proportion of patients with T1c disease (82% vs. 69%) or pathological EPE (27% vs. 38%) [29]. Even taking all these potential confounders into account, we infer that the separation of the ROC curves is wide enough to conclude that the MRI-inclusive nomogram presented herein predicts EPE more accurately than do established clinical prediction models.
The current study corroborates the high specificity and positive predictive value of prostate MRI for the diagnosis of EPE [3], as reflected by the relative weight of a 'positive MRI' in our nomogram. It also highlights the comparable importance of clinical and biopsy-derived metrics, as exemplified by the high statistical weight of PSA density and biopsy tumor amount in the nomogram. In high-risk patients with unequivocal EPE on MRI, the nomogram might therefore provide only limited additional information. However, in less definitive cases, for example those with equivocal MRI findings and/or low- or intermediate-risk features, the statistically appropriate integration of multiple data points through this nomogram might help to assess the likelihood of EPE more accurately. The utility of MRI as a component of local staging tools might be further increased by the extraction of radiomic data in combination with machine learning or artificial intelligence algorithms, as suggested by recent studies [36][37][38][39].
The main limitations of our study are its retrospective design and the selection bias introduced by including only patients who underwent prostatectomy. Although the identification and interpretation of MRI findings may have differed between institutions [18], our pooled data likely provide a balanced representation of current radiology practice patterns from multiple countries. The incompleteness of side-specific data on targeted biopsies is another limitation, as MRI-targeted biopsies increase the detection rate of high-grade cancer [40]; given the association between high-grade cancer and EPE in our cohort and in multiple previously published cohorts, this higher detection rate would likely translate into a more accurate prediction of EPE. As discussed above, our statistical model might be overfitted and might perform worse in external validation cohorts. Finally, MRIs and prostatectomy specimens were reviewed by a single radiologist and a single pathologist, respectively, at each institution, and we did not assess inter-reader variability, which is documented in the literature for radiologists [41,42] and pathologists [43,44].

Conclusions
This study produced a new MRI-inclusive nomogram for side-specific prediction of EPE of prostate cancer based on data from ten sites in six countries. The nomogram integrates demographic, clinical, biopsy, and MRI data and outperforms clinical benchmark models.