1. Introduction
Prostate cancer is responsible for 13% of all male cancer deaths in the UK, yet 10-year survival rates approach 84% [1]. This dichotomy has left clinicians uncertain how best to diagnose prostate cancer and predict patient outcomes so as to minimise overdiagnosis and overtreatment whilst appropriately treating men with aggressive disease [2]. More accurate discrimination of disease state in biopsy-naïve men would mark a significant development over current standards and would affect the large numbers of patients suspected of harbouring prostate cancer. Such a pre-biopsy screening test would provide a convenient checkpoint along the clinical pathway at which patients could exit without the need for further invasive and stressful follow-up.
Under current guidelines, patients are selected for further clinical investigation for prostate cancer if they have an elevated prostate specific antigen (PSA) level (≥4 ng/mL) and/or an adverse finding on digital rectal examination (DRE) or lower urinary tract symptoms; other factors such as age and ethnicity are also considered alongside patient preference [3,4,5]. More recently, multiparametric MRI (mpMRI) has been used as a triage tool to reduce negative biopsy rates since its validation in the PROMIS clinical trial [6]. However, as mpMRI has been adopted more widely, it has shown higher inter-operator and inter-machine variability than was reported in controlled clinical trials, with up to 28% of clinically significant disease missed in practice [5,7,8,9]. Coupled with the relative expense, time and expertise required to undertake an mpMRI that meets current clinical guidelines, this variability means there is a need to improve on current clinical practice.
Biomarkers that use tissue samples taken at the time of diagnosis to detect aggressive or significant prostate cancer requiring clinical attention are relatively plentiful [10,11,12,13]. Many of these markers perform well, whether for discerning the most aggressive disease [11,14] or for predicting disease-free survival following radical prostatectomy [15]. However, requiring tissue means that a biopsy must already have been performed, making these tests incompatible with reducing the rate of unnecessary biopsies, which carry considerable economic, psychological and societal costs for patients and healthcare systems alike [2,16,17].
As a secretory organ that interacts directly with the male urinary tract, the prostate is well placed as a candidate for non-invasive liquid biopsy from urine samples [18]. Single- or few-biomarker panels such as Engrailed-2 (EN2) protein expression [19], the SelectMDx [20] and the ExoDx Prostate (IntelliScore) [21] tests have published promising results for the non-invasive detection of significant disease (Gleason score (Gs) ≥ 7). However, they are at various stages of clinical validation and none is currently implemented in the UK healthcare system [5]. Most urinary biomarkers developed to date for predicting biopsy outcome are unimodal: they consider a single fraction of urine (such as the cell-pellet or cell-free fraction) or a single biological aspect of cancer to appraise disease status. Whilst these tests have shown promising clinical utility and accuracy, for most of them it has not yet been explored whether extra predictive value could be derived by integrating multiple streams of information from other sources.
Since its initial development, the SelectMDx model has been updated to include the clinically available parameters of serum PSA, patient age and DRE alongside urinary HOXC6 and DLX1 mRNA, adding significant predictive ability for patients with a PSA < 10 ng/mL [22]. We have also recently shown the benefit of such a holistic approach with the development of the multivariable ExoMeth risk prediction model, which integrates clinical parameters, hypermethylation within the urinary cell pellet and urinary cell-free RNA expression data, and which displayed improved clinical utility over any single mode [23].
EN2 is a homeodomain-containing transcription factor with an essential function in early development, which in mammals includes delineation of the midbrain/hindbrain border [24]. For a transcription factor it has several unusual properties, including the ability to be secreted from cells and taken up by others [25]. Indeed, a recent study indicated that prostate cancer cells can secrete EN2 protein in vesicles that are then taken up by other, non-EN2-expressing cells, where it can directly influence the transcription of target genes [25].
This secretory behaviour makes EN2 a potential biomarker for prostate cancer, and EN2 protein can indeed be detected in the urine of men with prostate tumours [19]. The original and subsequent studies have generally supported a diagnostic role for urinary EN2, including a relationship between urinary EN2 concentration and tumour volume [19,26]. More recently, a lateral flow-based test for EN2 has been described that could potentially allow point-of-care testing [27].
In this study, we report the utility of a predictive model produced by integrating clinically available parameters, urinary EN2 protein levels and targeted cell-free RNA transcriptomics. The data were collected within the Movember Global Action Plan 1 (GAP1) study, which explored a range of urinary biomarkers for prostate cancer (PCa) diagnosis and prognosis. The clinical utility of this model is determined by its ability to predict the presence of Gs ≥ 7 and Gs ≥ 4 + 3 disease on biopsy. Both are critical distinctions in clinical settings: patients with Gs ≥ 7 are recommended radical therapy [5], whilst patients with Gs 4 + 3 have significantly worse outcomes than Gs 3 + 4 patients [28]. Aware that most cancer biomarkers and predictive models fail to reach clinical adoption, we adhered to the guidelines for the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) whilst developing the models and results presented here [29].
2. Materials and Methods
2.1. Patient Population and Characteristics
The full Movember GAP1 urine cohort comprises 1257 first-catch post-DRE urine samples collected between 2009 and 2015 from urology clinics at multiple sites, as described in Connell et al. (2019). Because a diverse range of techniques was applied to samples from this cohort and the amount of urine available was limited, the number of experiments that could be performed on any one sample was restricted. Samples within the Movember cohort that had been quantified for both EN2 levels by ELISA and cell-free RNA (cf-RNA) expression by NanoString (Seattle, WA, USA) were eligible for selection for model development in the current study (n = 218).
Exclusion criteria for model development were a recent prostate biopsy or transurethral resection of the prostate (<6 weeks) and metastatic disease (confirmed by a positive bone scan or PSA > 100 ng/mL), resulting in a cohort of 207 samples, termed the ExoGrail cohort (Table 1). All samples analysed in the ExoGrail cohort were collected at the Norfolk and Norwich University Hospital (NNUH, Norwich, UK). Sample collection and processing were ethically approved by the East of England REC.
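The exclusion rule can be written as a small predicate. This is only an illustrative Python sketch (the study itself used R); the `Patient` record and its field names are hypothetical, and "metastatic" is encoded exactly as stated above, as a positive bone scan or PSA > 100 ng/mL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative, not from the GAP1 data.
    psa_ng_ml: float
    weeks_since_biopsy_or_turp: Optional[float]  # None if neither procedure was performed
    positive_bone_scan: bool


def is_eligible(p: Patient) -> bool:
    """Return True if a sample passes the exclusion criteria described in the text."""
    # Recent biopsy or TURP: fewer than 6 weeks before sample collection.
    recent_procedure = (
        p.weeks_since_biopsy_or_turp is not None and p.weeks_since_biopsy_or_turp < 6
    )
    # Metastatic disease: positive bone scan or PSA above 100 ng/mL.
    metastatic = p.positive_bone_scan or p.psa_ng_ml > 100
    return not (recent_procedure or metastatic)


cohort = [Patient(5.2, None, False), Patient(7.9, 3, False), Patient(150.0, None, False)]
eligible = [p for p in cohort if is_eligible(p)]
```

Note that the boundaries are strict as written: a procedure exactly 6 weeks earlier, or a PSA of exactly 100 ng/mL, does not trigger exclusion.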
2.2. Sample Processing and Analysis
Urine samples were processed according to the Movember GAP1 standard operating procedure (Supplementary Methods). In brief, within 30 min of collection, urine was centrifuged (1200 × g, 10 min, 6 °C) to remove cellular material. Extracellular vesicles in the supernatant were harvested by microfiltration, and cell-free mRNA was extracted (RNeasy micro kit, #74004, Qiagen, Hilden, Germany) on the same day that the sample was provided by the patient. RNA was amplified as cDNA with an Ovation PicoSL WTA system V2 (Nugen, Redwood City, CA, USA, #3312-48). Urinary EN2 protein concentration was quantified by ELISA from whole urine using a monoclonal anti-mouse EN2 antibody, as described by Morgan et al. (2011) [19]. Cell-free mRNA was quantified from urinary extracellular vesicles using NanoString technology with 167 gene-probes (Table S1), as described in Connell et al. (2019), with the modification that NanoString data were normalised according to NanoString guidelines using NanoString internal positive controls and then log2-transformed. The clinical variables considered were serum PSA, age at sample collection, DRE finding and urine volume collected.
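Positive-control normalisation followed by a log2 transform can be sketched as below. This follows the scaling scheme commonly described in NanoString's nCounter analysis guidance: each sample is scaled so that the geometric mean of its positive controls matches the average of those geometric means. It is an assumption that this matches the authors' exact pipeline, and the pseudocount of 1 (added to avoid log2(0)) is our own choice. The function name is hypothetical.

```python
import numpy as np


def normalise_nanostring(counts, pos_ctrl, pseudocount=1.0):
    """
    counts:   (n_samples, n_genes) raw gene-probe counts
    pos_ctrl: (n_samples, n_pos) positive-control probe counts (assumed > 0)
    Returns log2(positive-control-normalised counts + pseudocount).
    """
    counts = np.asarray(counts, dtype=float)
    pos_ctrl = np.asarray(pos_ctrl, dtype=float)
    # Geometric mean of the positive controls for each sample (lane).
    geo_means = np.exp(np.log(pos_ctrl).mean(axis=1))
    # Per-sample scale factor: arithmetic mean of geometric means / this sample's.
    scale = geo_means.mean() / geo_means
    normalised = counts * scale[:, None]
    return np.log2(normalised + pseudocount)


# Two samples whose raw counts differ only by a 2x technical (lane) effect.
raw = [[10, 20], [20, 40]]
controls = [[100, 100], [200, 200]]
log_expr = normalise_nanostring(raw, controls)
```

After scaling, both samples become [15, 30], so the technical difference is removed before the log2 transform.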
2.3. Statistical Analysis
All analyses, model construction and data preparation were undertaken in R version 3.5.3 [30] and, unless otherwise stated, used base R and default parameters. All data and code required to reproduce these analyses can be found at the UEA Cancer Genetic GitHub repository [31].
2.4. Feature Selection
In total, 172 variables were available for prediction (cf-RNA, n = 167; clinical variables, n = 4; urinary EN2, n = 1; for the full list see Table S1), making feature selection a key task for minimising model overfitting and increasing the robustness of trained models. To avoid dataset-specific features being positively selected [32], we implemented a robust feature selection workflow using the Boruta algorithm [33] and bootstrap resampling. Boruta is a random forest-based algorithm that iteratively compares feature importance against that of randomly permuted copies of the predictors, termed “shadow features”. Features that perform significantly worse than the maximally performing shadow feature at each permutation (p ≤ 0.01, calculated from the Z-score difference in mean accuracy decrease) are consecutively dropped until only confirmed, stable features remain.
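The shadow-feature comparison can be illustrated with the hit-counting rule used by the Boruta R package. In each iteration, a real feature scores a "hit" if its importance exceeds that of the best shadow feature. Hit counts are then compared with a fair-coin binomial distribution. The sketch below is simplified: it assumes the per-iteration importances (for example, Z-scored mean accuracy decrease) have already been computed, and it omits Boruta's multiple-testing correction and its removal of rejected features between rounds. Function names are hypothetical.

```python
from math import comb


def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def boruta_decisions(real_importance, max_shadow_importance, alpha=0.01):
    """
    real_importance: per-iteration lists holding one importance per real feature
    max_shadow_importance: best shadow-feature importance in each iteration
    Returns {feature_index: 'confirmed' | 'rejected' | 'tentative'}.
    """
    n_iter = len(real_importance)
    n_features = len(real_importance[0])
    decisions = {}
    for j in range(n_features):
        # A "hit" means the real feature beat every shadow feature in that iteration.
        hits = sum(real_importance[t][j] > max_shadow_importance[t] for t in range(n_iter))
        if binom_sf(hits, n_iter) <= alpha:  # beats the shadows too often to be chance
            decisions[j] = "confirmed"
        elif binom_cdf(hits, n_iter) <= alpha:  # loses to the shadows too often
            decisions[j] = "rejected"
        else:
            decisions[j] = "tentative"
    return decisions


# 20 iterations: feature 0 always wins, feature 1 never wins, feature 2 wins half the time.
history = [[5.0, 0.1, 3.0 if t % 2 == 0 else 0.5] for t in range(20)]
shadow_max = [1.0] * 20
result = boruta_decisions(history, shadow_max)
```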
Boruta was applied to 1000 datasets generated by resampling with replacement. Features were positively selected for model construction only when they were confirmed as stable in ≥90% of resampled Boruta runs.
2.5. Comparator Models
To evaluate potential clinical utility, additional models were trained as comparators using subsets of the available variables across the patient population: a clinical standard of care (SoC) model incorporating age, PSA, T-staging and clinician DRE impression; a model using only the values from the EN2 ELISA (EN2, n = 1); and a model using only NanoString gene-probe information (NanoString, n = 167). The fully integrated ExoGrail model was trained on all of the above variables (n = 172). The variables for each comparator model were independently selected via the bootstrapped Boruta feature selection process described above, to obtain the optimal subset of variables for each predictive model.
2.6. Model Construction
All models were trained with the random forest algorithm [34] using the randomForest package [35], with default parameters except that resampling was performed without replacement and 401 trees were grown per model. Risk scores from trained models are presented as out-of-bag predictions: the aggregated outputs of the decision trees within the forest for which the sample in question was not included in the resampled dataset [34]. Bootstrap resamples were identical for feature selection and model training for all models and were generated with the same random number generator seed.
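Out-of-bag aggregation can be sketched independently of any tree implementation. Each sample's risk score is the mean prediction of only those trees whose training subsample excluded it. The in-bag draw below samples without replacement, using `ceiling(0.632 * n)` rows per tree, which is the randomForest package's documented default `sampsize` when `replace = FALSE`. The tree outputs are random placeholders rather than real forest predictions, and the function names are hypothetical.

```python
import numpy as np


def draw_inbag(n_samples, n_trees=401, seed=0):
    """Boolean (n_samples, n_trees) matrix: True if a sample was used to grow that tree."""
    rng = np.random.default_rng(seed)
    size = int(np.ceil(0.632 * n_samples))  # subsample size without replacement
    inbag = np.zeros((n_samples, n_trees), dtype=bool)
    for t in range(n_trees):
        inbag[rng.choice(n_samples, size=size, replace=False), t] = True
    return inbag


def oob_predictions(tree_preds, inbag):
    """Mean prediction over out-of-bag trees only (NaN if a sample was never out of bag)."""
    tree_preds = np.asarray(tree_preds, dtype=float)
    oob = ~np.asarray(inbag, dtype=bool)
    n_oob = oob.sum(axis=1)
    sums = np.where(oob, tree_preds, 0.0).sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_oob > 0, sums / n_oob, np.nan)


inbag = draw_inbag(207)
tree_preds = np.random.default_rng(1).random((207, 401))  # placeholder tree outputs
risk = oob_predictions(tree_preds, inbag)
```

Because no sample contributes to its own score, out-of-bag risk scores act as a built-in internal validation.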
Models were trained on a modified continuous label based on biopsy outcome, constructed as follows: samples were scored on a continuous scale (range 0–1) according to the dominant Gleason pattern, where 0 represented no evidence of cancer, Gleason scores 6 and 3 + 4 were assigned 0.5, and Gleason scores ≥ 4 + 3 were assigned 1. Following this categorisation, the score was treated as a continuous variable by the random forest algorithm described above. This process was designed to recognise that two patients with the same transrectal ultrasound (TRUS)-guided biopsy Gleason score will not share exactly the same proportions of tumour pattern or the same overall disease burden. This scale was used solely for model training and was not used for any endpoint measurement or for determining predictive ability and clinical utility.
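The label construction amounts to a simple mapping from a biopsy result to a training target. In this sketch, a biopsy result is a hypothetical `(primary, secondary)` Gleason-pattern tuple, or `None` when no cancer was found. The text does not say how patterns such as 3 + 5 were handled, so treating any total score ≥ 8 as 1.0 is our assumption.

```python
from typing import Optional, Tuple


def training_label(gleason: Optional[Tuple[int, int]]) -> float:
    """Map a biopsy Gleason result to the continuous training label."""
    if gleason is None:  # no evidence of cancer
        return 0.0
    primary, secondary = gleason
    # Gs >= 4 + 3. Assumption: any total of 8 or more (e.g. 3 + 5) also scores 1.0.
    if primary >= 4 or primary + secondary >= 8:
        return 1.0
    return 0.5  # Gs 6 (3 + 3) and Gs 3 + 4


labels = [training_label(g) for g in [None, (3, 3), (3, 4), (4, 3), (4, 4)]]
```

The random forest then treats these values as a continuous regression target, whilst evaluation still uses the binary Gs ≥ 7 and Gs ≥ 4 + 3 endpoints.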
2.7. Statistical Evaluation of Models
Area Under the Receiver Operating Characteristic curve (AUC) metrics were produced using the pROC package [36], with confidence intervals calculated via 1000 stratified bootstrap resamples. Density plots of model risk scores and all other plots were created using the ggplot2 package [37]. Partial dependence plots were calculated using the pdp package [38]. Cumming estimation plots and calculations were produced using the dabestr package [39], with 1000 bootstrap resamples used to visualise robust effect size estimates of model predictions.
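A stratified bootstrap CI for the AUC resamples cases and controls separately, so every resample keeps the original class sizes. In pROC this corresponds to `ci.auc(..., method = "bootstrap", boot.n = 1000, boot.stratified = TRUE)`. The Python sketch below reimplements the idea with a rank-based (Mann-Whitney) AUC and percentile intervals. The synthetic scores are placeholders, not study data.

```python
import numpy as np


def auc(scores, labels):
    """P(random positive outranks random negative); ties count as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def stratified_bootstrap_ci(scores, labels, n_boot=1000, level=0.95, seed=1):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    pos_idx, neg_idx = np.flatnonzero(labels), np.flatnonzero(~labels)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample positives and negatives separately (stratified by outcome).
        idx = np.concatenate([
            rng.choice(pos_idx, size=len(pos_idx), replace=True),
            rng.choice(neg_idx, size=len(neg_idx), replace=True),
        ])
        stats[b] = auc(scores[idx], labels[idx])
    alpha = (1 - level) / 2
    return np.quantile(stats, [alpha, 1 - alpha])


rng = np.random.default_rng(0)
labels = np.r_[np.zeros(60, dtype=bool), np.ones(40, dtype=bool)]
scores = np.r_[rng.normal(0.0, 1.0, 60), rng.normal(1.5, 1.0, 40)]
lo, hi = stratified_bootstrap_ci(scores, labels)
```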
Decision curve analysis (DCA) [40] examined the potential net benefit of using the models in the clinic. Standardised net benefit (sNB) was calculated with the rmda package [41] and is presented throughout our decision curve analyses, as it is a more directly interpretable metric than net benefit [42]. To ensure that DCA was representative of a more general population, the prevalence of Gleason scores within the ExoGrail cohort was adjusted via bootstrap resampling to match that observed in a population of 219,439 men in the control arm of the Cluster Randomised Trial of PSA Testing for Prostate Cancer (CAP) Trial [43], as described in Connell et al. (2019). Briefly, of the biopsied men within this CAP cohort, 23.6% were Gs 6, 8.7% Gs 7 and 7.1% Gs ≥ 8, with 60.6% of biopsies showing no evidence of cancer. These ratios were used to perform stratified bootstrap sampling with replacement of the Movember cohort to produce a “new” dataset of 197 samples with risk scores from each comparator model. sNB was then calculated for this resampled dataset, and the process was repeated for a total of 1000 resamples with replacement. The mean sNB for each risk score and for the “treat-all” option over all iterations was used to produce the presented figures, accounting for variance in resampling. Net reduction in biopsies, based on adoption of the models versus the default option of undertaking biopsy in all men with PSA ≥ 4 ng/mL, was calculated as:

$$\text{Net reduction in biopsies} = \frac{\text{NB}_{\text{model}} - \text{NB}_{\text{biopsy all}}}{\text{Threshold}/(1 - \text{Threshold})}$$

where NB is net benefit and the decision threshold (Threshold) is determined by accepted patient/clinician risk [40]. For example, a clinician may accept up to a 25% perceived risk of cancer before recommending biopsy to a patient, equating to a decision threshold of 0.25.
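The decision-curve quantities described above can be sketched with the textbook definitions. Net benefit is TP/n − FP/n × Threshold/(1 − Threshold). sNB divides this by prevalence. The net reduction in biopsies compares the model's net benefit with that of biopsy for all, rescaled by (1 − Threshold)/Threshold. The prevalence-matched resampling draws each Gleason group with replacement at its CAP proportion, using largest-remainder rounding to reach exactly 197 samples. That rounding rule is our assumption, and rmda's internals (for example, tie handling at the threshold) may differ. All function names are hypothetical.

```python
import numpy as np

# Biopsy-outcome proportions among biopsied men in the CAP control arm, as quoted in the text.
CAP_PROPORTIONS = {"benign": 0.606, "gs6": 0.236, "gs7": 0.087, "gs8plus": 0.071}


def allocate(n, proportions):
    """Integer group sizes summing to n (largest-remainder rounding)."""
    keys = list(proportions)
    raw = np.array([proportions[k] * n for k in keys])
    sizes = np.floor(raw).astype(int)
    for i in np.argsort(-(raw - sizes))[: n - sizes.sum()]:
        sizes[i] += 1
    return dict(zip(keys, sizes.tolist()))


def prevalence_matched_sample(groups, n, proportions, rng):
    """Stratified bootstrap: resample each Gleason group with replacement to its target size."""
    groups = np.asarray(groups)
    parts = [rng.choice(np.flatnonzero(groups == g), size=k, replace=True)
             for g, k in allocate(n, proportions).items()]
    return np.concatenate(parts)


def net_benefit(risk, outcome, threshold):
    risk, outcome = np.asarray(risk, dtype=float), np.asarray(outcome, dtype=bool)
    biopsy = risk >= threshold  # model recommends biopsy at or above the threshold
    n = len(outcome)
    tp, fp = np.sum(biopsy & outcome), np.sum(biopsy & ~outcome)
    return tp / n - fp / n * threshold / (1 - threshold)


def standardised_net_benefit(risk, outcome, threshold):
    return net_benefit(risk, outcome, threshold) / np.mean(outcome)


def net_reduction(risk, outcome, threshold):
    """Net reduction in biopsies versus biopsy-all, as a fraction of men (x100 for %)."""
    prevalence = np.mean(outcome)
    nb_all = prevalence - (1 - prevalence) * threshold / (1 - threshold)
    return (net_benefit(risk, outcome, threshold) - nb_all) * (1 - threshold) / threshold


rng = np.random.default_rng(7)
groups = rng.choice(list(CAP_PROPORTIONS), size=207)  # placeholder Gleason groups
idx = prevalence_matched_sample(groups, 197, CAP_PROPORTIONS, rng)

# Worked example at Threshold = 0.25: 10 men, 2 with Gs >= 7.
outcome = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=bool)
risk = np.array([0.9, 0.6, 0.3, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05])
```

In the worked example, the model recommends only 3 of 10 biopsies and still detects both significant cancers, which gives a net reduction of 0.7 (70 fewer biopsies per 100 men).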
4. Discussion
Discriminating disease status in patients before a diagnostic biopsy with higher accuracy than current standards could bring about a sizeable change in treatment pathways and reduce the number of men sent forward for ultimately unnecessary biopsy. Given that up to 75% of patients presenting with serum PSA levels ≥ 4 ng/mL are negative for prostate cancer [5,43,44], considerable research effort has been concentrated on this problem. To date, several biomarker panels, including the PUR model developed by ourselves, have been successfully developed to detect Gleason ≥ 3 + 4 prostate cancer non-invasively from urine samples with superior accuracy to currently implemented clinical methods [20,21,45,46]. However, as each of these examples assesses only a single aspect of urine, assay method or biological process, the heterogeneity of prostate cancer may not be entirely accounted for [47]. An approach that provides a more holistic insight into disease status is therefore required.
Recent analyses, including those presented here, have demonstrated the added value of integrating multiple prognostic biomarkers when fitting risk models to determine patient risk at initial biopsy [23,48]. Urine clearly contains a wealth of useful information about the disease status of the prostate, accessible through the quantification of cf-RNA transcripts, circulating and cell-free DNA, DNA hypermethylation and protein biomarker levels [19,46,49,50,51,52].
Our results show that an improved multivariable risk prediction model can be developed by carefully considering information from multiple urine fractions in men suspected of having prostate cancer. Urinary levels of EN2 protein were quantified by ELISA, whilst the transcript levels of 167 cell-free mRNAs were quantified using NanoString technology. The final model, which integrates information from these assays with serum PSA levels, was named ExoGrail. Markers selected for the model include well-known genes associated with prostate cancer that have been proven in other diagnostic tests, such as PCA3 [45], HOXC6 [20] and the TMPRSS2/ERG gene fusion [53]. An interaction between urinary EN2 protein levels and quantified transcripts of SLC12A1 was observed, further demonstrating the benefit of considering information from multiple biological sources (Figure S4).
ExoGrail accurately predicted the presence of significant (Gs ≥ 7) prostate cancer on biopsy with an AUC of 0.89, comparing favourably with other published tests (AUCs for Gs ≥ 7: PUR = 0.77 [46], ExoMeth = 0.89 [23], ExoDx Prostate IntelliScore = 0.77 [21], SelectMDx = 0.78 [20]; epiCaPture, Gs ≥ 4 + 3 AUC = 0.73 [49]). Furthermore, ExoGrail produced accurate predictions even when serum PSA levels alone proved inaccurate. Patients with a raised PSA but a negative biopsy result had ExoGrail scores significantly different from those of both clinically benign patients and patients with low-grade Gleason 6 disease, whilst the model still discriminated the more clinically significant Gleason ≥ 7 cancers (Figure 4). The adoption of ExoGrail into current clinical pathways for reducing unnecessary biopsies was also considered, showing the potential for up to 32% of patients to safely forgo an invasive biopsy without incurring excessive risk (Figure 6).
ExoGrail was developed with the explicit goal of being robust to potential overfitting and bias, using strong internal validation methods in the form of bootstrap resampling and out-of-bag predictions. Nonetheless, ExoGrail was developed in a relatively small dataset and therefore requires external validation in an independent cohort before it can be considered for use as a clinical risk model. To this end, we are currently collecting samples from multiple sites in the UK, EU and Canada using an updated ‘At-Home’ Collection system [54]. The At-Home collection system enables biomarker analysis to be performed on urine samples that patients provide at home and send by post to a centralised laboratory. This collection and analysis system removes the need for a clinic visit and could lead to a postal screening system for prostate cancer diagnosis and prognosis. In this validation study, we will also assess the potential utility of supplementing mpMRI with ExoGrail, as mpMRI can misrepresent disease status even with rigorous controls in place [6]. The NanoString expression analysis system used for the ExoGrail signature is a rapid and cost-effective platform that is also used in the FDA-approved Prosigna PAM50 test for breast cancer aggressiveness [55], making ExoGrail well positioned for implementation for patient benefit.