A Machine Learning Approach to Predict an Early Biochemical Recurrence after a Radical Prostatectomy

Background: Approximately 20%-50% of prostate cancer patients experience biochemical recurrence (BCR) after radical prostatectomy (RP). Among them, clinically evident recurrence occurs in about 20%-30%. Thus, we aim to evaluate the utility of machine learning algorithms for the prediction of early BCR after RP. Methods: A total of 104 prostate cancer patients who underwent magnetic resonance imaging and RP were evaluated. Four well-known machine learning algorithms (k-nearest neighbors (KNN), multilayer perceptron (MLP), decision tree (DT), and auto-encoder) were applied to build a prediction model for early BCR using preoperative clinical and imaging data and postoperative pathologic data. The sensitivity, specificity, and accuracy of each algorithm for detection of early BCR were evaluated, and area under the receiver operating characteristic curve (AUROC) analyses were conducted. Results: The prediction model using an auto-encoder showed the highest predictive ability for early BCR after RP, both with all data as input (AUC = 0.638) and with only preoperative clinical and imaging data (AUC = 0.656), followed by MLP (AUC = 0.607 and 0.598), KNN (AUC = 0.596 and 0.571), and DT (AUC = 0.534 and 0.495). Conclusion: The auto-encoder-based prediction system has the potential for accurate detection of early BCR and could be useful for long-term follow-up planning in prostate cancer patients after RP.


Introduction
After radical prostatectomy for localized prostate cancer, the rate of biochemical recurrence (BCR) is approximately 20%-50% [1]. Among the patients with BCR, clinically evident recurrence occurs in about 20%-30% of cases [2,3]. In addition, BCR is accepted as an indicator of postoperative progression and may be associated with increased prostate cancer mortality [4]. The median time from BCR after radical prostatectomy to the development of metastases is eight years, and the median time from metastases to death is five years [2]. Hence, awareness of the probability of BCR following curative therapy is important for guiding treatment pathways, long-term follow-up planning, and patient counseling.
Several studies have sought to develop risk prediction scoring systems that stratify the risk of prostate cancer more accurately. Tumor characteristics such as grade, extent, and prostate-specific antigen (PSA) level have been shown to be independent predictors of oncologic outcome. In addition, magnetic resonance imaging (MRI) has proven to be effective for the detection, treatment, exploration, and follow-up examination of prostate cancer [5]. Several studies reported that quantitative apparent diffusion coefficient (ADC) data and the degree of tumor visibility on multi-parametric MRIs play a role in predicting BCR following radical prostatectomy (RP) [6][7][8][9] or radiation therapy [10,11]. The Prostate Imaging Reporting and Data System (PI-RADS), which was first released in 2012 and updated to version 2 in 2015, also showed promising results in terms of risk stratification for predicting BCR [12].
Thus, the purpose of this study is to reveal the utility of machine learning algorithms for developing prediction models, facilitating postoperative follow-up and aiding in the clinical management of patients with prostate cancer after RP.

Patients
Our Institutional Review Board approved this retrospective study and waived the requirement for informed consent. Between January 2017 and January 2019, 178 patients were diagnosed with prostate cancer through transrectal prostate biopsies. Of these patients, we identified 118 who underwent a prostate MRI and a radical prostatectomy at our institution. We excluded 14 patients who met the following exclusion criteria: (a) a history of preoperative hormone therapy (n = 9) or (b) inadequate imaging quality for assessment (n = 5). Thus, 104 patients were finally included in our study. Early BCR was defined as a postoperative PSA level >0.2 ng/ml on two separate occasions, or as the need to undergo salvage therapy, at the one-year follow-up [13,14].

Acquisition of MR Images
Prostate MRI examinations were performed using 3-T MRI scanners (Skyra, Siemens Healthcare) with pelvic phased-array coils. An intramuscular injection of 1 mg glucagon (Buscopan, Samil Pharmacy) was given as an antispasmodic agent. The MRI protocol included axial, sagittal, and coronal T2-weighted imaging, axial T1-weighted imaging, single-shot echo-planar diffusion-weighted imaging (DWI), and dynamic contrast-enhanced imaging. ADC maps were derived from the DWI with b-values of 0, 500, and 1000 s/mm², and a calculated high b-value image set was reconstructed at a b-value of 1500 s/mm².

Clinical, imaging, and surgical parameters
For preoperative parameters, clinical data consisting of age, initial PSA, biopsy Gleason score (GS), the greatest percentage of biopsy cores, and the number of positive cores were collected through an electronic medical record (EMR) search. MRI images were interpreted by a radiologist with more than seven years of experience who was unaware of the clinical and surgical information. Based on the scoring method of PI-RADSv2, the score was recorded per patient using the five-point scale. The probability of clinically significant prostate cancer in a particular patient according to the PI-RADS score was defined as follows: 1, very low (clinically significant cancer is highly unlikely to be present); 2, low (clinically significant cancer is unlikely to be present); 3, intermediate (the presence of clinically significant cancer is equivocal); 4, high (clinically significant cancer is likely to be present); and 5, very high (clinically significant cancer is highly likely to be present). We also evaluated extracapsular extension (ECE) and seminal vesicle invasion (SVI) on MRI using the following 3-point scoring system: 1, definitely negative; 2, probably positive; 3, definitely positive.

Statistical analysis and machine learning model development
To compare age, initial PSA, greatest percentage of biopsy core, number of positive biopsy cores, and tumor volume between the BCR and non-BCR groups, the Mann-Whitney test was used. To compare the biopsy and surgical GS, PI-RADS score, probability of ECE and SVI on MRI, and pathologic results such as positive surgical margin (PSM), lymphovascular invasion (LVI), perineural invasion (PNI), ECE, and SVI between the two groups, Fisher's exact test was conducted. These statistical analyses were performed using SPSS (version 21.0; SPSS, Inc., Chicago, IL, USA). A P-value < 0.05 was considered statistically significant.
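For readers who want to reproduce a categorical comparison of this kind outside SPSS, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. It is not the authors' SPSS procedure; the function name `fisher_exact_two_sided` is illustrative, and the example counts are the MRI findings of SVI and ECE quoted in the Discussion (13/20 vs. 19/84).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    # Unnormalised hypergeometric weight of the table whose top-left cell is x,
    # given the fixed row and column totals.
    def weight(x):
        return comb(col1, x) * comb(n - col1, row1 - x)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more likely than the observed one
    # (integer comparison avoids floating-point ties).
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(n, row1)

# Definite or probable SVI/ECE on MRI: 13 of 20 early BCR patients
# versus 19 of 84 non-BCR patients (counts quoted in the Discussion).
p_value = fisher_exact_two_sided(13, 20 - 13, 19, 84 - 19)
print(f"two-sided p = {p_value:.5f}")
```

The two-sided rule used here (summing all tables whose probability does not exceed that of the observed table) is one common convention; statistical packages may differ slightly in how they define the two-sided p-value.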
In addition, to build an early BCR prediction model, we applied several well-known machine learning algorithms: k-nearest neighbors (KNN), multi-layer perceptron (MLP), decision tree (DT), and auto-encoder [15]. For each algorithm, we evaluated accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). We developed the early BCR prediction models using Matlab 2018b with the Statistics and Machine Learning Toolbox and the Deep Learning Toolbox.
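The four performance measures can be illustrated with a short sketch. The code below is not taken from the study's Matlab implementation; the function names, the 0.5 decision threshold, and the labels and scores are made-up assumptions chosen only to show how the quantities are computed (1 denotes early BCR, 0 denotes non-BCR).

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = early BCR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and risk scores, not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # hypothetical threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics, auc)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUROC summarizes ranking quality over all thresholds, which is why the two kinds of measure can disagree on imbalanced data.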

Prediction of early BCR using classification methods
To predict which patients will experience early BCR after RP, supervised learning algorithms, that is, classification methods, were used in a previous study by Lee et al. [16]. Supervised learning algorithms statistically analyze the characteristics of patients in two groups and, based on the statistical differences, provide a prediction model that identifies patients with prostate cancer who are likely to experience BCR after RP. In this paper, we use KNN, MLP, and DT to build prediction models. For KNN, we apply 3-NN with a Euclidean distance measure; for MLP, we use 4 layers with 20 hidden neurons; and for DT, we use the Gini index as the split criterion.
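As an illustration of the 3-NN rule with a Euclidean distance measure, a minimal sketch follows. It is not the Matlab model used in the study; the function `knn_predict` and the two-feature vectors are hypothetical, and the features are assumed to be already scaled to comparable ranges.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest in Euclidean distance."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled feature vectors (not study data);
# label 1 = early BCR, label 0 = non-BCR.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
           (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]

pred_high = knn_predict(train_x, train_y, (0.75, 0.85))
pred_low = knn_predict(train_x, train_y, (0.15, 0.25))
print(pred_high, pred_low)
```

Because the vote is taken over only three neighbors, a rare class can easily be outvoted when its members are sparse in feature space, which is one way class imbalance affects KNN.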
To check the robustness of the prediction performance, we perform cross-validation. The data, composed of 80 non-BCR patients and 24 BCR patients, are divided into 10 groups. Eight randomly selected groups are used for training, and the other two groups are used for verification. A total of 10 such rounds of training and verification are conducted to verify performance.
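One possible reading of this validation scheme is sketched below: the 104 patients are split once into 10 groups, and in each of 10 rounds eight randomly chosen groups form the training set while the remaining two form the verification set. The function name, the seed, and the way patients are assigned to groups are assumptions, since the text does not specify them.

```python
import random

def repeated_group_splits(n_samples, n_groups=10, n_train_groups=8,
                          n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for repeated random group hold-out."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_groups groups of near-equal size.
    groups = [indices[g::n_groups] for g in range(n_groups)]
    for _ in range(n_repeats):
        train_groups = set(rng.sample(range(n_groups), n_train_groups))
        train = [i for g in range(n_groups) if g in train_groups
                 for i in groups[g]]
        test = [i for g in range(n_groups) if g not in train_groups
                for i in groups[g]]
        yield train, test

splits = list(repeated_group_splits(104))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Under this reading, verification sets from different rounds can overlap, so the procedure is closer to repeated random hold-out than to standard 10-fold cross-validation, in which every sample is tested exactly once.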

Prediction of early BCR using auto-encoder methods
As explained in Section 2.4.1, early BCR can be predicted by supervised learning with classification capabilities [16]. However, supervised learning has limitations in some special cases. One example, which applies to our BCR data set, is an imbalanced data set in which most samples belong to a few classes, whereas the other classes are rare. With imbalanced data, input samples are more likely to be assigned to the classes that dominate the training data, so the classification accuracy for classes with few training samples deteriorates. Namely, the resulting prediction model can be biased: it classifies one class well but shows a large gap between sensitivity and specificity. In our case, the ratio of non-BCR patients to early BCR patients is about 8 to 2, which is a somewhat imbalanced data set. This may lead to imbalanced diagnostic performance if we use a general supervised learning method. To overcome this imbalanced prediction performance, we apply an auto-encoder to identify early BCR in patients with prostate cancer after RP.
An auto-encoder is a well-known deep unsupervised learning method [15]. As shown in Figure 1, it has two parts, the encoder and the decoder. The encoder compresses the input data to extract its essential features, and the number of neurons in the encoder gradually decreases. The decoder restores the compressed data through an increasing number of neurons. The main uses of the auto-encoder are noise reduction and identification. 1) As data are compressed in the encoder, meaningless information in the original data, such as noise, is eliminated. From this point of view, the auto-encoder can be used for noise reduction, such as denoising of medical images [ref]. 2) The auto-encoder is trained so that its final output reproduces the original input data, so if a test input differs from the training data set, the auto-encoder cannot recover the original input well. This means that we can find abnormal data that differ from the training data, and we can therefore use the auto-encoder for identification through anomaly detection. For example, as in this paper, we first train the auto-encoder using only one class of data that is easy to acquire [ref]. When new data are fed into the trained auto-encoder, if the output is similar to the input, the data can be judged to belong to the same class as the training data; if not, they can be judged to belong to a different class. Namely, it is possible to distinguish data of the class used in training from data of other classes, and we call this method identification. In this paper, we use the auto-encoder for identification: to predict early BCR, data from the class used for training are restored to nearly the same values as the input, whereas data from a class not used for training are restored to values that differ from the input.
Thus, we train an auto-encoder with the non-BCR patient data set. When a new input is applied to the trained auto-encoder, the output is almost the same as the input if the data come from a non-BCR patient, but differs from the input if the data come from an early BCR patient. Therefore, we can classify early BCR and non-BCR patients based on the difference between the input and the output: a large difference indicates early BCR, and a small difference indicates non-BCR.
As this method uses only one group for training, its performance is not biased by the imbalance between the two groups.
The details of our auto-encoder are as follows. The auto-encoder has three layers: an input layer, a hidden layer, and an output layer. The input and output layers have the same number of neurons, and the hidden layer has seven neurons. The data are divided into training and test sets: 80% of the non-BCR patient data are used for training, and the remaining non-BCR patient data together with all of the early BCR patient data are used for verification.
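The identification idea can be sketched on toy data as follows. This is a deliberately simplified illustration, not the Matlab model used in the study: the auto-encoder is linear and has no bias terms, the hidden layer has two neurons rather than seven, the four-feature synthetic data are invented, and the decision threshold (the largest reconstruction error seen on the training data) is an assumption, since the text does not state how the input-output difference is thresholded.

```python
import random

def forward(w1, w2, x):
    """Encode x with w1, then decode with w2 (linear units, no biases)."""
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    out = [sum(w * hj for w, hj in zip(row, hidden)) for row in w2]
    return hidden, out

def train_autoencoder(data, n_hidden, epochs=300, lr=0.05, seed=0):
    """Fit the auto-encoder by per-sample gradient descent on squared error."""
    rng = random.Random(seed)
    d = len(data[0])
    w1 = [[rng.uniform(-0.3, 0.3) for _ in range(d)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.3, 0.3) for _ in range(n_hidden)] for _ in range(d)]
    for _ in range(epochs):
        for x in data:
            hidden, out = forward(w1, w2, x)
            err = [o - xi for o, xi in zip(out, x)]
            # Gradient with respect to the hidden activations (uses old w2).
            grad_h = [sum(err[i] * w2[i][j] for i in range(d))
                      for j in range(n_hidden)]
            for i in range(d):
                for j in range(n_hidden):
                    w2[i][j] -= lr * err[i] * hidden[j]
            for j in range(n_hidden):
                for i in range(d):
                    w1[j][i] -= lr * grad_h[j] * x[i]
    return w1, w2

def reconstruction_error(model, x):
    """Squared difference between the input and its reconstruction."""
    _, out = forward(model[0], model[1], x)
    return sum((o - xi) ** 2 for o, xi in zip(out, x))

# Synthetic data (not study data): the "normal" class varies along one
# direction, the "abnormal" class along an orthogonal direction.
rng = random.Random(1)

def sample(direction, n):
    rows = []
    for _ in range(n):
        z = rng.uniform(-1, 1)
        rows.append([z * c + rng.gauss(0, 0.02) for c in direction])
    return rows

normal = sample((0.5, 0.5, 0.5, 0.5), 80)
abnormal = sample((0.5, -0.5, 0.5, -0.5), 24)
train, held_out = normal[:64], normal[64:]  # 80% of the normal class for training

model = train_autoencoder(train, n_hidden=2)
threshold = max(reconstruction_error(model, x) for x in train)  # assumed rule
normal_errors = [reconstruction_error(model, x) for x in held_out]
abnormal_errors = [reconstruction_error(model, x) for x in abnormal]
flagged = [e > threshold for e in abnormal_errors]
print(sum(normal_errors) / len(normal_errors),
      sum(abnormal_errors) / len(abnormal_errors), sum(flagged))
```

Sweeping the threshold over the range of reconstruction errors, instead of fixing it, would trace out a receiver operating characteristic curve for the identification task.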

Patient demographics and distribution
The characteristics of all patients are described in Table 1. We identified 20 patients with early BCR at one year after their RP. The initial PSA, greatest percentage of biopsy cores, number of positive biopsy cores, biopsy GS, and tumor volume were significantly higher in the early BCR group than in the non-BCR group (all p < 0.05). PSM, LVI, and PNI were more frequently observed in the early BCR group (p = 0.027, 0.020, and 0.016, respectively). Early BCR rates were significantly different according to

Prediction models
We developed prediction models using four machine learning algorithms to identify patients with early BCR after RP. The predictive performance and AUROC of each algorithm are presented in Table 2 and Figures 2 and 3. Using all parameters, the auto-encoder algorithm showed the highest predictive ability for early BCR with an AUC of 0.638, followed by MLP (AUC = 0.607), KNN (AUC = 0.596), and DT (AUC = 0.534). Using only preoperative clinical and imaging parameters, the auto-encoder method also showed the highest predictive ability among the four algorithms with an AUC of 0.656, which is even higher than its AUC with all parameters.

Discussion
The present study found that preoperative parameters, including initial PSA, biopsy GS, greatest percentage of biopsy core, number of positive biopsy cores, PI-RADS score on MRI, and probability of ECE, as well as postoperative pathologic parameters, including tumor volume, PSM, LVI, and PNI, were significantly associated with early BCR. Whether using all of these variables or only the preoperative clinical and imaging variables, the prediction algorithm based on an auto-encoder showed a higher predictive ability for early BCR after RP than the other algorithms, with an AUC of 0.638 with all parameters and 0.656 with preoperative parameters.
Prior studies suggested several prediction models using various combinations of variables including initial PSA, greatest percentage of biopsy cores, number of positive biopsy cores, GS, and cancer stage for successful risk stratification, as well as oncologic outcomes including metastatic progression and prostate cancer-specific mortality following treatment [1,16-22]. In our study, variables related to early BCR were in accordance with those in prior studies.
We defined early BCR as postoperative PSA levels greater than 0.2 ng/ml occurring on two separate occasions, or patients who had to undergo salvage therapy at their one-year follow-up. Notably, BCR does not necessarily result in clinical recurrence or cancer-specific mortality due to the indolent nature of prostate cancer and the remaining benign prostatic tissue after RP [14]. Nevertheless, several groups have investigated prognostic factors among men with BCR after RP and revealed that the time to BCR was one of the predictors of prostate cancer-related mortality [20,23].
Prior studies demonstrated the importance of information obtained at the time of diagnosis [17,19,21]. They focused on the PSA level at the time of diagnosis and on biopsy results. In our study, an auto-encoder-based prediction model using only the preoperative parameters as input data showed moderate diagnostic performance, even higher than when all parameters were used as input data. The Gleason grading system consistently shows prognostic impact in patients with prostate cancer [24]. In particular, we found that the higher the biopsy Gleason grade, the more often early BCR occurred, although this was not the case for the surgical Gleason grade. Several studies have investigated the association between initial biopsy characteristics and postoperative pathological features or oncologic outcomes. However, results from various series have been mixed and inconsistent [25][26][27]. This may be due to operator dependency, even though transrectal ultrasound-guided prostate biopsy is the well-established standard procedure for diagnosing prostate cancer [28]. Nevertheless, the higher the greatest percentage of biopsy cores or the number of positive cores, the more frequently early BCR occurred in our population.
Interestingly, while SVI and ECE, representing pathologic T staging, were not different between early BCR and non-BCR patients, the proportion of definite or probable findings of SVI and ECE on MRI was significantly higher in early BCR patients (13/20 vs. 19/84; p < 0.001). Moreover, PI-RADS scores differed between groups. Only one patient with PI-RADS ≤ 3 showed early BCR after RP. PI-RADS, which was released in 2012 by the European Society of Urogenital Radiology [29], updated to v2 in 2015, and updated to v2.1 in 2019 [30,31], has become a standardized tool with good diagnostic performance for the early detection of clinically significant prostate cancer. Reportedly, both diffusion restriction and enhancement patterns could reflect tumor aggressiveness in prostate cancer; the lower the ADC or the earlier the enhancement was observed, the more aggressive the tumors found pathologically [32,33]. The PI-RADS system, which incorporates DWI and dynamic enhanced imaging, may have the potential to serve as a tumor biomarker, and in fact, the PI-RADS score shows a linear correlation with GS [34]. In conjunction with our results, PI-RADS may serve as an imaging biomarker for BCR following RP, as well as a scale for tumor visibility. Given the moderate predictive ability for early BCR, with an AUC of 0.656 when using only the preoperative parameters as input data, our prediction algorithm may be helpful for planning postoperative follow-up in advance and for aiding the clinical management of patients with prostate cancer after RP.
Machine learning is the semi-automated extraction of knowledge and insight from data, allowing the training of algorithms that can discover and identify complex patterns and relationships within various parameters. In medicine, developed algorithms can be directly applied to patient care to improve the accuracy of predicting diseases and subsequent outcomes [35]. Recently, machine learning has been used to predict BCR using clinical and pathologic data and showed high predictive performance [16,35]. Our models also showed promising results in terms of predicting early BCR using not only clinical and pathologic data but also imaging data. The diagnostic performance of our prediction model was lower than that of previous studies, which may result from our small population. However, the prediction model using our auto-encoder method showed a higher AUC than MLP, KNN, and DT. In some cases, such as DT in the preoperative-parameters-only case (AUC = 0.495), predictive performance was even poorer than random prediction.
A likely reason for this difference in performance is that the dataset we use is small and unbalanced. A general classification algorithm builds its predictive model through a training process that reduces the prediction error on the training data. In an unbalanced data set, predicting every sample as the majority class therefore tends to yield a higher overall accuracy than learning to predict both classes. However, because the samples of the minority class are then incorrectly predicted, the precision and recall values are unbalanced, and the AUC value is poor.
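The effect can be made concrete with a small arithmetic sketch. It assumes the class counts given in the cross-validation description (80 non-BCR and 24 BCR patients) and a hypothetical degenerate model that always predicts the majority class; it does not reproduce any model from the study.

```python
# Class counts as stated in the cross-validation description.
n_non_bcr, n_early_bcr = 80, 24
y_true = [0] * n_non_bcr + [1] * n_early_bcr
y_pred = [0] * len(y_true)  # hypothetical model that always predicts non-BCR

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
sensitivity = tp / n_early_bcr
specificity = tn / n_non_bcr
balanced_accuracy = (sensitivity + specificity) / 2

print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.1f}, "
      f"specificity={specificity:.1f}, balanced={balanced_accuracy:.1f}")
```

The overall accuracy of about 77% looks acceptable even though no early BCR patient is detected, which is why accuracy alone is a misleading criterion on this data set.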
To overcome this drawback of classification algorithms, the under-sampling method is generally used. Under-sampling randomly draws samples from the majority class until their number matches that of the minority class, and the model is then trained on this reduced data set. In our study, obtaining a good AUC value with a general prediction model would require under-sampling because of our small number of BCR patients, following the general principle that the two groups should be of similar size for training. However, matching the 24 early BCR patients by under-sampling would leave only 48 patients for training, which is too few to train a prediction model. Alternatively, if we do not adjust the size of the training data set, sensitivity and specificity would be very unbalanced, with a consequently poor AUC value. In contrast to classical classification methods, the auto-encoder method can use most of our data set for training without performance degradation, even though the data set is imbalanced. Thus, our method performed well with a small, unbalanced data set, as compared to conventional classification methods.
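Random under-sampling as described here can be sketched in a few lines. The function name, the seed, and the placeholder records are illustrative assumptions; only the group sizes (80 and 24) come from the text.

```python
import random

def undersample(majority, minority, seed=0):
    """Randomly keep as many majority samples as there are minority samples."""
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    return kept + list(minority)

# Placeholder records standing in for patients (not study data).
non_bcr = [("non-BCR", i) for i in range(80)]
early_bcr = [("BCR", i) for i in range(24)]

balanced = undersample(non_bcr, early_bcr)
n_kept_non_bcr = sum(label == "non-BCR" for label, _ in balanced)
print(len(balanced), n_kept_non_bcr)
```

The balanced set contains 48 records, so 56 of the 80 non-BCR records are discarded, which illustrates the loss of training data that the text identifies as the cost of under-sampling.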
Our study has several limitations. First, this study is a retrospective, single-institution investigation, and thus carries potential selection bias. Furthermore, the models were generated from information obtained from a small population. As a rule of thumb, at least 10 outcomes for each independent variable are required for a logistic regression or proportional hazards model [36,37]. Nevertheless, we generated an acceptable prediction model using an auto-encoder, which shows potential for dealing with small datasets. However, further experiments with a large population are still needed to compare the performance of the auto-encoder with that of traditional classification-based prediction models. Finally, we focused on the early detection of BCR, which could be a surrogate for prostate cancer-specific mortality, instead of analyzing long-term results such as three- or five-year BCR-free survival. Thus, further studies with long-term follow-up are needed.

Conclusions
In this study, we developed a prediction model utilizing a machine learning algorithm for prediction of early BCR after RP in patients with prostate cancer. Among the machine learning algorithms evaluated, the auto-encoder showed moderate diagnostic performance for prediction of early BCR and potential in handling small datasets. With a prediction model based on a machine learning algorithm, we can identify groups at risk of early BCR even at the time of prostate cancer diagnosis, which can facilitate postoperative follow-up and aid the clinical management of patients with prostate cancer after radical prostatectomy. In the future, we will conduct additional experiments with a large population to compare the performance of the auto-encoder with that of traditional classification algorithms.