The Application of Artificial Intelligence in Prostate Cancer Management—What Improvements Can Be Expected? A Systematic Review

Abstract: Artificial Intelligence (AI) is progressively remodeling our daily life. A large amount of information from “big data” now enables machines to perform predictions and improve our healthcare system. AI has the potential to reshape prostate cancer (PCa) management thanks to growing applications in the field. The purpose of this review is to provide a global overview of AI in PCa for urologists, pathologists, radiotherapists, and oncologists to consider future changes in their daily practice. A systematic review was performed, based on PubMed MEDLINE, Google Scholar, and DBLP databases for original studies published in English from January 2009 to January 2019 relevant to PCa, AI, Machine Learning, Artificial Neural Networks, Convolutional Neural Networks, and Natural-Language Processing. Only articles with full text accessible were considered. A total of 1008 articles were reviewed, and 48 articles were included. AI has potential applications in all fields of PCa management: analysis of genetic predispositions, diagnosis in imaging, and pathology to detect PCa or to differentiate between significant and non-significant PCa. AI also applies to PCa treatment, whether surgical intervention or radiotherapy, skills training, or assessment, to improve treatment modalities and outcome prediction. AI in PCa management has the potential to provide

A subtype of ANN, the Convolutional Neural Network (CNN), has proven effective for segmenting, classifying, and recognizing complex patterns (e.g., animals, objects, cells) on digitized images [10]. Identification of the prostate on MRI images [11] and classification of skin cancer from digital images [12] are examples of the abilities of CNNs. Over the past decade, a growing amount of AI research has been applied to prostate cancer (PCa). AI has been proposed to help clinicians in the management of PCa, and it may be difficult to have a global view of the current situation [13][14][15].

Search Strategy
The inclusion criteria for this systematic review were published full articles, written in English, concerning humans, and related to AI and PCa. Non-original research (review, meta-analysis, editorial, conference abstract, response, or opinion to the editor), ex vivo studies, and articles published before 2009 were excluded. Two authors (R.T. and K.K.) independently reviewed the titles and abstracts. A third review (by R.M.) and a full-text screening were required in case of differing findings or discrepancies during this step. Additional informative publications were selected by cross-referencing the bibliography of previously selected articles. All the selected studies then underwent a full-text screening. Agreement was reached by consensus between the two reviewers. In accordance with PRISMA guidelines [16], Figure 2 summarizes our article selection process. This study was registered in the International Prospective Register of Systematic Reviews (PROSPERO, ID 158454). Information on first author, journal, publication year, AI method, size of the population, aim of the study, and prediction accuracy was collected.

Diagnosis
PCa diagnosis is based on the pathological evaluation of prostate biopsies. Historical factors that guide the decision to propose biopsies are blood levels of Prostate Specific Antigen (PSA), age of the patient, co-morbidities, and presentation of the prostate at digital rectal examination. The use of PSA remains controversial because of its lack of specificity. Therefore, new serum, urine, and tissue-based biomarkers have emerged but are not yet widespread in daily practice [17]. Family history of PCa and genetic predisposition also play an important role in the decision to propose early detection of PCa. Magnetic Resonance Imaging (MRI) is now part of the PCa diagnosis pathway and is performed before biopsies. However, despite the constant improvement of its ability to detect PCa, MRI remains an operator-dependent and time-consuming assessment. The same applies to pathology at the time of biopsy interpretation. Therefore, AI may play a crucial role at several steps of the diagnostic pathway.

Genomics
In an era of personalized medicine, genomics is one of the most promising paths. With "DEPTH", an ML algorithm, MacInnis et al. [18] identified several PCa risk-associated regions through 541,129 germline Single Nucleotide Polymorphisms (SNPs) from 1854 PCa patients and 1894 controls. Only 14 of them were already known, and 112 novel putative susceptibility areas were proposed. Besides the identification of genomic regions of PCa susceptibility, AI also permitted the discovery of new genes involved in PCa. Hou et al. [19] collected public data of 466 PCa patients from the Gene Expression Omnibus and The Cancer Genome Atlas datasets and proposed a genetic algorithm with ANN to establish a diagnostic and prognostic prediction model. The authors identified C1QTNF3 as a good predictor for PCa diagnosis (GSE6956: Area Under the Curve (AUC) = 0.791; GSE8218: AUC = 0.868; GSE26910: AUC = 0.972) and established a 15-gene signature. The prediction model based on this 15-gene signature also showed an AUC of 0.953 for PCa diagnosis and an AUC of 0.808 for prognosis.
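Since AUC is the main performance measure quoted throughout this review, a minimal sketch of how it can be computed may help. The AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control (ties counting one half). The function name and the scores below are invented for illustration and are not data from the cited studies.

```python
def auc(scores_cases, scores_controls):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    correct = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(scores_cases) * len(scores_controls))


# Hypothetical predictor scores (e.g. a normalized expression level).
cases = [0.9, 0.8, 0.6, 0.55]
controls = [0.7, 0.4, 0.3, 0.2]

print(auc(cases, controls))  # 14 of 16 pairs are ranked correctly: 0.875
```

An AUC of 0.5 corresponds to a predictor that ranks cases and controls at random, and 1.0 to perfect separation.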

Imaging and Radiomics
Radiomics is a method that extracts many features from medical imaging using data-characterization algorithms, with the potential to reveal disease characteristics that cannot be appreciated by the naked eye. Many studies assessed the potential benefits of AI in PCa imaging, especially with multi-parametric MRI. These studies are summarized in Table 1. The two main outcomes addressed by these studies regarding imaging-based PCa diagnosis are the detection of malignant lesions and the differentiation of significant and non-significant cancers.
From MRI images, ML software can be trained with labeled, anatomical, or morphological data. Ishioka et al. [20] demonstrated that CNN algorithms can automatically diagnose PCa by analyzing thousands of MR images in a matter of seconds. From prostatic MR imaging, several authors [21][22][23][24] have shown that the use of SVM models allows a PCa diagnosis with accuracies that ranged from 0.85 to 0.91. With another AI approach, two studies [25,26] used CNN algorithms and obtained complementary results that echo those of the SVM approach, with similar performance. Matulewicz et al. [27] showed that anatomical differentiation between prostatic areas (peripheral zone vs. central zone) could significantly increase the classification capabilities of ANN models, highlighting the important role of anatomic segmentation. The authors used two ANN algorithms, one trained with information on the zonal anatomy of the prostate and the other without, to demonstrate that prostate anatomic segmentation significantly improves the performance of PCa diagnosis (96.8% vs. 94.9%; p = 0.03).
MRI provides a large amount of quantitative data through textural and morphological analysis. These data are available for analysis after the definition of a region of interest, a segmentation, and an extraction. The most frequently used descriptors are the average and median of grey levels, variance, skewness, and kurtosis. Recent studies included radiomics based on the integration of quantitative and texture feature analysis to help PCa detection on imaging. The work of Zhao et al. [28] led to the identification of radiomic features significantly associated with PCa in the peripheral zone (PZ). Betrouni et al. [29] showed that incorporating these radiomic features may be useful in PCa diagnosis if the lesion is larger than 0.5 cc. Nevertheless, some studies did not report any benefit from ML models trained with radiomic parameters. Bonekamp et al. [30] found that radiomic machine learning had performance similar, but not superior, to mean Apparent Diffusion Coefficient (ADC) assessment in distinguishing malignant from benign prostate lesions. Considering all these different approaches, Viswanath et al. [31] empirically compared different machine classifiers. On multicentric data from different MRI scanners, with different acquisition protocols or resolutions, boosted Quadratic Discriminant Analysis (QDA) was the most robust and efficient of the models tested for PCa detection, even though it was not the most accurate.
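The first-order descriptors listed above (mean, median, variance, skewness, kurtosis of grey levels within a region of interest) can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny image and binary mask are made up, the function name is hypothetical, and the reviewed studies relied on their own radiomics pipelines rather than this code.

```python
from statistics import mean, median


def first_order_features(grey_levels):
    """First-order descriptors of the grey levels inside a region of interest."""
    n = len(grey_levels)
    mu = mean(grey_levels)
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2, m3, m4 = (sum((g - mu) ** k for g in grey_levels) / n for k in (2, 3, 4))
    if m2 == 0:
        raise ValueError("constant ROI: skewness and kurtosis are undefined")
    return {
        "mean": mu,
        "median": median(grey_levels),
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis (3.0 for a Gaussian)
    }


# Hypothetical 3x4 image and binary segmentation mask (1 = inside the ROI).
image = [
    [12, 15, 80, 82],
    [14, 90, 95, 85],
    [11, 88, 91, 13],
]
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
]

# Extraction step: keep only the pixels selected by the segmentation.
roi = [
    pixel
    for image_row, mask_row in zip(image, mask)
    for pixel, inside in zip(image_row, mask_row)
    if inside
]
features = first_order_features(roi)
print(features)
```

The resulting dictionary is the kind of feature vector that would then be passed to a classifier such as an SVM.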
Several studies also reported the ability of AI to help radiologists distinguish significant lesions. Le et al. [32] used a CNN architecture to reach this objective with 91.5% accuracy. Li et al. [33] trained SVM algorithms with MRI variables to classify non-significant (NS) PCa and clinically significant (CS) PCa into two groups in the central gland (CG). Wang et al. [34] demonstrated that adding radiomic features to an ML algorithm could significantly improve (AUC = 98% vs. 88%) the performance of the Prostate Imaging-Reporting and Data System (PI-RADS), a score related to clinically significant cancer [35]. In addition, an ML model trained with radiomic features was better than PI-RADS at distinguishing the transitional zone (TZ) from PCa tumor. Fehr et al. [36] also used an ML algorithm trained with radiomic features to differentiate between NS PCa and CS PCa. The performance of the resulting model was significantly enhanced by combining T2-W and ADC MRI-based texture features, whether the tumor was in the PZ or in the TZ. Similarly, Min et al. [37] found consistent results by combining T2-W, ADC, and DWI MRI-based texture features. In a further study [38], the mean sensitivity and specificity were 65% (0-100%) and 81% (50-100%), respectively. The poorest scores were for lesions with volumes of less than 0.5 cc (the value currently used to define CS PCa on MR images). When only tumors of significant size were considered, the mean sensitivity and specificity rose to 70% and 88%, respectively.
Regarding the detection of PCa metastases, Koizumi et al. [39] reported the ability of an ANN algorithm to identify skeletal metastases with an accuracy of 80% on bone scintigraphy. However, for the detection of non-skeletal metastases, the performance of the model decreased to 22%. After treatment, metastatic lesions remain sclerotic, and it can be difficult to identify recurrence. Acar et al. [40] trained three ML algorithms with radiomic features based on PSMA PET and CT images to help clinicians distinguish active lesions among sclerotic lesions after treatment. This study identified important textural parameters that allow such detection, reaching an accuracy of 76%.
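The sensitivity, specificity, and accuracy figures quoted in this section all derive from the same four counts of a confusion matrix. A minimal sketch follows; the function name and the labels are invented for illustration (1 = clinically significant lesion, 0 = not) and do not come from any of the cited studies.

```python
def diagnostic_metrics(truth, predicted):
    """Sensitivity, specificity and accuracy from binary labels (1 = disease)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical reference standard (biopsy) and model output for 10 lesions.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = diagnostic_metrics(truth, predicted)
print(metrics)
```

Because accuracy mixes both error types, a model can show a respectable accuracy while having poor specificity, which is why the studies above are easier to compare when all three values are reported.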

Biopsies
In 2011, Lawrentschuk et al. [41] compared the performance of an ANN, trained with clinical and pathological data from 3025 patients, and Logistic Regression (LR) to predict Trans Rectal Ultrasound (TRUS) biopsy outcomes. No difference was observed between the two models (55 vs. 57%, respectively) in discriminating histological groups (LR and ANN, respectively): benign outcomes were correctly identified in 86-88% of cases, and CS PCa was correctly classified in 65-66%. However, more recently, an ANN outperformed a model based on LR to predict PCa at biopsies [42]. Using this model, clinicians may avoid biopsies for one in two patients. However, the ANN missed 16% of all PCa, of which 6% were clinically significant.
ANNs also have wide applications in Natural-Language Processing (NLP), to extract data from reports. Kim et al. [43] have shown how this technological breakthrough could be useful with pathological reports. In their study, using 100 RARP pathology reports, extraction of pathological or staging information was feasible and reached an accuracy of nearly 100% for all characteristics: histologic subtype (99.0%), perineural invasion (98.9%), TNM stage (98.0%), surgical margin (97.0%), and dominant tumor size (95.7%). The execution time for NLP was under 1 s, while manual reviewers took more than 3 min per report.
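To give a concrete idea of what extracting structured fields from a free-text pathology report involves, here is a deliberately simple rule-based sketch. It is not the method of Kim et al. [43]: the report text, the field names, and the regular expressions are all hypothetical, and a real NLP system must cope with far more varied wording.

```python
import re

# Hypothetical patterns, each with one capture group for the value of interest.
PATTERNS = {
    "tnm_stage": re.compile(r"\b(pT[0-4][a-c]?\s?N[0-3X]\s?M[01X])\b"),
    "perineural_invasion": re.compile(
        r"perineural invasion:\s*(present|absent)", re.IGNORECASE
    ),
    "surgical_margin": re.compile(
        r"surgical margins?:\s*(positive|negative)", re.IGNORECASE
    ),
}


def extract_fields(report):
    """Return the first match of each pattern, or None when a field is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up report text for illustration only.
report = (
    "Histologic type: acinar adenocarcinoma. Perineural invasion: present. "
    "Surgical margin: negative. Stage: pT3a N0 M0."
)

print(extract_fields(report))
```

Even this toy version runs in well under a second per report, which illustrates why automated extraction compares so favorably with manual review in terms of time.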

Pathology
Several studies assessed the ability of AI to help pathologists in PCa identification and evaluation, with a gain in time and a reduction of inter-observer variability (Table 2). Nir et al. [44] showed that an ML algorithm achieved good accuracy in diagnosing (90.5%) or classifying (79.2%) PCa compared to pathologists. The working speed of the machine allowed the analysis of hundreds of slides in a few hours. With different AI models, other authors [45][46][47] found similar results, with a good concordance between pathologists and a CNN model and high accuracy, sensitivity, and specificity in diagnosing and differentiating low- and high-grade PCa. Arvaniti et al. [48] provided additional information. Once again, the kappa score between the CNN and pathologists was comparable to that between pathologists. Interestingly, however, when there was a disagreement with pathologists, the stratification made by the CNN model improved prediction of disease-specific survival outcomes. Kim et al. [49] compared an ANN algorithm and an SVM model for the pre-operative prediction of advanced PCa. Both performed well in predicting the pre-operative probability of > pT3a PCa, but SVM slightly outperformed ANN in this study. Partin tables are used to estimate the distribution of the risk of pathological stages. They use clinical features, such as Gleason score, PSA, and clinical stage, to predict whether the tumor will be confined to the organ. Tsao et al. [50] compared the performance of these nomogram tables to an ANN model. Their ANN algorithm, STATISTICA, trained with total serum PSA, TNM stage, and biopsy Gleason score, had a significantly higher efficiency than Partin tables. Two clinical variables were significantly associated with non-organ-confined disease: PSA and BMI. Wang et al. [51] optimized Partin tables by combining conventional parameters with those from an SVM model. This model significantly outperformed conventional Partin tables for organ-confined PCa (AUC = 0.891 vs. 0.730). SVM significantly outperformed ANN, with AUCs of 0.805 and 0.719, respectively (p = 0.020). For the pre-operative probability of > pT3a, the accuracy of SVM was 77%, and its sensitivity and specificity were 67% and 79%, respectively. For ANN, the sensitivity, specificity, and accuracy were 63%, 81%, and 78%, respectively.
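The kappa score used to compare a CNN with pathologists measures agreement beyond what chance alone would produce. A minimal sketch of unweighted Cohen's kappa follows, assuming two raters and nominal categories; the cited study may have used a weighted variant, and the grades below are invented for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical Gleason pattern calls on 10 tissue cores.
pathologist = ["G3", "G3", "G3", "G3", "G3", "G4", "G4", "G4", "G4", "G4"]
cnn_model = ["G3", "G3", "G3", "G3", "G4", "G4", "G4", "G4", "G4", "G3"]

kappa = cohen_kappa(pathologist, cnn_model)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 cores, but half of that agreement is expected by chance, so kappa is about 0.6 rather than 0.8. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.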

Treatment
As in PCa diagnosis, AI may also play an important role in the treatment pathway (Table 3). After PCa diagnosis, AI can help patients and physicians in treatment decisions by predicting oncological outcomes, complications, or biochemical recurrence. It also supports practitioners in improving and optimizing their practice.

Decision Making
Auffenberg et al. [52] used an ML algorithm, trained with clinical and pathological variables such as age, medical history, PSA, Gleason scores, or number of positive cores, to guide patients in their choice of treatment and to help them understand their disease. This application also allowed urologists to guide their patients towards treatment or further examination by calculating the probability of organ-confined disease with good accuracy (81%).

Surgery
In the field of surgery, ML algorithms can be used to improve surgical capacities. Ukimura et al. [53] used Virtual Reality 3D surgical navigation to perform robot-assisted radical prostatectomy (RARP) in 10 patients. This technology allowed the surgeon to identify anatomic structures while performing surgery. In three works, Hung et al. [54][55][56] developed an objective method to assess surgical performance and used parameters from this method as training data for ML algorithms. In their first work [54], the authors created "dVLogger" to record automated performance metrics (APM) for expert and novice urological surgeons. Experts significantly outperformed novices for all criteria of bimanual dexterity and most of the movement efficiency criteria. The authors illustrated this higher efficiency with a 3D instrument-path tracing that demonstrates a shorter instrument-path length with higher velocity for the dominant and non-dominant instruments in the group of experts. Based on these APM data, an ML model was built to predict clinical outcomes after RARP, such as length of hospital stay (LoS) and Foley catheter duration [55] or urinary continence [56], with over 85% accuracy. Ranasinghe et al. [57] used DeepSurv, an NLP algorithm, to compare RARP and open radical prostatectomy (ORP) outcomes from individualized patient-reported information. Based on this model, the authors reported that patients expressed more negative emotions up to 3 months after ORP, owing to discomfort and pain, than patients who underwent RARP. Anxiety about PSA failure and sexual side effects was also higher at 9 months with ORP.

Radiotherapy
As in surgery, radiotherapists have used AI algorithms to optimize their treatment. Nicolae et al. [58] demonstrated that, compared to brachytherapists, ML reduced radiotherapy planning time from a mean of 18 min to less than 1 min, with no significant difference in irradiated prostate volume. Kajikawa et al. [59] built two CNN algorithms to optimize radiotherapy planning. Their results were more mixed: the CNN model trained with an anatomical structure label dataset reached 70% accuracy in predicting the dosimetric eligibility of patients treated with intensity-modulated radiation therapy (IMRT).
Finally, some authors aimed to predict patient susceptibility to side effects after radiotherapy by combining AI with genomics. Based on genome-wide association studies (GWAS), Lee et al. [60] and Oh et al. [61] tried to predict lower urinary tract symptoms (LUTS) and rectal bleeding or erectile dysfunction after radiation therapy, respectively. Both ML algorithms reached 70% accuracy. Pella et al. [62] compared SVM and ANN for outcome prediction after conformal radiotherapy. Both demonstrated similar prediction performance of 70% for gastro-intestinal and genito-urinary toxicity after radiotherapy.

NLP algorithm (PRIME-2); 5157 RARP and 579 ORP; pre- and post-operative clinical data. Surgeon experience and erectile function preservation (p < 0.01) were important factors in treatment choice. There were no significant differences in urinary, sexual, or bowel symptoms between RARP and ORP during the 12-month follow-up period. Emotions expressed by patients who underwent RARP were more positive, whereas patients who underwent ORP expressed more negative emotions immediately and 3 months post-surgery (p < 0.05), due to pain and discomfort, and at 9 months due to fear and anxiety about pending PSA tests and sexual side effects.

For the ANN model with structure labels, the accuracy was 70%, with sensitivity and specificity of 94.6% and 31%, respectively. The ANN with planning CT had an accuracy of 56.7%, with sensitivity and specificity of 70% and 11.3%, respectively. These models had moderate performance in predicting dosimetric eligibility.

The predictive accuracy of the ML model differed across the urinary symptoms. Only for the weak stream endpoint did it achieve a significant AUC of 0.70 (95% CI 0.54-0.86; p = 0.01). Seven interconnected proteins were highlighted by gene ontology analysis and were already known to be associated with LUTS.

For all but 2 of the patients who died from metastatic PCa, the level of pain increased drastically in the last 2 years of life. Severe pain was associated with opioid prescription (OR = 6.6, p < 0.0001) and palliative radiation (OR = 3.4, p = 0.0002). Five factors were significantly associated with severe pain: receipt of chemotherapy, opioids, or palliative radiotherapy, being in the last year of life, and the number of medical appointments. The 5 African American patients clustered at the high end of the pain index spectrum, but this was not significant.

To predict biochemical recurrence at 1 year, the 3 models were K-nearest neighbor, random forest tree, and logistic regression, with prediction accuracy scores of 0.976, 0.953, and 0.976, respectively. All 3 ML models outperformed the conventional statistical regression model (AUC 0.865 vs. 0.903, 0.924, and 0.940, respectively) in predicting early biochemical recurrence after RARP.

Radiomic features. For GS prediction, T2-w radiomic models were the most efficient, with a mean AUC of 0.739. To predict stage, ADC models were more effective, with an AUC of 0.675. For treatment response after IMRT, 22 radiomic features were strongly correlated, with performance ranging widely from 0.55 to 0.78.

Abbreviations: AI, artificial intelligence; ML, machine learning; PCa, prostate cancer; ADT, androgen deprivation therapy; AUC, area under the curve; RARP, robot-assisted radical prostatectomy; SVM, support vector machine; LR, logistic regression; RP, radical prostatectomy; RT, radiotherapy; APM, automated performance metrics; LoS, length of stay; NLP, natural-language processing; LDR, low dose rate; IMRT, intensity-modulated radiotherapy; SNP, single nucleotide polymorphism; GWAS, genome-wide association studies; LUTS, lower urinary tract symptoms; BCR, biochemical recurrence; GS, Gleason score; ANN, artificial neural network.

Medication
Using an NLP method, Heintzelman et al. [63] extracted clinical data on pain and related drug prescriptions from the medical reports of metastatic PCa patients. A significant increase in pain in the last two years of life correlated with a rising number of medical appointments, rising opioid prescription, and palliative radiotherapy.

Oncological Outcomes
Zhang et al. [64] compared the ability of an SVM model and LR analysis to predict PCa biochemical recurrence (BCR). Trained with clinicopathological data, SVM had a significantly higher accuracy. The performance of the model increased when it was supplemented with MRI-derived variables. However, the only significant imaging predictor of BCR was ADC. As expected, positive surgical margin, non-organ-confined disease at surgery, and GS were clinicopathological predictors of time to failure. More recently, Wong et al. [65], by training three ML models, also pointed out that ML techniques could outperform traditional Cox regression analysis. The AUC of the ML models could exceed 0.95 for predicting early BCR after RARP.
Radiotherapists have also used AI algorithms to predict treatment response or outcomes. In 2019, Abdollahi et al. [66] built an ML model trained with several radiomic features extracted from MRI data before and after intensity-modulated radiotherapy (IMRT). Based on pre-treatment MRI, the model could predict the early treatment response after IMRT with credible performance, up to 78%.

Discussion
AI is on the rise and seems to hold a predominant place in our daily lives. There is presently a competitive battle for innovation, involving countries and private companies in this field. We hereby confirm the extensive work regarding the development of AI in PCa. We demonstrated AI now applies to almost all aspects of PCa management.
We are now entering an era of personalized healthcare. The "one treatment fits all" approach is no longer considered appropriate. This is largely due to the progress in genomics. Genomics can predict genetic predispositions to a pathology, such as diabetes or cancer [67,68]. The characterization of the individual genome is becoming more accessible, but also generates a massive amount of data. Therefore, ML is a resource perfectly adapted to the use of genomics [69,70]. AI has already helped discoveries in molecular [71] and genetic medicine. With an unsupervised ML algorithm, Theofilatos et al. had promising results in the prediction of over 5000 protein complexes and gene ontology function [72]. Moreover, new computational methodologies are under development to detect DNA variants, such as SNPs, as predictive factors of diseases [73]. In PCa, the combination of these two technological advances has already led to the identification of new genetic susceptibility regions and gene signatures [18,19,60,61] that had remained unknown with conventional logistic regression.
In the past few years, the number of prostate biopsy sections has increased significantly [74,75]. With 12 to 16 biopsies, each procedure results in several slides. Each slide must be analyzed, even if it does not contain cancer. Automated diagnosis of prostate cancer could allow the exclusion of these normal sections to save time for pathologists [76]. Some CNN models have already outperformed pathologists in inter-observer agreement for Barrett's esophagus [77] or in the detection of breast cancer metastasis in the lymph nodes under a time constraint [78]. In terms of time constraint, the contributions from Kim [43] and Heintzelman [63] have shown that the deployment of AI can enable the near-instantaneous completion, with nearly perfect accuracy, of tasks that are burdensome for humans.
Radiomics appears to be another promising field of study in the scope of personalized healthcare. Radiomics can help through the entire continuum of healthcare, from diagnosis to the prediction of therapy response. Based on image acquisition and segmentation from different modalities (MRI, CT, PET), radiomics allows the extraction of a large number of imaging features. These features correlate with patient information and may be clinically relevant to define cancer aggressiveness and prognosis, but also treatment outcomes [79][80][81][82]. New fields of research are now emerging from these developments, such as radiogenomics, which studies the correlation between gene expression and imaging or radiomic features [79]. With its large amount of numerical data, radiomics also benefits substantially from coupling with ML algorithms. Radiomics models and ML approaches now represent non-invasive methods that equip physicians to provide personalized healthcare through prognostic information. In radiation oncology, Lambin [83] already highlighted the importance of tailoring radiotherapy to intrinsic tumor properties for treatment decisions and personalized healthcare [84]. However, the contributions of radiomics still need to be assessed prospectively and with standardized methods.
AI also provides us with insightful support in the treatment field. Robotic surgery is now widespread. To improve the duration and safety of surgery, augmented reality seems promising. In RARP, Ukimura [53] and Porpiglia [85,86] have shown that the use of this feature was safe and feasible. By reconstructing a virtual 3D model from two-dimensional imaging, anatomical structures and tumor location are highlighted to guide surgeons during the operation, which could improve surgical and oncologic outcomes. This new technology is also usable in open surgery [87] and in simulation and training [88]. In addition to urological surgery [89], it is also used in hepatobiliary [90][91][92], otolaryngology [93,94], or orthopedic surgery [95]. Hung [54][55][56] has developed an interesting tool combining robotics and AI. By recording the surgeon's movements while operating, AI provides automated performance metrics and determines global movement features [96][97][98]. ML now allows the extraction and exploitation of surgical skills. With these data, new opportunities are opening. It is now feasible to evaluate surgical skills, train them efficiently, establish as yet unknown correlations with post-operative or oncologic outcomes, and set a new standardization of practices [99].
These new discoveries are achieved by the ability to recognize complex patterns through nonlinear relationships, with high accuracy, even when data are missing or contaminated [100]. These methods raise some concerns and bring up issues. First, to be legitimate or acceptable, or even to be rejected as irrelevant, a ML algorithm's decisions must be understandable and therefore explainable. The number of micro-thought processes carried out by the system is beyond our understanding capacities. As previously stated, there is a hidden layer of neurons in ANN software. Therefore, calculations that result in outcomes are not visible for the programmer who created ANNs, responsible for opacity [3,101]. It is the black-box effect. The data goes in the black box and the prediction comes out, but without any accompanying justification [102]. It is the difference between prediction and explanation. Failure to understand the algorithm decision can lead to misinterpretation. One other concern is related to the training sample used to propose ML algorithm. To be functional, a ML algorithm needs a training step, using usually a training sample from the study population. However, this sample, which is often limited in size, may not represent the general population. In a systematic review of skin cancer classification using CNN, Brinker [16] highlights the lack of dark-skinned people in the different training samples to be representative and be applied on world population. For PCa, it is known that ethnic origin is source of some predisposition. PCa does not have the same characteristics and severity in an Afro-American or Asian population [103]. Yet, most of the previously reviewed studies are based on Asian populations [20,26,28,37,42,50,59] and are probably not applicable to an African American population. 
Heintzelman [63] emphasizes this difference by showing that African American patients in their study were clustered at the upper end of the pain index spectrum, although these results were not significant.
As already mentioned, the performance of an ML algorithm depends on the training step, cohort parameters, and optimization processes. Each model therefore differs from every other model that was created and trained with another set of data. In addition, programmers and researchers often keep their source code and data to themselves. Many AI models are emerging that differ in how they function, which makes reproducibility complicated to establish. There is therefore a critical need for external validation of these algorithms before any use in daily practice. In addition to this lack of external validation, AI algorithms are insufficiently compared with current references, and comparing ML software with human experts is not a common practice [104], even in PCa. In this review, only 6 studies compared their model with current reference prediction methods, such as Partin tables [105,106], or with expert opinion [44,47,48]. The comparison between AI and human experts is thus still underdeveloped in PCa, whereas in other fields AI has reached a confirmed expert level, such as in the classification of skin cancer [107], retinal diseases [108,109], or colorectal polyps [110,111].
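As a minimal sketch of what such external validation involves, the Python snippet below scores a frozen model on the cohort it was developed on and on an independent cohort, then compares discrimination (area under the ROC curve). Everything in it is an assumption made for illustration and does not come from the reviewed studies: the single-feature `risk_score` model, the function names, and the toy cohorts, whose numbers are invented.

```python
from typing import Callable, List, Sequence, Tuple

# A cohort is a list of (marker value, outcome) pairs; outcome is 1 or 0.
Cohort = List[Tuple[float, int]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison
    of positive and negative cases (ties count for one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def risk_score(marker: float) -> float:
    """Hypothetical frozen model: risk increases with a single marker."""
    return marker / 10.0


def evaluate(model: Callable[[float], float], cohort: Cohort) -> float:
    """Apply the model unchanged to a cohort and return its AUC."""
    labels = [outcome for _, outcome in cohort]
    scores = [model(marker) for marker, _ in cohort]
    return auc(labels, scores)


# Invented toy data: the development cohort and an independent cohort
# in which the marker separates the outcomes less cleanly.
internal_cohort: Cohort = [(2.0, 0), (3.0, 0), (4.0, 0), (8.0, 1), (9.0, 1), (12.0, 1)]
external_cohort: Cohort = [(5.0, 0), (9.0, 0), (3.0, 0), (6.0, 1), (4.0, 1), (10.0, 1)]

internal_auc = evaluate(risk_score, internal_cohort)
external_auc = evaluate(risk_score, external_cohort)
print(f"internal AUC = {internal_auc:.2f}, external AUC = {external_auc:.2f}")
```

With these invented numbers, the model separates the development cohort perfectly but discriminates less well on the independent one, which is the kind of drop that external validation is meant to expose before clinical use.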
A systematic review of the literature [112] and a methodology review [113] have assessed the benefits over logistic regression of ML and ANN, respectively. Both found no evidence of superior performance of ML or ANN over logistic regression for clinical prediction modeling. However, they highlighted the need for improvements in methodology, such as model validation or corrections for imbalanced outcomes. They did not investigate which factors, such as sample size, influence the difference in performance. In this article, some of the reviewed studies have a small sample size. Even though sample size affects the performance of the resulting ML model [114], it appears possible to design a robust ML algorithm with a limited sample size [115] by using a "synthetic dataset". Partin tables use clinical parameters of PCa to predict whether the tumor will be organ-confined [105,106]. In our review, NLP software [43,63] outperformed the working and data-mining abilities of humans. ML models consistently achieved results comparable to expert level. The overall agreement between pathologists or radiotherapists was equivalent to that of the AI. In [48], the groups formed by the CNN model differed slightly from those of the pathologist, with a better disease-specific prediction. This further underlines the fact that an AI algorithm can reveal complex patterns.
This black-box effect is also responsible for a lack of reproducibility [70,116,117]. These reproducibility issues are also related to the lack of sharing of assets and to the numerous modular components involved. However, most of the studies reviewed cited the AI software or the method they used.
Finally, AI models are also used in other cancers. In skin cancer, CNNs already achieve high-performance classification of skin lesions [16]. For breast cancer, interest in the application of AI models to breast MRI is growing worldwide [118]. Such applications are still in development and are not used in clinical practice, but computer-aided diagnosis could be useful for breast cancer [119,120]. As in PCa, AI has been used with mixed results in many other cancers, such as lung cancer [121], neuro-oncology [122,123], and esophageal cancer [124].

Review Limitation
There are limitations to our review. First, we decided to select only studies published from 2009 onward. We considered that articles had to be recent to be relevant, since AI has undergone a major expansion over the past decade. Second, since most of us are urologists and not computer scientists, we focused on the use of AI models and the clinical results obtained, rather than detailing the techniques or methodology of these models. As discussed above, this review highlights the retrospective and heterogeneous nature of these studies. Each study uses a different ML algorithm. Larger validation datasets, with different populations and environments, are needed to achieve external validation and to deliver a universal tool usable by everyone. Until now, the relevance of these algorithms has not been deeply explored from either a clinical or a statistical point of view, which has slowed the applicability of these results.

Conclusions and Perspectives
Worldwide, 1.4 million new cases of PCa were diagnosed in 2016 and nearly 400,000 people died, which is twice the rate of 1990 [125]. Predictive, preventive, and personalized medicine [126] has the potential to play a useful role by predicting PCa more accurately, using a multiomic approach [127], and by risk-stratifying patients to provide personalized medicine. In PCa, AI can already help across the entire continuum of healthcare, from the detection of predictive factors to the identification and prediction of risks after treatment. AI models can be superior to conventional statistical techniques in revealing complex patterns and providing reliable predictions. Before these new models can be used, external validation is critical, and it is still lacking at present. To continue to expand and improve, AI models will require a large amount of well-annotated and representative data from various sources and in many forms. The challenge for the future is therefore to collect and share data, develop and combine different approaches, and, above all, propose reliable validations of any model. Only in this way will AI promptly enter the daily management of PCa.
Author Contributions: All authors researched data for the article, made substantial contributions to discussions of content and reviewed and edited the manuscript before submission. R.T. wrote the manuscript.