Current Status of Biparametric MRI in Prostate Cancer Diagnosis: Literature Analysis

The role of multiparametric MRI (mpMRI) in the detection of prostate cancer is well-established. Based on the limited role of dynamic contrast enhancement (DCE) in PI-RADS v2.1, the risk of potential side effects, and the increased cost and time, there has been an increase in studies advocating for the omission of DCE from MRI assessments. Per PI-RADS v2.1, DCE is indicated in the assessment of PI-RADS 3 lesions in the peripheral zone, with its most pronounced effect when T2WI and DWI are of insufficient quality. The aim of this study was to evaluate the methodology and reporting in the literature from the past 5 years regarding the use of DCE in prostate MRI, especially with respect to the indications for DCE as stated in PI-RADS v2.1, and to describe the different approaches used across the studies. We searched for studies investigating the use of bpMRI and/or mpMRI in the detection of clinically significant prostate cancer between January 2017 and April 2022 in the PubMed, Web of Science, and Google Scholar databases. Through the search process, a total of 269 studies were gathered and 41 remained after abstract and full-text screening. The following information was extracted from the eligible studies: general clinical and technical characteristics of the studies, the number of PI-RADS 3 lesions, different definitions of clinically significant prostate cancer (csPCa), biopsy thresholds, reference standard methods, and number and experience of readers. Forty-one studies were included in the study. Only 51% (21/41) of studies reported the prevalence of csPCa in their equivocal lesion (PI-RADS category 3 lesions) subgroups. Of the included studies, none (0/41) performed a stratified sub-analysis of the DCE benefit versus MRI quality and 46% (19/41) made explicit statements about removing MRI scans based on a range of factors including motion, noise, and image artifacts. 
Furthermore, the number of studies investigating the role of DCE using readers with varying experience was relatively low. This review demonstrates that a high proportion of the studies investigating whether bpMRI can replace mpMRI did not transparently report information inherent to their study design concerning the key indications of DCE, such as the number of clinically insignificant/significant PI-RADS 3 lesions, nor did they provide any sub-analyses to test image quality, with some removing bad quality MRI scans altogether, or reader-experience-dependency indications for DCE. For the studies that reported on most of the DCE indications, their conclusions about the utility of DCE were heavily definition-dependent (with varying definitions of csPCa and of the PI-RADS category biopsy significance threshold). Reporting the information inherent to the study design and related to the specific indications for DCE as stated in PI-RADS v2.1 is needed to determine whether DCE is helpful or not. With most of the recent literature being retrospective and not including the data related to DCE indications in particular, the ongoing dispute between bpMRI and mpMRI is likely to linger.


Introduction
Prostate cancer (PCa) is the second-leading cause of cancer-related deaths among individuals born biologically male [1]. Multi-parametric MRI (mpMRI) has been recognized as an important tool in the detection, localization, staging, and management of prostate cancer [2]. mpMRI demonstrates high sensitivity and specificity for identifying clinically significant prostate cancer (csPCa), with cancer detection rates up to 80-90% [3][4][5]. Depending on the patient selection, mpMRI has also demonstrated a negative predictive value (NPV) of 63-98% and could reduce unnecessary biopsies by more than 27% [6,7]. The high incidence of PCa in addition to the strength of mpMRI necessitates widespread adoption of prostate MRI. As a response to mpMRI's growth, the Prostate Imaging Reporting and Data System (PI-RADS) guidelines were introduced in 2012 as PI-RADS v1 to standardize prostate mpMRI acquisition, interpretation, and reporting [8].

Role of I.V. Contrast (DCE Imaging) in Prostate Cancer Imaging and Controversies
Since the genesis of PI-RADS, there have been updates and refinements, with PI-RADS v2.0 being released in 2015 and the latest guidelines being PI-RADS v2.1, introduced in 2019 [9]. The updates/refinements, introduced in response to observed inter-/intrareader variability in PI-RADS v1, include changing the roles/responsibilities of the various mpMRI sequences (T2WI, DWI, ADC, DCE). One of the many important recommendations in PI-RADS v2.1 details the uses of anatomical T2-weighted imaging combined with functional dynamic contrast enhancement (DCE) and diffusion-weighted imaging (DWI), with DCE producing the most controversy. The debate surrounding I.V. contrast in prostate imaging is centered on the utility of DCE imaging. The concept of dominant sequences remained unchanged in the latest PI-RADS v2.1 update, with roles for DWI in the peripheral zone and T2WI in the transition zone [10]. The role of DCE in PI-RADS v2.1 is to serve as a modifier to upgrade peripheral zone lesions from PI-RADS 3 to category 4 in the presence of early focal enhancement [6,9]. It was also mentioned in PI-RADS v2.1 that when T2WI and DWI are of insufficient diagnostic quality, DCE utilizing I.V. contrast can assist in determining the PI-RADS assessment category [10,11]. In summary, the PI-RADS v2.1 indications for DCE, and thus for I.V. contrast, are: (1) identifying PI-RADS 3 lesions that include clinically significant prostate cancer; (2) assisting in the readout of MRIs with suboptimal diagnostic quality for T2WI and DWI sequences resulting from noise/artifacts; and (3) assisting radiologists with relatively low experience in reading prostate MRIs.
The role of DCE in detecting csPCa has been a continuing controversy due to it being more time-consuming, having potentially increased risks associated with gadolinium-based contrast agents, and having increased costs, as well as the apparently minor contribution/role of DCE defined by the PI-RADS v2.1 guidelines [12][13][14]. Biparametric MRI (bpMRI) has been proposed to alleviate the limiting factors/controversies of mpMRI by omitting DCE from the examination. A major caveat for the use of bpMRI is that it does not apply to the settings involving tumor recurrence after radiation therapy, focal therapy, or radical prostatectomy. In these scenarios, DCE plays a more dominant role, since contrast enhancement is one of the most reliable features of a disease in the context of therapy-induced prostatic changes, which make PI-RADS inapplicable [15]. Another caveat for the use of bpMRI, related to indications (2) and (3) above, is the variability in diagnostic accuracy, which is significantly influenced by the experience of the readers and the quality of the MRI scan. It is claimed that less-experienced readers benefit more from DCE. One study highlighted the more robust nature of DCE in meeting diagnostic quality, reporting that T2WI and DWI combined met the diagnostic threshold in only 60% of cases compared to 93% for DCE [16]. It has been suggested that the more robust nature of DCE can be explained by its higher spatial resolution and by it being less prone to motion or susceptibility artifacts. One review including 77 articles showed that reading experience and biopsy experience were the main factors that influenced diagnostic accuracy. The authors found that the use of bpMRI appears to be most effective with experienced readers and when good image quality is available, but DCE MRI should be used as a backup for those with less experience [17].
In summary, bpMRI has been shown to suffer in non-expert, low-volume, lower-field-strength settings, suggesting that further prospective studies should be performed to specifically test these indications for DCE [18]. Owing to the heterogeneity of the studies and complexity of the disease, the impact of removing DCE on diagnostic accuracy remains to be determined.

Prior Reviews Comparing mpMRI against bpMRI (without DCE)
Previously published literature reviews and meta-analyses comparing pre-biopsy mpMRI versus bpMRI head-to-head have provided variable conclusions. Most studies investigating whether bpMRI can replace mpMRI directly compare the clinically significant cancer detection rates, most often by retrospectively removing DCE from mpMRI assessment. Notably, these reviews pooled studies for patients with all PI-RADS lesions ranging from category 1-5 but did not provide any insight into individual pooled PI-RADS 3 patients/lesions or any sub-analyses of scan quality or reader experience, which are the primary indications for DCE in PI-RADS v2.1. One systematic review and meta-analysis by Woo et al. included 20 studies (from January 2008-2017) and found that the performance of bpMRI was similar to that of mpMRI in the diagnosis of prostate cancer [2]. Another meta-analysis by Niu et al. included 33 studies (from January 2000-July 2017) and found that mpMRI had better pooled sensitivity (mpMRI: 0.85, bpMRI: 0.80) but similar pooled specificity (mpMRI: 0.77, bpMRI: 0.80) [19]. A meta-analysis of a similar size by Alabousi et al. included 31 studies (from January 2012-2018) and found no significant differences in the pooled sensitivity (mpMRI: 0.86, bpMRI: 0.90) or specificity (mpMRI: 0.73, bpMRI: 0.70) [20]. A larger meta-analysis by Bass et al. included 44 studies (from January 2017-June 2019) and the meta-regression revealed no differences in the pooled diagnostic estimates between bpMRI and mpMRI. The Bass et al. review was very similar to the Niu et al. and Alabousi et al. reviews, except this study added seven more papers to the final pooling, with Bass et al. mentioning that the heterogeneity of the data did not allow definitive recommendations to be made [21]. A meta-analysis by Liang et al. of 45 studies (from 2007-2019) found a slightly significant difference in sensitivity (mpMRI: 0.84, bpMRI: 0.77) but no significant difference in specificity (mpMRI: 0.82, bpMRI: 0.81) [22]. A special note should be added that most of these reviews, except for that by Bass et al., pooled studies using different PI-RADS versions because of the years the studies were published. Since the roles of the various sequences (T2, DWI, ADC, and DCE) changed between versions and evidence of inter-/intrareader variability has been shown to exist between the PI-RADS versions, studies from different PI-RADS eras are more difficult to pool and directly compare.
Only a handful of prior reviews gave PI-RADS 3 lesion-specific advice. Of these, a smaller review of ten studies that found no significant differences in sensitivity or specificity defended the use of biopsy for all PI-RADS 3 lesions [23]. Other narrative reviews proposed the adoption of simplified PI-RADS scoring or the reservation of contrast medium for PI-RADS 3-4 lesions to offer improved management with fewer biopsies [24,25].
All these prior reviews mention study heterogeneity as a limitation but a more detailed analysis of what is missing and what should be included is still needed. Additionally, the documented dependency of the use of DCE on experience and MRI quality is underreported and under-investigated in the reviews discussed above. This observation also matches emerging proposals for quality metrics in the prostate cancer diagnostic pathway, such as explicitly reporting information on each PI-RADS category in addition to explicitly commenting on image quality using the Prostate Imaging Quality (PI-QUAL) system [26][27][28]. In response to these observations, this review sought to understand and describe the extent of the study heterogeneity and the reporting on quality metrics by looking at the latest original research (within the past 5 years), published after the role of DCE was updated in 2015, that sought to answer the primary research question: Can bpMRI replace mpMRI, via the omission of DCE, in the screening and assessment of clinically significant prostate cancer without diminishing diagnostic sensitivity/accuracy? Additionally, this review sought to identify the key indications of DCE as reported in the PI-RADS v2.1 guidelines and assess the frequency with which these key indications and emerging quality metrics are reported in studies trying to answer the primary research question.

Paper Eligibility and Selection
The key search terms used in medical databases to find eligible papers included the following: "prostate", "MRI", "biparametric", and "DCE". MEDLINE, Web of Science, Google Scholar, and PubMed were used to search for eligible papers published between January 2017 and April 2022. The inclusion criteria were: being an original manuscript, publication in the last five years, a focus on bpMRI or mpMRI, and matching key words. All papers underwent title, abstract, and full-text screening. Papers were excluded if they had a different focus, were written in a language other than English, reported incorrect outcome measures, or were review articles or letters (Figure 1). Three reviewers independently searched for and screened all records.

Data Collection
Data from each study were reviewed and recorded in an Excel spreadsheet (Microsoft, Seattle, WA, USA). All information was captured either from full-text reading or by referencing Supplementary Materials if provided. In addition to general study details (authors, year of publication), the following were recorded: study design (prospective or retrospective), diagnostic test used in the study (i.e., bpMRI, mpMRI, clinical parameters, biomarkers), sample size (i.e., total number of patients, number of prostate cancer patients, and number of clinically significant prostate cancer patients), characteristics of the sample (i.e., age, Prostate Specific Antigen (PSA), PSA density, prostate volume), definition of clinically significant cancer according to the authors in the study, number of radiologists/trainees and their collective experience in the field in years, type of scoring system used by the readers (i.e., PI-RADS, Likert), the number of PI-RADS 3 lesions detected in bpMRI and/or mpMRI, preference of reference standard (biopsy or prostatectomy), exclusion of MRI images of low quality, and MRI technique (i.e., 3T or 1.5T MRI scanner, endorectal coil usage). In cases where data could not be obtained due to limitations in reporting, the value "did not indicate" (DNI) was assigned. Data extraction was conducted by three readers independently. Due to high variability in the studies, readers consulted with each other during the process regularly and came to an agreement in cases of discrepancies. All data were processed using Python and the following packages: matplotlib, pandas, numpy, scipy, and statsmodels. The method used to tabulate and visually display results was the plot function within the pandas module.
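The tabulation step described above can be illustrated with a minimal pandas sketch. The column names and the three rows are hypothetical placeholders, not data extracted from the included studies, and the exact processing used in this review may have differed.

```python
import pandas as pd

# Hypothetical extraction records: column names and values are illustrative
# placeholders only. "DNI" marks items a study did not indicate.
records = [
    {"study": "Study A", "design": "retrospective", "field_strength": "3T", "n_pirads3": 72},
    {"study": "Study B", "design": "prospective", "field_strength": "1.5T", "n_pirads3": "DNI"},
    {"study": "Study C", "design": "retrospective", "field_strength": "3T", "n_pirads3": "DNI"},
]
df = pd.DataFrame(records)

# Tabulate a categorical characteristic across studies.
design_counts = df["design"].value_counts()

# Fraction of studies that reported each item (value is anything other than "DNI").
reported = df.drop(columns="study").ne("DNI").mean()

print(design_counts)
print(reported.round(2))
```

With matplotlib installed, a call such as design_counts.plot(kind="bar") would render the same tabulation as a bar chart through the pandas plot interface.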

Results
A total of 132 unique records were identified by searching for the keywords in Google Scholar, MEDLINE, and Web of Science. These records then went through a screening process examining the titles and abstracts, after which 42 studies were excluded for the following reasons: n = 30 review articles, n = 11 letters/proposals/methods papers, n = 1 lack of accessibility. For the 90 remaining studies, full-text screening filtered out an additional n = 56 studies for the following reasons: n = 32 beyond the focus of this analysis, n = 19 combined bpMRI with machine-learning model or extra clinical variables, n = 2 different population of interest, n = 2 zone-specific studies, n = 1 no outcome of interest. Moreover, n = 7 additional studies were included during the revision. After all studies were fully screened, 41 studies remained for the final analysis. Table 1 presents the clinical characteristics of the studies included in this review. Table 3 presents the technical characteristics of the studies included in this review. Of the included studies (n = 41), 12 were prospective and 29 were retrospective analyses. Regarding the study cohorts, most were biopsy-naive (n = 26), two were repeat biopsy patients, five were patients with proven cancer, three had mixed populations, and five did not indicate the biopsy status of the patients. Approximately 46% (19/41) of the studies excluded MRI scans with insufficient quality due to artifacts caused by motion or rectal gas (Figure 2). The studies varied in their focus, with 32 providing head-to-head comparisons of bpMRI and mpMRI, 7 comparing bpMRI to historic mpMRI standards, and 2 focusing purely on mpMRI.
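The screening flow can be cross-checked by recomputing it from the exclusion counts reported above. The following is a minimal sketch; the variable names are illustrative, and only the counts come from the text.

```python
# Counts as reported in the screening description.
identified = 132
abstract_exclusions = {
    "review articles": 30,
    "letters/proposals/methods papers": 11,
    "lack of accessibility": 1,
}
fulltext_exclusions = {
    "beyond the focus of this analysis": 32,
    "bpMRI combined with machine learning or extra clinical variables": 19,
    "different population of interest": 2,
    "zone-specific studies": 2,
    "no outcome of interest": 1,
}
added_during_revision = 7

# Walk through the flow: title/abstract screening, full-text screening, revision.
after_abstract = identified - sum(abstract_exclusions.values())
after_fulltext = after_abstract - sum(fulltext_exclusions.values())
included = after_fulltext + added_during_revision

print(f"after title/abstract screening: {after_abstract}")
print(f"after full-text screening: {after_fulltext}")
print(f"included in final analysis: {included}")
```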
With respect to the definition of csPCa, 4 studies did not indicate one, 31 used the Gleason score (GS) ≥ 7 (3 + 4), and the remaining 6 used permutations of GS cutoffs ranging from 6-7 combined with the presence of an extra-prostatic extension (EPE), index lesion volume > 0.5 mL, max cancer core lengths (CCLs) ranging from 4-6 mm and Gleason grade group (GG) cutoffs ranging from 2-3. In terms of reference standard methods, 23 studies used biopsies, 6 preferred radical prostatectomies, and 10 utilized both procedures to confirm the presence of clinically significant cancers, with 2 studies not reporting the method of confirmation. With respect to MRI field strength, 28 studies used 3T, 8 used 1.5T, and 4 used both 1.5T and 3T; 1 did not indicate. With respect to endorectal coil (ERC) use, only 5 studies used one, while the remaining 36 omitted it from examinations. Across all studies, the median number of cumulative years of experience per study was 17 years (range: 5-45) and the median number of readers per study was 2 (range: 1-13). Variations in clinical parameters and technical characteristics are collectively presented in Table 4.

Discussion
This review analyzed papers from the last 5 years that had the goal of answering the following research question: Can bpMRI replace mpMRI, via the omission of DCE, in the screening and assessment of clinically significant prostate cancer without diminishing diagnostic sensitivity/accuracy? According to PI-RADS v2.1, the role of the DCE sequence in the staging of prostate cancer is primarily to assist in the appropriate grading of PI-RADS 3 lesions in the peripheral zone. Lesions that display early enhancement from DCE are upgraded to PI-RADS 4, potentially changing the course of patient treatment and care. It was also mentioned in PI-RADS v2.1 that when T2WI and DWI are of insufficient diagnostic quality, DCE plays a larger role in determining the PI-RADS assessment category. Therefore, any study trying to answer this research question should test for the experimental conditions for which DCE and thus I.V. contrast are indicated. The PI-RADS v2.1 indications for DCE are: (1) determining PI-RADS 3 lesions that include clinically significant prostate cancer; (2) assisting in the readout of MRIs with suboptimal diagnostic quality for T2WI and DWI sequences; and (3) assisting radiologists with relatively low experience in reading prostate MRIs. These indications help to assess the claims that DCE is a safety net for when T2WI/DWI sequences are unhelpful and to evaluate the reader experience-dependency of DCE, which has been demonstrated in the literature. These indications are also emerging as proposed quality metrics for prostate MRI that specifically endorse explicitly reporting information on each PI-RADS category in addition to explicitly commenting on image quality utilizing the Prostate Imaging Quality (PI-QUAL) system [26][27][28].

Studies Meeting the Majority of the Defined DCE Indications
The studies discussed in detail below are those that comprehensively reported PCa and csPCa rates within their equivocal lesion populations and either (1) also included multiple readers or (2) utilized 1.5T or did not explicitly mention removing bad quality scans. None of the studies met all the DCE indications defined previously.
Bao et al. [30] carried out a two-center, retrospective study of 638 individuals with the primary aim of comparing the diagnostic accuracy of bpMRI, mpMRI, and what they defined as optimized (Op) MRI, a combination of bpMRI and mpMRI based on PI-RADS 3 lesions. Their study utilized 3T MRI and included six radiologists (all with ≥1000 prostate MRI scans read), with two reading bpMRI (10 and 3 years of experience) according to PI-RADS v2.1 and two reading mpMRI (12 and 5 years of experience) using DCE. PI-RADS 3 lesions were assigned in 18.2% of cases (116/638) for bpMRI and 11.3% (72/638) for mpMRI. Using PI-RADS 3 as the diagnostic criterion, both methods had similar clinically significant cancer detection rates for both junior and senior radiologists. In this study, the interrater agreement between junior and senior readers was high and the interrater agreement between bpMRI and mpMRI was high. This study very elegantly showed the results of different protocols on the classification of equivocal lesions and provided evidence that junior readers may perform similarly to seniors with and without DCE. It is worth noting that this study specifically excluded MRI scans with poor image quality and imaging exams from outside institutions.
Bosaily et al. [33] carried out an extension of the PROMIS study, which is a large, multi-center, prospective study of 497 biopsy-naive individuals with the primary aim of assessing the diagnostic accuracy of pre-biopsy mpMRI using standard 1.5T machines without endorectal coil. This study extension was not statistically powered to evaluate differences in sequences. The authors found that the addition of DCE was helpful in correctly identifying any csPCa (GS ≥ 7 (3 + 4) irrespective of cancer core length) lesions compared to both T2WI and T2WI + DWI. The addition of DCE reduced the number of equivocal scores (3/5) slightly, with 28% of patients classified as equivocal compared with 32% using T2WI + DWI alone. Their findings did not show a meaningful difference between bpMRI and mpMRI depending on the definition of clinically significant prostate cancer, with other considered definitions including cancer core (GS ≥ 7 (3 + 4) or cancer core length ≥ 4) or without cancer core (GS ≥ 7 (4 + 3) irrespective of cancer core). However, their overall conclusions mentioned that the addition of DCE did not significantly improve the diagnostic accuracy of T2WI + DWI. This study shows some of the variation that arises from different definitions of clinically significant prostate cancer and how the conclusions on the utility of DCE depend on a thorough understanding and evaluation of equivocal lesions. This study did not evaluate the dependence of reader experience or scan quality, but their detailed analysis of equivocal lesions is invaluable.
Cereser et al. [37] carried out a retrospective study on a prospectively collected database of 108 individuals with the primary aim of comparing multiple mpMRI-derived protocols in detecting csPCa. This study incorporated two readers with varying experience levels (R1 600 cases vs. R2 250 cases) and evaluated each imaging sequence in series with DCE being the last addition. The strengths of this study include its histopathological mapping of whole-mount histopathology sections to MRI and comprehensive reporting on their equivocal lesion population. The interrater agreement was found to be highest for mpMRI compared to bpMRI; however, these protocols showed comparable cancer detection rates with no significant interrater differences considering a threshold of PI-RADS ≥ 3. DCE influenced the final PI-RADS score in 8% (11/137) of observations for R1 and 7.7% (9/117) of observations for R2. The csPCa rates in the upgraded category 3 to category 4 lesions for each reader were 54.5% (6/11) for R1 and 88.9% (8/9) for R2. Their overall conclusions were that including DCE imaging has the potential to minimize PI-RADS v2 category 3 observations while prompting appropriate biopsies.
Brancato et al. [34] carried out a retrospective study of 111 patients with 117 lesions with the primary aim of measuring the added value of DCE-MRI in combination with T2WI + DWI with respect to both reproducibility and diagnostic accuracy. This study included three separate radiologists all with similar years of experience (7-10 years). They found that the best overall results for interrater agreement were reached when considering only csPCa (GS ≥ 7 (3 + 4) irrespective of cancer core length) and an mpMRI-based PI-RADS classification. However, the overall findings related to diagnostic accuracy revealed that the PI-RADS scoring in bpMRI protocols was comparable to that assigned in the mpMRI protocol. This study did not perform a sub-analysis for equivocal lesions or mention the quality of the scans included.
Han et al. [46] carried out a retrospective study of 123 individuals with the primary aim of comparing the performance of bpMRI and mpMRI combined with PSAD in detecting clinically significant prostate cancer. This study only included patients with a PSA of 4-10 ng/mL. For image analysis, this study included two separate radiologists, both with >5 years of experience; both radiologists first read the scans as bpMRI, observed a one-month washout period, and then read the scans again as mpMRI (including DCE). This study comprehensively analyzed the detection rates for each PI-RADS category and found that DCE influenced the final PI-RADS score in approximately 10.6% (13/123) of their population. Looking at their PI-RADS category 3-specific findings, eight lesions were upgraded to category 4 on the basis of the added DCE findings, and in 62.5% (5/8) of these DCE-positive lesions the findings were attributable to prostatitis. There were some discrepancies between category 3 and 4 lesion assignments in the csPCa detection rates for category 3 lesions (mpMRI: 3/7 csPCa, bpMRI: 4/18 csPCa) and category 4 lesions (mpMRI: 16/29, bpMRI: 15/21). Although there were multiple readers, this study did not assess the level of agreement or individual assignment mismatch. This study also did not mention the quality of the MRI scans or removal of low-quality scans. The overall results from this study suggest bpMRI achieves better performance than mpMRI in detecting csPCa, considering a significance threshold of PI-RADS ≥ 3.
Junker et al. [48] carried out a retrospective study of 236 patients with the primary aim of investigating if and how omitting DCE influences diagnostic accuracy and tumor detection rates. This study utilized 1.5T and 3T scanners with endorectal coil. Image interpretation was carried out by one experienced uro-radiologist who first retrospectively reviewed MRI datasets without DCE using PI-RADS v2, then conducted an evaluation with DCE in the exact same reading session. DCE influenced the final PI-RADS v2 score in 9.75% (23/236) of patients. There were a total of 135 PCa lesions, and utilizing bpMRI led to the downgrading of 5.93% of PCa lesions (8/135) from PI-RADS category 4 to category 3; 62.5% (5/8) of these PCa lesions were GS = 7 (3 + 4), or clinically significant cancer. This study also defined another cutoff for clinically significant cancer (GS ≥ 7 (4 + 3)), for which they found no significant differences between bpMRI and mpMRI, largely due to the exclusion of PCa lesions, primarily pattern 3. No PCa lesions were downgraded from higher scores to a score < 3; therefore, no additional PCa was scored as benign or completely missed. Limitations related to this study involved the interpretation of bpMRI and mpMRI in the same session, and there was no sub-analysis or mention of scan quality. The overall results from this study suggest that omitting DCE did not lead to significant differences in diagnostic accuracy or tumor detection rates; however, this was with respect to a clinically significant cancer definition of GS ≥ 7 (4 + 3).
Pesapane et al. [54] carried out a retrospective analysis of 431 individuals with the primary aim of comparing the performance of mpMRI and bpMRI in PCa detection in individuals with elevated PSA levels. This study utilized a 1.5T scanner and ERC, excluding patient scans with artifacts and scans without ERC. This study included two radiologists with different amounts of experience (3 years vs. 5 years) in the interpretation of prostate MRIs. bpMRI readouts were performed first and then, after a one-month washout period, mpMRI readouts with DCE were conducted. Intrareader agreement was found to be substantial. DCE influenced the PI-RADS score of 6% (25/431) of scans for reader 1 and 8% (35/431) for reader 2. For high-grade PCa cases, for bpMRI, R1 and R2 had a sensitivity of 84% and 80%, and the specificity was 77% and 74% for R1 and R2, respectively. For mpMRI, the sensitivity was 86% and 80% and specificity was 78% and 74% for R1 and R2, respectively. The overall results from this study suggest there is no significant reduction in diagnostic performance of bpMRI compared to mpMRI. This study did not perform a sub-analysis and explicitly described removing scans with artifacts and those without endorectal coil.
Tamada et al. [59] carried out a retrospective analysis of 103 patients (with 165 suspected PCa lesions) with the primary aim of comparing the interobserver reliability and diagnostic performance of bpMRI compared to mpMRI using PI-RADS v2.1. This study utilized a 3T scanner with a pelvic phased-array coil and included three radiologists with widely ranging experience levels (8 years R1, 12 years R3, and 22 years R2). The bpMRI and mpMRI reading sessions were conducted in the same session with the traditional flow of bpMRI first and mpMRI second. The interrater reliability was shown to have good agreement for both bpMRI and mpMRI but was lowest for PZ lesions. This study had two separate definitions of clinically significant cancer: (1) tumor with GS ≥ 7 and tumor diameter ≥ 5 mm or (2) tumor with GS = 3 + 3 and tumor size ≥ 0.5 mL (tumor diameter ≥ 8 mm). When comparing the diagnostic performance for csPCa detection between bpMRI and mpMRI, they found that diagnostic sensitivity was significantly higher in all readers for mpMRI, but diagnostic specificity was significantly lower in all readers compared to bpMRI. For their PI-RADS categories 3 and 4-specific sub-analysis, the false-positive rate for upgrading PI-RADS category 3 to category 3 + 1 (category 4) for the readers was 62% R1 (12/21), 73% R2 (19/26), and 57% R3 (12/21). The high false-positive rate may have resulted from the enhancement effect on DCE-MRI in benign prostatic conditions such as prostatitis and fibrosis. The overall conclusion of this study suggests that bpMRI may be acceptable for detecting csPCa; however, solutions need to be sought to improve diagnostic sensitivity. This study did not perform a sub-analysis based on scan quality nor mention any effects of quality on interpretation.
The studies above represent those that comprehensively reported on their PI-RADS category 3 lesion population and they give many valuable insights into the potential utility of DCE and its limitations. For the studies with multiple readers that reported on csPCa within their PI-RADS 3 population, many showed good interrater agreement between junior and senior readers, with the highest interrater agreements generally being ascribed to mpMRI. The impact of removing DCE on diagnostic accuracy in the studies above largely depended on their definition of clinically significant prostate cancer, with studies using a definition of GS ≥ 7 (3 + 4) tending to show superiority for mpMRI, in contrast to studies that used other definitions (GS ≥ 7 (4 + 3)), which tended to show equivalent performance. The impact of DCE was also heavily dependent on the significance threshold for biopsies, which in many studies was identified to be PI-RADS category ≥ 3. Due to the comprehensive reporting on their PI-RADS equivocal populations, it is easy to understand how changing this significance threshold would significantly impact results. It is also noteworthy that this significance threshold for biopsies was not uniformly agreed on and is another area of controversy and active research. All studies that comprehensively reported on their PI-RADS equivocal populations found that DCE was required for the final PI-RADS assessment in approximately 6-10% of their cohorts due to upgrading from category 3 to category 4. Additionally, all these studies either did not mention scan quality as a variable or they explicitly removed MRI scans with low quality. This last point emphasizes that the current results and conclusions regarding the utility of DCE must be understood within the special context in which all images are of higher diagnostic quality.
However, as mentioned previously, in the context of decreased scan quality, DCE has shown evidence of having more robust quality compared to DWI or ADC maps [10,16,17].
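The category 3 to category 4 upgrade discussed above can be written as a small rule. The following is a minimal sketch of the PI-RADS v2.1 logic for peripheral zone lesions only, where DWI is the dominant sequence and a positive DCE raises an equivocal DWI score of 3 to an overall category of 4. The function name and argument names are hypothetical, and passing `None` for DCE is an assumed way of representing a bpMRI read; transition zone scoring is not covered.

```python
from typing import Optional


def pz_category(dwi_score: int, dce_positive: Optional[bool] = None) -> int:
    """Overall category for a peripheral zone lesion (simplified sketch).

    dwi_score: DWI/ADC score from 1 to 5.
    dce_positive: True or False for an mpMRI read, None when DCE is omitted (bpMRI).
    """
    if dwi_score not in range(1, 6):
        raise ValueError("DWI score must be between 1 and 5")
    # DCE changes the outcome only for an equivocal DWI score of 3.
    if dwi_score == 3 and dce_positive is True:
        return 4
    return dwi_score


# The same equivocal lesion read without DCE, with negative DCE, and with positive DCE.
print(pz_category(3), pz_category(3, False), pz_category(3, True))
```

Under this rule, bpMRI and mpMRI can disagree only on lesions with a DWI score of 3, which is why the size and composition of the PI-RADS 3 subgroup matters so much for any comparison.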

DCE Indications and Frequency of Reporting
Although 100% (41/41) of the included papers sought to answer the primary research question, only 71% (29/41) of the included papers reported the number of PI-RADS 3 lesions in their respective patient cohorts. An even smaller percentage (51%, 21/41) reported what proportion of these PI-RADS 3 lesions included clinically significant prostate cancer. Thus, many of the papers that were trying to answer the primary research question did not report key elements of indication #1 and, as a result, their findings are much harder to interpret for the population of interest. Indication #1 is extremely important to report since the prevalence of PI-RADS 3 lesions and of csPCa within these lesions is hospital/institution-dependent. The papers described below are among those that reported their PI-RADS 3 population. The results of one of the included studies [7] emphasize the importance of understanding the population, as they provide tumor burden-dependent suggestions. They suggest bpMRI for patients with average risk and PSA < 10 ng/mL for whom EPE is minimal but the risk of cancer mortality is not marginal. They then suggest that patients with higher PSA, among whom EPE is more common, use the full mpMRI examination for assessment of cancer. These disease burden-dependent suggestions rely heavily on understanding and reporting the population under study, and if studies attempting to answer the primary research question do not address this transparently, they will be severely limited. Another study [49] that reported its PI-RADS 3 population investigated bpMRI and concluded, based on its results, that PI-RADS 3 lesions are more likely than other PI-RADS lesions to require additional tools to supplement bpMRI. This emphasizes the importance of reporting PI-RADS 3 prevalence in a study to enable more accurate research conclusions.
Another study [36] that reported the PI-RADS 3 population in detail found that when the high-risk threshold was low (0.1 to 0.45), bpMRI was superior to mpMRI. However, when the risk threshold was larger than 0.5, PI-RADS v2 with incorporation of DCE was better than bpMRI. This paper found that for lesions that were in category 3 on bpMRI, the application of DCE via PI-RADS v2 improved cancer detection. This study provides further support for the importance of reporting the PI-RADS 3 lesion population and thus highlights the limits of papers that draw conclusions about the utility of DCE without discussing PI-RADS 3 lesions.
Further, most (68.3%, 28/41) of the included papers used high-quality 3.0 Tesla scanners, minimizing the possibility of low-quality T2WI and DWI sequences. With regard to image quality, 46% (19/41) of studies made explicit statements about removing low-quality MRI patient scans resulting from MRI noise, motion, or artifacts. These sources of low image quality are where DCE has been indicated to be a strength. Regarding the utility of DCE for patient cases where the T2WI and DWI sequences are of low quality (indication #2), none (0/41) of the included papers provided details about their MRI specifics or performed any type of sub-analysis or result stratification based on image sequence quality. If studies only include scans with high-quality T2WI and exclude scans with motion/artifacts/noise that make the T2WI and DWI non-diagnostic, then these studies will not accurately reflect real-world scenarios. Thus, every included paper trying to answer the primary research question omitted an analysis of indication #2, which makes their findings much harder to interpret for the population of interest.
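To make concrete what a quality-stratified sub-analysis could look like, the sketch below computes bpMRI and mpMRI sensitivity separately within each image-quality stratum. All records are invented purely for illustration and say nothing about real outcomes; the quality labels, the record layout, and the function name are assumptions rather than anything used by the included studies.

```python
from collections import defaultdict

# Hypothetical per-lesion records, invented for illustration only:
# (quality stratum, csPCa on reference standard, bpMRI positive, mpMRI positive)
cases = [
    ("adequate", True, True, True),
    ("adequate", True, True, True),
    ("adequate", False, False, False),
    ("adequate", False, False, True),
    ("suboptimal", True, False, True),
    ("suboptimal", True, True, True),
    ("suboptimal", False, False, False),
    ("suboptimal", False, True, True),
]


def sensitivity_by_stratum(records):
    """Return {stratum: (bpMRI sensitivity, mpMRI sensitivity)} among csPCa cases."""
    tallies = defaultdict(lambda: [0, 0, 0])  # [csPCa count, bpMRI hits, mpMRI hits]
    for stratum, has_cspca, bp_positive, mp_positive in records:
        if has_cspca:
            tallies[stratum][0] += 1
            tallies[stratum][1] += int(bp_positive)
            tallies[stratum][2] += int(mp_positive)
    return {s: (bp / n, mp / n) for s, (n, bp, mp) in tallies.items()}


result = sensitivity_by_stratum(cases)
for stratum, (bp_sens, mp_sens) in result.items():
    print(f"{stratum}: bpMRI {bp_sens:.2f}, mpMRI {mp_sens:.2f}")
```

The same grouping could be keyed on reader experience instead of image quality. The point of the design is that the stratum label has to be recorded per case, which is not possible once low-quality scans have been excluded at enrollment.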
Finally, only a small number of the studies included head-to-head comparisons of multiple readers with varying experience. Prior research has found that experienced readers can maintain high diagnostic accuracy when reading bpMRIs while less experienced readers strongly benefit from the DCE sequence [44,65]. The accuracy of detection of prostate cancer is higher for less experienced readers when DCE is included [65]. Thus, most of the included studies trying to answer the primary research question also omitted an analysis of indication #3, which makes their findings much harder to interpret for the population of interest.
In summary, without complete and transparent reporting of information concerning indications #1-3 of DCE, it is impossible to stratify studies, pool results in meta-analyses, and confidently answer whether bpMRI can replace mpMRI via the omission of DCE. Future meta-analyses and original studies are encouraged to analyze data and include papers that address indications 1-3 for DCE.
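The reporting frequencies quoted in this section can be recomputed from the raw counts. The sketch below tallies them against the 41 included studies; the counts are taken from the text above, while the dictionary labels and the helper name are illustrative.

```python
TOTAL_STUDIES = 41

# Counts as stated in this section; labels are paraphrased for brevity.
reporting_counts = {
    "reported number of PI-RADS 3 lesions": 29,
    "reported csPCa prevalence within PI-RADS 3 lesions": 21,
    "used 3.0 Tesla scanners": 28,
    "explicitly removed low-quality scans": 19,
    "stratified results by image quality": 0,
}


def as_percent(count, total=TOTAL_STUDIES):
    """Share of studies as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in reporting_counts.items():
    print(f"{label}: {count}/{TOTAL_STUDIES} ({as_percent(count)}%)")
```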

Study Heterogeneity
It is challenging to compare these diagnostic studies solely on the basis of statistical metrics, as most studies used different definitions of clinically significant cancer. Moreover, because there is no uniform approach to deciding on a biopsy, it would be impractical to comment on the diagnostic performance of the aforementioned methods. However, the wide ranges of the metrics indicate that variation is large, which might be explained by differences in the experience of the readers and the diversity of the biopsy and clinically significant cancer criteria adopted among the studies. It should also be kept in mind that some of the cancers could have gone unnoticed by the studies that used biopsies (n = 23) as a reference standard method rather than radical prostatectomies. Therefore, we can conclude that high variability exists in both diagnostic methods in terms of sensitivity, specificity, and AUC value. Implementing more standardized approaches in the future might make it possible to compare the diagnostic performance of bpMRI and mpMRI.
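The dependence on definitions can be illustrated with a toy calculation. The sketch below applies two csPCa definitions (GS ≥ 3 + 4 and GS ≥ 4 + 3) and two biopsy thresholds (PI-RADS ≥ 3 and ≥ 4) to the same lesions. The lesion list is entirely hypothetical and chosen only to show the mechanism; the definition labels and function names are assumptions, not taken from any included study.

```python
# Hypothetical lesions, invented for illustration:
# (PI-RADS category, Gleason pattern as (primary, secondary), or None if benign)
lesions = [
    (5, (4, 4)),
    (4, (4, 3)),
    (4, (3, 4)),
    (3, (3, 4)),
    (3, (3, 3)),
    (3, None),
    (2, None),
    (2, (3, 3)),
]


def is_cspca(gleason, definition):
    """Apply one of two example csPCa definitions to a Gleason pattern."""
    if gleason is None:
        return False
    primary, secondary = gleason
    total = primary + secondary
    if definition == "GS>=3+4":
        return total >= 7
    if definition == "GS>=4+3":
        return total > 7 or (total == 7 and primary >= 4)
    raise ValueError(f"unknown definition: {definition}")


def sens_spec(records, definition, threshold):
    """Sensitivity and specificity when PI-RADS >= threshold counts as positive."""
    tp = fn = tn = fp = 0
    for category, gleason in records:
        test_positive = category >= threshold
        if is_cspca(gleason, definition):
            tp += test_positive
            fn += not test_positive
        else:
            fp += test_positive
            tn += not test_positive
    # Assumes both classes are present, which holds for the list above.
    return tp / (tp + fn), tn / (tn + fp)


for definition in ("GS>=3+4", "GS>=4+3"):
    for threshold in (3, 4):
        sens, spec = sens_spec(lesions, definition, threshold)
        print(f"{definition}, PI-RADS >= {threshold}: sens {sens:.2f}, spec {spec:.2f}")
```

Even with identical reads, the reported sensitivity and specificity shift with each combination, so metrics from studies using different definitions and thresholds are not directly comparable.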

Limitations
Our study has four main limitations. First, using a 5-year cutoff as an inclusion criterion prevented the addition of older studies that might have answered the primary research question better through more transparent reporting. However, the concept of bpMRI was introduced into the prostate MRI research world relatively recently and the roles/responsibilities of the mpMRI sequences have changed over the years. Thus, including studies using PI-RADS v1 would have been a limitation. Second, because of the inclusion and exclusion criteria, the number of included studies was relatively small (n = 41). Third, many of the studies did not explicitly state the proportion of PI-RADS 3 patients, which was instead calculated from percentages given in the tables or within the text. This limitation was also pertinent when searching for the proportion of the PI-RADS 3 patients that had PCa or csPCa. Fourth, there was also the possibility of overlap between the patient cohorts of the included studies, but our aim was to demonstrate the heterogeneity of the papers matching our inclusion criteria.

Conclusions
This review sought to more thoroughly investigate study heterogeneity and reporting regarding primary DCE indications in studies trying to answer the primary research question: Can bpMRI replace mpMRI, via the omission of DCE, in the screening and assessment of clinically significant prostate cancer without diminishing diagnostic sensitivity/accuracy? The PI-RADS v2.1 indications for DCE are: (1) determining PI-RADS 3 lesions that include clinically significant prostate cancer; (2) assisting readout of MRIs with suboptimal diagnostic quality for T2WI and DWI sequences; and (3) assisting radiologists with relatively low experience in reading prostate MRIs. These indications have been endorsed as emerging quality metrics for prostate MRI and are especially important when discussing the utility of DCE [26-28]. In summary, most papers that try to answer the primary research question omit results/discussions relating to indications 1-3 and, as a result, their conclusions and interpretations, taken both independently and combined in systematic reviews, are weakened. For the studies that addressed most of the DCE indications 1-3, it can be concluded that the utility of DCE is very definition-dependent. The definition of csPCa as GS ≥ 7 (3 + 4) and a cancer significance threshold of PI-RADS > 3 tend to show the benefit of DCE, as compared to other definitions of csPCa or thresholds. Additionally, none of these studies mentioned including scans of low quality, with some explicitly excluding low-quality MRIs, which is a potential area of strength for DCE. Thus, the conclusions regarding the utility of DCE in most published studies should be taken with special consideration of the population under study, with current evidence supporting its equivalent use specifically for images of uniformly good diagnostic quality. Prospective evaluation that incorporates and reports on these indications and emerging quality metrics is needed to truly answer the question under debate.
It is the authors' view that future reviews and studies aiming to answer the primary research question should transparently report all data related to indications 1-3 for the most complete and accurate conclusions regarding the utility of DCE. Moreover, there is a need for a standardized reporting system designed for bpMRI; without it, such heterogeneity is likely to persist in future studies.
Author Contributions: Ensuring the integrity of the entire study: all authors. Concept and design: all authors. Study conception/study design: all authors. Data analysis: all authors. Writing and revision: all authors. Statistical analysis: all authors. All authors have read and agreed to the published version of the manuscript.

Funding:
The full study protocol can be accessed in the online supplementary material. This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. Additional research support was provided by the NIH Medical Research Scholars Program, a public-private partnership supported jointly by the NIH and contributions to the Foundation for the NIH from the Doris Duke Charitable Foundation, the American Association for Dental Research, and the Colgate-Palmolive Company.