The Diagnostic Performance of Tumor Stage on MRI for Predicting Prostate Cancer-Positive Surgical Margins: A Systematic Review and Meta-Analysis

Purpose: Surgical margin status in radical prostatectomy (RP) specimens is an established predictor of biochemical recurrence and disease progression in prostate cancer, so predicting positive surgical margins (PSMs) is of utmost importance. We performed a meta-analysis evaluating the diagnostic utility of a high clinical tumor stage (≥3) on magnetic resonance imaging (MRI) for predicting PSMs. Method: The PubMed and Embase databases and the Cochrane Library were systematically searched for relevant studies published from 1 January 2000 to 31 December 2022. Study quality was evaluated with the Quality Assessment of Diagnostic Accuracy Studies 2 tool. A hierarchical summary receiver operating characteristic plot was created to depict sensitivity and specificity. Subgroup analyses and meta-regression were used to investigate heterogeneity. Results: This meta-analysis comprised 13 studies with 3924 patients in total. The pooled sensitivity and specificity were 0.40 (95% CI, 0.32–0.49) and 0.75 (95% CI, 0.69–0.80), respectively, with an area under the receiver operating characteristic curve of 0.63 (95% CI, 0.59–0.67). The Higgins I2 statistics indicated moderate heterogeneity in sensitivity (I2 = 75.59%) and substantial heterogeneity in specificity (I2 = 86.77%). Area, prevalence of high Gleason scores (≥7), laparoscopic or robot-assisted techniques, field strength, functional technology, endorectal coil usage, and number of radiologists were significant factors responsible for heterogeneity (p ≤ 0.01). Conclusions: T stage on MRI has moderate diagnostic accuracy for predicting PSMs. When using it to guide the choice of treatment modality, clinicians should consider the factors contributing to heterogeneity.


Introduction
According to the latest data for 2021, prostate cancer (PCa) is the most prevalent cancer among men and the second leading cause of cancer death in men [1]. Although radical prostatectomy (RP) is the recommended treatment for PCa, 20% of patients who undergo RP have positive surgical margins (PSMs) on pathological examination [2]. PSMs are a recognized negative prognostic indicator for PCa.
PSMs are generally defined as tumor cells reaching the inked surgical margin of the prostatectomy specimen [3]. PSMs are associated with local disease recurrence and distant

Materials and Methods
This meta-analysis was conducted and written according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The study protocol was registered with INPLASY (INPLASY202370012).

Literature Search
Two researchers systematically searched PubMed, the Cochrane Library, and Embase for articles reporting on studies that evaluated the value of MRI for predicting PSMs. The searches covered articles published between 1 January 2000 and 31 December 2022. The search strategy combined Medical Subject Headings, free-text terms, and their variants, with no language restrictions. The detailed retrieval strategy is described in the Supplementary Materials. The reference lists of articles and reviews retrieved with combinations of the search strings were also checked.

Inclusion Criteria
We included articles reporting on diagnostic accuracy studies if (1) they assessed the accuracy of non-organ-confined disease on MRI, as the index test, for predicting PSMs in patients with PCa, (2) histopathology of RP specimens served as the reference standard, (3) sufficient information was reported to construct a 2 × 2 table for evaluating diagnostic accuracy, and (4) the article type was "original article" or equivalent.

Exclusion Criteria
The exclusion criteria were (1) studies addressing other topics, such as the diagnostic usefulness of other MRI findings for PSM prediction, (2) a reference standard other than RP specimens, (3) insufficient data for meta-analytic pooling, and (4) a publication type other than original article.

Data Extraction
The extracted data included the first author, year of publication, study characteristics, demographic characteristics, imaging characteristics, and the numbers of true positives, false positives, true negatives, and false negatives.
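The sketch below shows how one study's extracted counts translate into sensitivity and specificity. It uses Wilson score 95% intervals as one common choice of per-study interval; this is an assumption, and the per-study intervals in the forest plots may have been computed differently. The counts and the names `TwoByTwo`, `wilson_ci`, and `sens_spec` are hypothetical.

```python
import math
from dataclasses import dataclass

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


@dataclass(frozen=True)
class TwoByTwo:
    """Index test (T3 stage on MRI) vs. reference standard (RP histopathology)."""
    tp: int  # MRI T3, PSM present
    fp: int  # MRI T3, PSM absent
    fn: int  # MRI not T3, PSM present
    tn: int  # MRI not T3, PSM absent


def wilson_ci(successes: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        raise ValueError("empty denominator")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def sens_spec(t: TwoByTwo):
    """Return sensitivity, its CI, specificity, and its CI for one study."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return sens, wilson_ci(t.tp, t.tp + t.fn), spec, wilson_ci(t.tn, t.tn + t.fp)


# Hypothetical study: 50 patients with PSMs and 150 without.
study = TwoByTwo(tp=20, fp=38, fn=30, tn=112)
sens, sens_ci, spec, spec_ci = sens_spec(study)
print(f"sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```

Sensitivity uses only patients with PSMs (TP + FN) as the denominator and specificity only patients without PSMs (TN + FP), which is why the two are pooled as separate but correlated outcomes.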

Methodologic Quality Assessment
Study quality was evaluated using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool [15]. The risk of bias for each study was assessed across four domains: (1) patient selection, (2) index test, (3) reference standard, and (4) flow and timing. The risk of bias in each domain was rated as low, high, or unclear.
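One way to record these judgments and summarize them per domain, as a QUADAS-2 summary chart does, is sketched below. The class and function names are illustrative, and the two studies' ratings are hypothetical, not the published assessments shown in Figure 2.

```python
from collections import Counter
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The four QUADAS-2 risk-of-bias domains named in the text.
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def tally(assessments: dict[str, dict[str, Risk]]) -> dict[str, Counter]:
    """Count low/high/unclear ratings per domain across all studies."""
    summary = {d: Counter() for d in DOMAINS}
    for study, ratings in assessments.items():
        missing = set(DOMAINS) - set(ratings)
        if missing:
            raise ValueError(f"{study}: no rating for {sorted(missing)}")
        for domain in DOMAINS:
            summary[domain][ratings[domain]] += 1
    return summary


# Hypothetical ratings for two studies.
ratings = {
    "Study A": {"patient selection": Risk.UNCLEAR, "index test": Risk.HIGH,
                "reference standard": Risk.LOW, "flow and timing": Risk.HIGH},
    "Study B": {"patient selection": Risk.UNCLEAR, "index test": Risk.LOW,
                "reference standard": Risk.LOW, "flow and timing": Risk.HIGH},
}
for domain, counts in tally(ratings).items():
    print(domain, {r.value: counts[r] for r in Risk})
```

Rejecting a study with a missing domain, rather than silently skipping it, keeps the per-domain totals equal to the number of included studies.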

Data Synthesis and Analysis
Sensitivity and specificity were calculated from diagnostic 2 × 2 tables of true positives, false negatives, false positives, and true negatives. A bivariate random-effects model was used to estimate the pooled sensitivity and specificity with their 95% confidence intervals (CIs) [16]. A hierarchical summary receiver operating characteristic (HSROC) curve, with the area under the receiver operating characteristic curve (AUC), was plotted to display diagnostic accuracy [17]. Publication bias was assessed with Deeks' funnel plot asymmetry test [18]. Significant heterogeneity was defined as p < 0.05 in Cochran's Q test and I2 > 50%. To explore heterogeneity between studies, the potential effects of several covariates were investigated by subgroup analysis. The covariates were (1) study design (retrospective vs. prospective), (2) area (Asia vs. non-Asia), (3) use of minimally invasive techniques (laparoscopic vs. non-laparoscopic) and robot assistance (robot-assisted vs. not robot-assisted), (4) prevalence of high Gleason scores (≥7) on biopsy (≥50% vs. <50%), (5) magnetic field strength (3 T vs. not 3 T), (6) use of endorectal coils (ERCs), (7) functional MRI technology (sequences using apparent diffusion coefficients and dynamic contrast enhancement (DCE) vs. not), (8) number of radiologists (multiple vs. single), and (9) number of cases (≥150 vs. <150).
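The heterogeneity criterion above can be illustrated with a simpler model than the one used in the analysis. The sketch below pools per-study sensitivities with a univariate DerSimonian–Laird random-effects model on the logit scale and reports Cochran's Q and I2 = (Q - df)/Q. This is a swapped-in technique for illustration only: it ignores the correlation between sensitivity and specificity that the bivariate model accounts for, so its estimates will not match the published ones. The function names and counts are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def dl_pool(events: list[int], totals: list[int]):
    """Univariate DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Returns (pooled proportion, 95% CI, Cochran's Q, I^2 in percent, tau^2).
    A 0.5 continuity correction is applied to studies with 0 or n events.
    """
    if len(events) < 2 or len(events) != len(totals):
        raise ValueError("need at least two studies with matching counts")
    y, v = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        y.append(logit(x / n))
        v.append(1 / x + 1 / (n - x))  # approximate variance of logit(p)
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se))
    return inv_logit(y_re), ci, q, i2, tau2


# Hypothetical per-study counts: true positives and all patients with PSMs (TP + FN).
tp = [12, 30, 8, 45]
diseased = [40, 60, 35, 90]
pooled, ci, q, i2, tau2 = dl_pool(tp, diseased)
# Only the I^2 half of the criterion; the Q-test p-value needs a chi-square
# survival function, e.g. scipy.stats.chi2.sf(q, df).
heterogeneous = i2 > 50
print(f"pooled sensitivity {pooled:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), "
      f"Q = {q:.2f}, I^2 = {i2:.1f}%, heterogeneous: {heterogeneous}")
```

I2 expresses the share of the total variability that exceeds what sampling error alone (Q = df on average) would produce, which is why it is truncated at zero when Q < df.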

Literature Search and Article Selection
The article screening process is depicted in the PRISMA flow diagram (Figure 1). The systematic search initially identified 544 articles in the PubMed, Embase, and Cochrane Library databases. After 57 duplicates were removed, the remaining 487 titles and abstracts were screened, leaving 54 potentially eligible papers for full-text review. After full-text review, articles were excluded because of (1) insufficient data to construct 2 × 2 tables (n = 2), (2) evaluation of other MRI findings as predictors of PSMs (n = 25), and (3) no assessment of the outcomes of interest (n = 14). Ultimately, 13 articles [4,10,12,19–28] comprising 3924 patients were included in this systematic review and meta-analysis. Table 1 summarizes the first author, year of publication, sample size, area, and demographic characteristics, including age, prostate-specific antigen levels, the number and proportion of patients with a high biopsy Gleason score, biopsy mode, and digital rectal examination (DRE) results. Study characteristics, including study design, consecutive patient selection, NeuroSAFE techniques, use of minimally invasive techniques, reference standard, interval between imaging and surgery, and surgeons, are summarized in Table 2. Table 3 summarizes imaging characteristics, including magnetic field strength, number and experience of radiologists, blinding, use of ERCs, MRI vendors, and imaging sequence details, including functional DCE MRI and diffusion-weighted imaging (DWI) as well as T2-weighted imaging (T2WI).
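The selection counts in the flow description can be checked arithmetically. The short script below is illustrative only and simply reproduces the screened and included totals from the reported numbers.

```python
# Counts reported in the text for the PRISMA flow diagram.
identified = 544
duplicates = 57
full_text = 54
excluded_full_text = {
    "insufficient data for a 2 x 2 table": 2,
    "other MRI findings as PSM predictors": 25,
    "outcomes of interest not assessed": 14,
}

screened = identified - duplicates                        # titles/abstracts screened
excluded_at_screening = screened - full_text              # removed before full-text review
included = full_text - sum(excluded_full_text.values())   # studies in the meta-analysis
print(f"screened {screened}, excluded at title/abstract {excluded_at_screening}, "
      f"included {included}")
```

Running it confirms that the reported exclusions (2 + 25 + 14 = 41) account exactly for the difference between the 54 full-text papers and the 13 included studies.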

Study Characteristics
The sample sizes of individual studies ranged from 48 to 1145. One study was prospective [25]; the remaining twelve were retrospective. ERCs were used in three studies [19,24,28]. Regarding blinding, two papers reported that the radiologists were aware that patients had biopsy evidence of PCa [19,24], and three articles did not specify blinding [4,12,20]. Only 1.5 T scanners were used in four studies [12,23,24,28], whereas 3.0 T scanners were used in six studies, and one study used both 1.5 T and 3.0 T scanners [10]. Use of the NeuroSAFE technique was mentioned in four studies [12,20,21,24].

Quality Assessment
Overall, the quality of the included studies was moderate. In the patient selection domain, most studies had retrospective designs and were therefore rated as having an unclear risk of bias. In two studies, the readers were exposed to the reference tests while interpreting the index test, so the index test domain was rated as high concern [19,24]. Six studies were rated as unclear regarding the applicability of the index test because too little information was reported about index test characteristics, interpretation, and blinding [4,10,12,20,22,28]. All studies were judged to have a low risk of bias in the reference standard domain because they all used RP specimens as the reference standard. In the flow and timing domain, all included studies were judged to have a high risk of bias. The QUADAS-2 assessment of the included studies is shown in Figure 2.

Diagnostic Performance of Non-Organ-Confined Disease on MRI for the Detection of PSMs
According to the forest plots (Figure 3), the pooled sensitivity and specificity were 0.40 (95% CI, 0.32–0.49) and 0.75 (95% CI, 0.69–0.80), respectively. Figure 4 shows the HSROC plot, with an AUC of 0.63 (95% CI, 0.59–0.67). The Q test indicated heterogeneity (p < 0.05), and the Higgins I2 statistics revealed substantial heterogeneity for specificity (I2 = 86.77%) and moderate heterogeneity for sensitivity (I2 = 75.59%). Table 4 presents the subgroup analyses used to investigate the sources of this variability. Overall, sensitivity was relatively comparable across studies, whereas specificity varied considerably. Among the investigated covariates, eight (all except study design and area) were significant factors contributing to heterogeneity (p < 0.01).
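To make "moderate diagnostic accuracy" more concrete, the pooled point estimates can be converted into likelihood ratios and a diagnostic odds ratio using the standard definitions LR+ = sens/(1 - spec), LR- = (1 - sens)/spec, and DOR = LR+/LR-. This is back-of-the-envelope arithmetic on the reported point estimates only: it gives no confidence intervals and is not how a bivariate model would derive pooled likelihood ratios.

```python
# Pooled point estimates reported in this meta-analysis.
SENS, SPEC = 0.40, 0.75

lr_pos = SENS / (1 - SPEC)   # factor by which a T3 reading multiplies the odds of a PSM
lr_neg = (1 - SENS) / SPEC   # factor by which a non-T3 reading multiplies those odds
dor = lr_pos / lr_neg        # diagnostic odds ratio

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.2f}")
```

Both likelihood ratios (1.6 and 0.8) lie close to 1, so either MRI result shifts the odds of a PSM only modestly, which is consistent with the AUC of 0.63.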

Discussion
We performed a meta-analysis of the diagnostic value of a high clinical T stage on MRI as a predictor of PSMs. To our knowledge, this is the first assessment of the diagnostic accuracy of preoperative MRI staging for predicting PSMs. Thirteen articles with a total of 3924 participants were included. Our meta-analysis showed that T3 stage on MRI has moderate diagnostic performance for predicting pathologically positive margins, with suboptimal sensitivity but relatively high specificity. The pooled sensitivity and specificity were 0.40 (95% CI, 0.32–0.49) and 0.75 (95% CI, 0.69–0.80), respectively, with an AUC of 0.63 (95% CI, 0.59–0.67). Overall, MRI had relatively low diagnostic performance for the presence of PSMs on specimens after RP. To some extent, this reflects the limited resolution of multiparametric MRI pulse sequences and conservative reading practices [29]. T3 stage is a significant pathological characteristic of PCa that raises the likelihood of PSMs and biochemical recurrence [30]. However, identifying T3 stage on MRI is often considered subjective. The criteria depend on the reader's expertise, particularly for the extracapsular extension score, which is based mainly on qualitative and subjective features, so relatively few reproducible imaging criteria are involved [12,31]. Weak interobserver agreement may also contribute to the suboptimal diagnostic performance.
Notably, T3 stage on preoperative MRI is not linearly related to the presence of histopathological PSMs. Nevertheless, T3 stage may be important because these findings can lead urologists to alter the dissection plane and surgical approach. Knowing the location and extent of a tumor on MRI may enable a surgeon to choose the treatment modality more precisely. Because PSM rates tend to be higher when nerve sparing is the goal, surgeons prefer non-nerve-sparing techniques to achieve negative surgical margins when non-organ-confined disease is observed on MRI [10,19,24]. Therefore, the benefit of MRI staging lies in the correct selection of patients for neurovascular bundle (NVB)-sparing surgery rather than in technically preventing PSMs [23]. When selecting the best patients for active surveillance or selecting RP candidates, high sensitivity is required to preserve NVBs. On the other hand, a conservative approach could result in PCa patients being unnecessarily excluded from NVB-sparing surgery; the relatively high specificity found in our analysis may provide some auxiliary information for clinicians.
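The trade-off between sensitivity and specificity discussed above is easier to judge through predictive values, which also depend on how common PSMs are. The sketch applies Bayes' theorem to the pooled estimates, assuming for illustration a pre-test PSM rate of 20% (the figure cited in the Introduction [2]); the function name is hypothetical, and real prevalence will differ between centers.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Bayes' theorem: PPV and NPV of a binary test at a given pre-test prevalence."""
    tp = sens * prevalence               # T3 on MRI and PSM present
    fp = (1 - spec) * (1 - prevalence)   # T3 on MRI but margins negative
    fn = (1 - sens) * prevalence         # not T3 on MRI but PSM present
    tn = spec * (1 - prevalence)         # not T3 on MRI and margins negative
    return tp / (tp + fp), tn / (tn + fn)


# Pooled estimates from this analysis; the 20% prevalence is an assumption.
ppv, npv = predictive_values(0.40, 0.75, 0.20)
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions, a T3 reading raises the probability of a PSM from 20% to about 29%, whereas a non-T3 reading still leaves a residual risk of about 17% (1 - NPV).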
Our meta-analysis suggested that prospective designs may yield higher sensitivity and specificity than retrospective designs. The use of a higher field strength (3.0 T) appears to improve sensitivity. This is plausible: higher field strengths may depict extracapsular tumor better than lower field strengths because detecting subtle capsular irregularities or small extraprostatic tumors requires high spatial resolution, which is more readily achieved at higher field strengths. In contrast, May et al. reported that an ERC can significantly increase the signal-to-noise ratio and may improve discrimination of surgical margin status [26]. However, some preliminary studies have shown that ERC MRI cannot accurately predict PSMs and is effective only for a small subset of high-risk, highly vascular tumors, for which radiological interpretations vary widely [32–34]. Similarly, our analysis indicated that ERCs do not appear to improve sensitivity or specificity compared with surface coils. Among the included studies, six of the seven that used 3.0 T scanners did not use ERCs. Notably, 3.0 T phased-array MRI has been shown to be equivalent to 1.5 T endorectal MRI with no significant loss of image quality [35]. In addition, ERC use requires lengthy patient preparation, which prolongs scan protocols [36]. These factors undermine the utility of ERCs, and many centers no longer use them routinely in diagnostic protocols.
We also analyzed the influence of functional MRI sequences. When functional techniques (DWI and DCE) were added, sensitivity was significantly higher than with T2WI alone (0.47 vs. 0.29). A possible explanation is that multiparametric MRI (mpMRI) provides additional detail on gland anatomy and tumor location and allows a thorough assessment of anatomical obstacles and unfavorable tumor features that raise the likelihood of PSMs [37]. Combining T2WI with functional sequences can also improve delineation of the boundary between tumor and normal prostate tissue [31].
Laparoscopic and robot-assisted minimally invasive techniques have become increasingly popular over the past 10 years. In robot-assisted surgery, high-resolution three-dimensional cameras and robotic arms enable surgeons to dissect anatomical structures more precisely, which may improve the preservation of functional structures and reduce PSMs [38]. However, minimally invasive techniques do not allow tactile examination of the prostate to gauge the aggressiveness of the malignancy; this may result in PSMs and increases reliance on imaging for staging. In the absence of tactile feedback, MRI becomes an effective technique for deciding the degree of nerve preservation [39]. In the present analysis, the use of minimally invasive techniques was associated with significant differences in sensitivity and specificity, which implies that the significance of positive margins after laparoscopic or robot-assisted RP may differ from the negative prognostic significance of positive margins after open RP.
The prevalence of high-risk tumors is another important source of heterogeneity. Studies with a higher proportion of high-Gleason tumors (≥50%) had significantly higher sensitivity and specificity than those with a lower proportion (<50%). This could be because high-grade PCa is associated with smaller prostate volume and larger lesion diameters on MRI [40,41]. PCa is detected earlier in patients with enlarged prostates because the enlarged tissue produces elevated prostate-specific antigen levels, whereas PCa diagnosed in small glands may be more aggressive and associated with more unfavorable histopathological findings. Patients with small prostates had a higher rate of PSMs [42]. More disorganized glandular lumina typically occur in tissues with higher Gleason scores [43]: the higher the Gleason score, the more disrupted the glandular architecture, and this disorganization can be reflected on DWI and ADC maps [44].
A recent meta-analysis [45] noted that PSM rates were significantly lower after 50–60 cases and reached a plateau at 150–350 cases. Compared with low-volume centers (<150 cases), high-volume centers (≥150 cases) appeared to have better diagnostic sensitivity and specificity. This may reflect the more favorable surgical outcomes generally associated with high-volume centers.
Our analysis has some limitations. First, many studies did not report enough detail on all study characteristics, so we could not adequately account for the heterogeneity. For example, information on blinding and image interpretation methods for the index test was often unavailable. Second, some subjectivity is unavoidable when interpreting T3 stage on MRI because it is observer-dependent. More objective, quantitative, and reproducible features might be more appropriate.

Conclusions
Despite the poor sensitivity of T stage on preoperative MRI for predicting pathological PSMs, its relatively high specificity may add value in selecting candidates for active surveillance or function-preserving therapy by avoiding underestimation of the disease. Although MRI reliably identifies cancer lesions and their boundaries, determining the aggressiveness of prostate cancer and predicting surgical outcomes remain challenging. Therefore, efforts should focus on further refining the characteristics of PSMs to help identify patients who are most prone to disease development and progression. Further studies are warranted to investigate more objective and reproducible variables as potential indicators of surgical margin status.