The Paris System for Reporting Urinary Cytology: A Meta-Analysis

The Paris System (TPS) for Reporting Urinary Cytology is a standardized, evidence-based reporting system, comprising seven diagnostic categories: nondiagnostic, negative for high-grade urothelial carcinoma (NHGUC), atypical urothelial cells (AUC), suspicious for high-grade urothelial carcinoma (SHGUC), HGUC, low-grade urothelial neoplasm (LGUN), and other malignancies. This study aimed to calculate the pooled risk of high-grade malignancy (ROHM) of each category and demonstrate the diagnostic accuracy of urine cytology reported with TPS. Four databases (PubMed, Embase, Scopus, Web of Science) were searched. Specific inclusion and exclusion criteria were applied, and data were extracted and analyzed both qualitatively and quantitatively. The pooled ROHM was 17.70% for the nondiagnostic category (95% CI, 0.0650; 0.3997), 13.04% for the NHGUC (95% CI, 0.0932; 0.1796), 38.65% for the AUC (95% CI, 0.3042; 0.4759), 12.45% for the LGUN (95% CI, 0.0431; 0.3101), 76.89% for the SHGUC (95% CI, 0.7063; 0.8216), and 91.79% for the HGUC and other malignancies (95% CI, 0.8722; 0.9482). A summary ROC curve was created and the Area Under the Curve (AUC) was 0.849, while the pooled sensitivity was 0.669 (95% CI, 0.589; 0.741) and the false-positive rate was 0.101 (95% CI, 0.063; 0.158). In addition, the pooled DOR of the included studies was 21.258 (95% CI, 14.336; 31.522). TPS assigns each sample to a diagnostic category linked with a specific ROHM, guiding clinical management.


Introduction
Urine cytology is a safe and cost-effective diagnostic test showing suboptimal sensitivity yet high specificity to diagnose urothelial cancer [1]. Reasons to perform it include the initial evaluation of unexplained hematuria, a history of occupational exposure, or the follow-up of patients with previous diagnosis of urothelial cancer [2]. Bladder cancer is the most prevalent urothelial malignancy, whereas upper urinary tract cancers are relatively rare [3,4]. The former most often presents as a non-muscle invasive disease, either of low or high grade. Most patients recur after therapy, while some progress to muscle-invasive bladder cancer [5,6].
The Paris System (TPS) for Reporting Urinary Cytology is a standardized, evidence-based system that is applicable to both voided and instrumented specimens, and to specimens sampled from either the lower or the upper urinary tract. It was developed to standardize reporting, facilitating communication among pathologists and between pathologists and clinicians [7,8]. TPS focuses on the most clinically important diagnosis, high-grade urothelial carcinoma (HGUC). It comprises seven diagnostic categories: nondiagnostic, negative for high-grade urothelial carcinoma (NHGUC), atypical urothelial cells (AUC), suspicious for high-grade urothelial carcinoma (SHGUC), HGUC, low-grade urothelial neoplasm (LGUN), and other primary or secondary malignancies [7]. TPS also supports the use of ancillary techniques (e.g., UroVysion FISH) for indeterminate interpretations [7,9].
Since the implementation of TPS, no meta-analysis has been published to summarize the experience collected worldwide with this reporting system. The main outcomes of this study were to:
1. Calculate the pooled risk of high-grade malignancy (ROHM) of each of the categories of TPS.
2. Display the diagnostic accuracy of urine cytology reported with TPS, by:
a. Creating a pooled summary ROC (sROC) curve and subsequently estimating the pooled sensitivity and false-positive rate.
b. Calculating the pooled Diagnostic Odds Ratio (DOR).

Search Strategy
This meta-analysis was performed following the guidelines set by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement [10]. We comprehensively searched four databases (PubMed, Embase, Scopus, Web of Science) for articles reporting on TPS until 30 August 2020, using the following search term: "Paris system" AND (urin* OR cytopathology OR cytology). The PubMed database search was later updated with the same term to add any studies published until February 2021. No filters, such as text availability, article type, or publication date, were applied. Duplicates were removed using the Paperpile reference manager (https://paperpile.com/app) (accessed on 30 August 2020), while the remaining records were uploaded into the Rayyan App (https://www.rayyan.ai/) (accessed on 30 August 2020) for title-abstract selection [11].

Study Selection
We constructed our review question using the mnemonic PIRD (Population, Index test, Reference test, Diagnosis of interest) [12], where the "diagnosis of interest" was HGUC or other malignancies. The following inclusion criteria were applied: Three authors (I.P.N., Z.K. and M.K.) independently selected all relevant articles, and any disagreements were resolved by consensus. The study selection was first performed in a title-abstract fashion with Rayyan, followed by full-text screening of all Rayyan-eligible articles.

Data Extraction
The following data were extracted on an Excel® spreadsheet: first author, year, country, study design, study period, specimen type (voided, instrumented, or both), urine cytology location (upper, lower urinary tract, or both), cytopreparation type (conventional, liquid-based cytology (LBC), or both), time of TPS classification (at initial Dx, or reclassification of cases reported with another system), clinical setting (initial Dx, surveillance, or both), reference standard (histology, follow-up cytology, or both), total number of included cases and cases with follow-up, and total number of included patients and patients with follow-up (Table 1). Data concerning the prevalence of high-grade malignancy were extracted for each of the categories of TPS; HGUC and other malignancies were grouped together under a single category, as many studies reported these results together. To calculate the ROHM, diagnoses of both HGUC and other malignancies with the reference standard were considered as positive outcomes. Lastly, true positive (TP), true negative (TN), false positive (FP), and false negative (FN) data were extracted from each study. For this analysis, "nondiagnostic" TPS interpretations were excluded. Cases with the interpretations "NHGUC", "AUC", and "LGUN" were considered as cytologically negative, whereas "SHGUC", "HGUC", and "other malignancies" were considered as cytologically positive. For the histologic follow-up, only high-grade malignancies (HGUC; other malignancies) were considered as positive outcomes. Thus, a case with a cytologic interpretation of "SHGUC" or "HGUC" was regarded as TP when histology revealed HGUC or another malignancy (e.g., prostate carcinoma); if not (e.g., the histology outcome was non-neoplastic or even LGUN), it was regarded as FP. Any disagreements among the authors were resolved by consensus.
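The decision rule described above for building each study's 2 × 2 table can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the authors' extraction workflow; the function names `classify_case` and `tabulate` and the string labels used for histology outcomes are assumptions introduced here.

```python
# Groupings taken from the text; the label strings themselves are illustrative.
CYTOLOGY_POSITIVE = {"SHGUC", "HGUC", "other malignancies"}
CYTOLOGY_NEGATIVE = {"NHGUC", "AUC", "LGUN"}
HISTOLOGY_POSITIVE = {"HGUC", "other malignancies"}


def classify_case(cytology, histology):
    """Return 'TP', 'FP', 'FN' or 'TN'; nondiagnostic cytology returns None (excluded)."""
    if cytology == "nondiagnostic":
        return None
    if cytology not in CYTOLOGY_POSITIVE | CYTOLOGY_NEGATIVE:
        raise ValueError(f"unknown TPS category: {cytology!r}")
    test_positive = cytology in CYTOLOGY_POSITIVE
    # Any histology outcome other than a high-grade malignancy (non-neoplastic,
    # LGUN, ...) counts as a negative reference result.
    disease_positive = histology in HISTOLOGY_POSITIVE
    if test_positive:
        return "TP" if disease_positive else "FP"
    return "FN" if disease_positive else "TN"


def tabulate(cases):
    """Count the four cells of the 2 x 2 table from (cytology, histology) pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for cytology, histology in cases:
        cell = classify_case(cytology, histology)
        if cell is not None:
            counts[cell] += 1
    return counts
```

Under this rule, a "SHGUC" interpretation followed by LGUN on histology is counted as a false positive, and an "AUC" interpretation followed by HGUC is counted as a false negative.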

Study Quality Assessment
Study quality assessment was performed with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool, under the following domains: patient selection; index test; reference standard; and flow and timing [12,41]. Risk of bias was assessed as low, unclear, or high. Results are shown in Table S1.

Statistical Analysis
We performed a prevalence and a diagnostic accuracy meta-analysis. In the first, we calculated the pooled ROHMs of each TPS category, while in the second, we constructed the sROC curve and assessed the pooled DOR. For the prevalence meta-analysis, a random intercept logistic regression model was applied. Heterogeneity was measured with τ², Q, and I². I² levels > 50% indicate at least moderate heterogeneity, while levels > 75% indicate high heterogeneity [42]. In addition, a continuity correction of 0.5 was applied in studies with zero cell frequencies. The sROC curve was constructed using both a proportional hazards approach [43] and a bivariate model [44]; "sensitivity" was put on the vertical axis and "false-positive rate" on the horizontal axis of the curve. The Area Under the Curve (AUC) was then calculated to evaluate the discriminatory power of urine cytology reported with TPS. AUC values normally range from 0.5 (no discrimination) to 1 (perfect test) [45]. The log DOR of the index test was also calculated from the TP, TN, FP, and FN data extracted from each eligible study, using a random effects model. To investigate potential causes of heterogeneity, subgroup analyses were performed for the variables "specimen type", "urine cytology location", and "cytopreparation type". Furthermore, sensitivity analyses were performed for the variables "study design", "time of TPS classification", and "follow-up type". The analysis was performed with R, version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria).
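To make the DOR and heterogeneity quantities concrete, the following Python sketch computes a per-study log DOR with a 0.5 continuity correction and pools the estimates with a random effects model. It is a minimal illustration under stated assumptions: the analysis itself was run in R, the text does not say which between-study variance estimator was used (the DerSimonian-Laird estimator is assumed here), and the study counts in the usage lines are hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance for one 2 x 2 table."""
    if 0 in (tp, fp, fn, tn):
        # Continuity correction of 0.5 when any cell is zero.
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird pooling (assumed estimator).

    Returns (pooled estimate, standard error, tau^2, Q, I^2 in percent).
    """
    k = len(estimates)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, q, i2


# Hypothetical (TP, FP, FN, TN) counts, for illustration only.
studies = [(40, 6, 20, 90), (25, 0, 15, 60), (55, 12, 30, 140)]
per_study = [log_dor(*s) for s in studies]
pooled, se, tau2, q, i2 = pool_random_effects(
    [e for e, _ in per_study], [v for _, v in per_study]
)
pooled_dor = math.exp(pooled)
ci_95 = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

Pooling is done on the log scale and back-transformed, which is why the confidence interval of a pooled DOR is asymmetric around the point estimate. The returned I² can be read against the thresholds quoted above (> 50% at least moderate, > 75% high heterogeneity).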

Literature Search
The flowchart of this meta-analysis is shown in Figure 1. The initial search identified 644 studies (PubMed, 116; Embase, 224; Scopus, 102; Web of Science, 202), of which 383 were duplicates. The additional PubMed search added 12 more studies, resulting in a total of 273 articles for title-abstract screening. Of these, 41 were considered eligible for full-text evaluation. After excluding 13 more articles at this step, 28 articles were included in this review. Whereas all 28 studies were included in the ROHM analyses, only 23 of them, those with adequate data to create 2 × 2 contingency tables, were used for the diagnostic accuracy analyses.
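The counts in this paragraph are internally consistent, as the short check below shows. The variable names are illustrative; the numbers are those reported in the text.

```python
# Records identified per database in the initial search.
identified = {"PubMed": 116, "Embase": 224, "Scopus": 102, "Web of Science": 202}
duplicates = 383
added_by_pubmed_update = 12
full_text_assessed = 41
excluded_at_full_text = 13

total_identified = sum(identified.values())
screened = total_identified - duplicates + added_by_pubmed_update
included = full_text_assessed - excluded_at_full_text

print(total_identified, screened, included)  # 644 273 28
```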

Characteristics of the Included Studies
The main characteristics of the included studies are shown in Table 1. All studies were published between 2016 and 2021, and were performed worldwide, most commonly in the USA (n = 11). All but one had a retrospective design. The study period ranged from 1 year to 10 years and 5 months. Most studies examined both voided and instrumented samples (n = 15), from both the lower and upper urinary tract (n = 11), and more of them processed samples with LBC (n = 15) than with conventional cytology (n = 10). Fewer studies used TPS at the time of initial diagnosis (n = 12), whereas most reclassified their initially reported results to TPS for their particular study (n = 16). Follow-up was mainly provided by histology alone (n = 25), while three studies used both histology and follow-up cytology.
In the risk of bias evaluation (Table S1), no study was considered of low risk in all four QUADAS-2 domains. For instance, in the "patient selection" domain, some of the studies considered as having a high risk of bias used the number of cases with follow-up, rather than patients, for their analysis (some patients had more than one case). In the "reference standard" domain, the studies were considered to be of unclear risk of bias, as histology was most likely performed with knowledge of the index test (urine cytology) results. In addition, in the "flow and timing" domain, the three studies that used a different reference standard among their cases [13,14,40] were considered as having a high risk of bias.
Notably, when the risks were compared between studies that used LBC and those that used conventional cytology, no significant differences were found except for the "nondiagnostic" category; this had a ROHM of 6.41% (95% CI, 0.0181; 0.2035) in LBC and of 50.00% (95% CI, 0.3228; 0.6772) in conventional cytology (Tables S2 and S3).
Figure 2 shows the sROC curve of the included studies, constructed with both the proportional hazards model approach and the bivariate model. The AUC was 0.849, while the pooled sensitivity was 0.669 (95% CI, 0.589; 0.741) and the false-positive rate was 0.101 (95% CI, 0.063; 0.158). In addition, the DOR of the included studies was 21.258 (95% CI, 14.336; 31.522) (Figure 3). Of interest, the DOR of conventional cytology (21.805 (95% CI, 11.353; 41.881)) was almost identical to that of LBC (21.208 (95% CI, 11.180; 40.228)) (Figures 4 and 5).
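As a rough cross-check of these figures, the DOR can be written in terms of sensitivity and false-positive rate as DOR = [sens / (1 − sens)] / [FPR / (1 − FPR)]. The sketch below plugs the pooled sensitivity and false-positive rate into this identity; the function name `dor_from_rates` is an assumption introduced here.

```python
def dor_from_rates(sensitivity, false_positive_rate):
    """Diagnostic odds ratio implied by a sensitivity and a false-positive rate."""
    odds_positive_if_diseased = sensitivity / (1 - sensitivity)
    odds_positive_if_healthy = false_positive_rate / (1 - false_positive_rate)
    return odds_positive_if_diseased / odds_positive_if_healthy


# Pooled values reported above.
plug_in_dor = dor_from_rates(0.669, 0.101)
specificity = 1 - 0.101
```

The plug-in value is close to 18, somewhat below the reported pooled DOR of 21.258 but within its 95% CI (14.336; 31.522). A gap of this kind is not surprising, since the pooled DOR was estimated from the study-level log DORs rather than derived from the pooled sensitivity and false-positive rate.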

Discussion
TPS is a standardized reporting system that facilitates communication among physicians and guides the clinical management of urology patients [1,7]. Since its implementation, it has been shown to enhance correlation with histology, especially when the lower urinary tract is sampled, while decreasing the number of indeterminate diagnoses [46,47]. Indeed, a few studies have demonstrated that TPS has reduced the rate of atypical interpretations reported in their departments [48][49][50][51]. This finding has great clinical significance, as before the implementation of TPS, many urologists regarded atypical cases as negative [6]. However, to enhance its sensitivity, some points for future TPS improvement have been raised, including the description of hypochromatic HGUC [52], low-N/C-ratio HGUC [53], and plasmacytoid and micropapillary HGUC variants [54], as well as redefining the diagnostic criteria for the upper urinary tract, as the current ones miss a few positive cases [53,55].
This study first aimed to calculate the pooled ROHM of the categories of TPS. We combined data from all eligible studies published until February 2021. The ROHM ranged from 13.04% (95% CI, 0.0932; 0.1796) for the NHGUC to 91.79% (95% CI, 0.8722; 0.9482) for the HGUC and other malignancies. Notably, the ROHM for the AUC category was calculated at 38.65% (95% CI, 0.3042; 0.4759), prompting a close follow-up and potential ancillary testing with FISH or other modalities, such as UroSEEK, to better stratify such cases [1,9,56,57]. One reason why the ROHM of the SHGUC and HGUC categories was not closer to 100% could be the tendency of cytopathologists to overestimate the N/C ratio, as has been reported in the literature [58,59].
Our study also aimed to assess the diagnostic accuracy of urine cytology using TPS. We used the ROC method as our primary analysis, from which we calculated the AUC, in addition to the pooled sensitivity and false-positive rate. The AUC was 0.849, while the pooled sensitivity was 0.669 (95% CI, 0.589; 0.741). Two meta-analyses concerning the diagnostic performance of urine cytology have been published, both pooling data from studies published before the implementation of TPS [7]; in contrast, we included only TPS-based articles. Xie et al. reported that the pooled sensitivity of cytology for detecting bladder cancer was 0.37 (95% CI, 0.35; 0.39), while the AUC was 0.80 [60]. Luo et al. restricted their analysis to LBC and noted a pooled sensitivity of 0.58 (95% CI, 0.51; 0.65) and an AUC of 0.83 [61]. We also found that the DOR of conventional cytology was 21.805 (95% CI, 11.353; 41.881), almost identical to that of LBC (21.208 (95% CI, 11.180; 40.228)). The morphology of HGUC has been reported to be similar between conventional cytology and LBC [62]. Furthermore, the two methods have not shown a significant difference in sensitivity and specificity for diagnosing SHGUC or HGUC [63].
This study has some important limitations. Most studies were of small size, retrospective in nature, and with variability in their follow-up periods. A few of the eligible studies showed high risk of bias, especially in the "patient selection" domain of the QUADAS-2 tool. In addition, there was verification bias as the reference test was histology, which most likely enhanced the sensitivity and the ROHM in the nondiagnostic, NHGUC, and AUC categories [64,65]. As with most meta-analyses of diagnostic accuracy, our study also exhibited significant heterogeneity [12]. We applied subgroup and sensitivity analysis to assess the effect of a few variables, yet were unable to define its cause.
Academic cytopathologists have studied and debated the use of TPS, which is also a common topic at society meetings. However, general pathologists signing out cytopathology, as well as clinicians, may question the value of this new classification, since it seemingly has few differences compared to the conventional four-tiered system ("negative", "atypical", "suspicious", and "positive") most often used before the implementation of TPS. This meta-analysis, the first to evaluate the diagnostic performance of urine cytology with TPS and to assign a pooled ROHM to each of its reporting categories to guide clinical management, could help them understand the general benefit of this evidence- and consensus-based classification system. For example, many urologists before the implementation of TPS tended to regard "atypical" urine cytology as negative, as this interpretation was being used very often by pathologists [6]. In contrast, TPS focuses on what is most important, which is the detection of HGUC [1,7]. Thus, it has established strict criteria for each one of its categories, including AUC, aiming to identify HGUC rather than LGUN, resulting in a reduced frequency of "atypical" interpretations compared to the pre-TPS era [48][49][50][51]. Of interest, the pooled ROHM of the AUC reporting category in our meta-analysis was found to be 38.65% (95% CI, 0.3042; 0.4759), which should warrant close clinical follow-up and/or the use of ancillary testing [1,7], rather than being regarded as negative.

Conclusions
We performed a meta-analysis to calculate a pooled ROHM for each TPS category and to assess the diagnostic accuracy of urine cytology reported with this system. We hope our findings will be useful to pathologists and will guide clinicians in selecting the best management plan for their patients.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jpm12020170/s1. Table S1: Risk of Bias of the studies included in the meta-analysis, according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). Table S2: Pooled risk of high-grade malignancy (ROHM) associated with each of the Paris System categories. Subgroup analysis of the studies using solely liquid-based cytology (LBC). Table S3: Pooled risk of high-grade malignancy (ROHM) associated with each of the Paris System categories. Subgroup analysis of the studies using solely conventional cytology.

Conflicts of Interest:
The authors declare no conflict of interest.