Diagnostic Performance of Biomarkers for Bladder Cancer Detection Suitable for Community and Primary Care Settings: A Systematic Review and Meta-Analysis

Simple Summary
Bladder cancer (BC) is one of the most common cancers worldwide. Early-stage diagnosis is associated with better survival rates and, as such, the timely referral of suspected cases is paramount. Urinary biomarkers have been developed to aid diagnosis, and are largely tested in patients who have been referred for further investigation. Evidence, however, on their diagnostic performance for both detecting and ruling out BC, especially in the general population, is limited. In this review, we systematically identified studies reporting on the diagnostic performance of biomarkers suitable for use in primary and community care settings. Three biomarkers, with relatively little difference in diagnostic performance between them, and some novel biomarkers were identified showing potential to be used as a triage tool in such settings. While promising, further validation studies in the general population are needed.

Abstract
Evidence on the use of biomarkers to detect bladder cancer in the general population is scarce. This study aimed to systematically review evidence on the diagnostic performance of biomarkers which might be suitable for use in community and primary care settings [PROSPERO Registration: CRD42021258754]. Database searches on MEDLINE and EMBASE from January 2000 to May 2022 resulted in 4914 unique citations, 44 of which met inclusion criteria. Included studies reported on 112 biomarkers and combinations. Heterogeneity of designs, populations and outcomes allowed for the meta-analysis of three biomarkers identified in at least five studies (NMP-22, UroVysion, uCyt+). These three biomarkers showed similar discriminative ability (adjusted AUC estimates ranging from 0.650 to 0.707), although for NMP-22 and UroVysion there was significant unexplained heterogeneity between included studies. Narrative synthesis revealed the potential of these biomarkers for use in the general population based on their reported clinical utility, including effects on clinicians, patients, and the healthcare system. Finally, we identified some promising novel biomarkers and biomarker combinations (N < 3 studies for each biomarker/combination) with negative predictive values of ≥90%. These biomarkers have potential for use as a triage tool in community and primary care settings for reducing unnecessary specialist referrals. Despite promising emerging evidence, further validation studies in the general population are required at different stages within the diagnostic pathway.


Introduction
Bladder cancer is the tenth most commonly diagnosed cancer worldwide, with 573,000 new cases and 213,000 deaths in 2020, ranking 14th in terms of cancer-associated mortality [1]. About two-thirds of patients with bladder cancer present with haematuria, a cardinal symptom for urological tract cancers including bladder cancer [2]. However, only visible haematuria has a high positive predictive value for bladder cancer, with pooled incidence reported to be as high as 17-18% in some populations [3]. Patients presenting with non-visible haematuria and other urological symptoms, such as lower urinary tract symptoms, may cause diagnostic challenges. This is because these symptoms are common in the general population and are more likely to be due to benign causes rather than cancer [4]. Identifying tools to improve the diagnostic pathway may improve diagnostic timeliness, and therefore outcomes, for patients with bladder cancer.
Empirical evidence suggests that there is scope to improve timely diagnosis and reduce missed diagnostic opportunities in symptomatic patients who are subsequently diagnosed with bladder cancer [5]. The diagnostic pathway for bladder cancer involves a combination of investigations, from urine tests in the community and primary care to specialist investigations such as upper urinary tract imaging (including ultrasound and Computed Tomography (CT) scans), urine cytology and cystoscopy. The latter remains the gold standard for bladder cancer detection in patients investigated following haematuria [6,7]. Disadvantages of these tests, such as the poor sensitivity of ultrasounds, radiation exposure associated with CT, and the invasiveness of cystoscopy, can limit their use in the general population [8]. There is, therefore, an urgent need to identify new approaches for improving risk stratification of symptomatic patients to improve the early detection of bladder cancer and reduce the burden of unnecessary investigations for patients.
Urinary biomarkers have been developed to aid detection and early diagnosis of urinary tract cancers including bladder cancer [9]. These have largely been tested in patients presenting with symptoms suggestive of cancer and referred for further investigation, and who are therefore at a higher-than-average risk of having an undetected cancer. Several reviews have been conducted focusing on individual biomarkers, biomarker panels or certain biomarker categories (e.g., proteins) seeking to describe and explore their diagnostic performance and clinical utility for bladder cancer detection [10][11][12][13]. These findings demonstrate that urinary biomarkers have the potential to improve current diagnostic strategies. However, it is not clear exactly when and how they should be used along the diagnostic pathway.
This review aimed to update the evidence on existing biomarkers for bladder cancer detection, their diagnostic performance across different population groups, and explore their clinical utility in the populations and settings studied. The main focus was to identify novel biomarkers for bladder cancer detection that might be suitable for use in the general population in community and primary care settings, often the first point of contact for patients in the healthcare system, hereafter referred to jointly as community settings. For the purpose of this review, general population refers to patients with any baseline risk at the point of presentation, in contrast with patients referred to specialist settings, who are already at a higher risk of cancer.

Methods
This review was conducted in line with the guidance provided in the Cochrane Handbook for Systematic Reviews of Interventions [14] and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) Guidelines [15] (Figure S1 in Supplementary Materials). The Review Protocol has been registered on PROSPERO (registration ID: CRD42021258754).

Data Sources
MEDLINE and EMBASE were systematically searched for studies reporting on primary data published in peer-reviewed journals between 1 January 2000 and 24 May 2022. Based on initial pilot searches and findings from previous reviews showing that most studies on novel biomarkers have been published since 2005, 2000 was set as the start date/year of database searches. The search strategy was developed in consultation with an experienced subject librarian. Data sources were supplemented by hand-searching of reference lists of included studies.

Inclusion & Exclusion Criteria
A comprehensive list of inclusion and exclusion criteria to guide searches and study selection (Table S1 in Supplementary Materials) was developed using the PICOS Framework (Population, Intervention, Comparators/Context, Outcomes and Study type) [16]. Any studies i. involving adult patients presenting with clinical signs or symptoms suggestive of bladder cancer, undergoing evaluation but not having received a diagnosis; ii. comprising at least 50 bladder cancer and 50 non-bladder cancer patients; iii. recruiting participants through any healthcare system and setting; iv. reporting on at least one measure of diagnostic performance of biomarkers (either individual, multiple/panels or combinations), were included. We were interested in any biomarker feasible to use in community and primary care settings, i.e., identified in non-invasive samples such as blood (serum or plasma), urine, feces, saliva or breath.
This review was informed by the CanTest Framework, a 5-phase translational pathway for diagnostic tests, from new test discovery to health system implementation in low-prevalence populations [17]. Only studies providing measures of diagnostic accuracy beyond discovery/development (that is, Phase 2 and beyond of this framework) were included (Figure S2 in Supplementary Materials). As papers were set in different healthcare systems and settings, a diagram was developed to guide decisions on inclusion (adapted from Olesen et al. 2009) [18], using a cut-off point/boundary in the diagnostic pathway of a typical cancer patient. Studies were included up to the point where patients were referred for a first specialist visit but not diagnosed (see Figure 1).

Search Strategy
The search strategy (Table S2 in Supplementary Materials) included a list of key words and database specific subject headings (MeSH or Emtree) relating to each of the main target domains: biomarkers, performance measures, early diagnosis, and bladder cancer, combined and tailored to the relevant database. Searches were limited to include only outputs published from January 2000 onwards. No restrictions on language or methodological design were applied.

Study Selection
Following deduplication in EndNote 20 (Clarivate Analytics), unique citations were imported to Rayyan-Intelligent Systematic Review software [19]. Title and abstracts were independently screened against inclusion/exclusion criteria by two reviewers (any two of NC/CS/VS, and EP). Full texts of potentially eligible papers identified during title and abstract screening were independently assessed by EP and one other reviewer (VS/AC/CS/EdM/RB/DB/ST/HH/YZ). Any disagreements were resolved through discussion and consensus.

Data Extraction
Data extraction was performed using Microsoft Excel 2015. A data extraction template was developed to record information on study details, validation overview, biomarker characteristics, performance measures including measures for comparison to urine cytology (the standard non-invasive urological evaluation to diagnose bladder cancer), and suitability for community use. When studies reported on different phases of biomarker development, data were extracted only for eligible phases (i.e., biomarkers and measures beyond the discovery phase).
Data extraction was piloted for 10% of the included studies (EP and DB/RB) and the data extraction template was revised accordingly to ensure consistency and accuracy of the information extracted. Data extraction for the remaining 90% of studies was carried out independently by two reviewers (EP and VS/AC/CS/EdM/RB/DB/ST/HH/YZ).

Quality Assessment
QUADAS-2, a tool designed to assess the quality of primary diagnostic studies, was used to assess the risk of bias and applicability of included studies [20]. The tool covers four domains: i. patient selection; ii. index test; iii. reference standards; iv. flow and timing. Each domain comprises a series of signaling questions, aimed at identifying areas of potential bias or concerns over applicability, rated as "high", "low" or "unclear". Quality assessment was performed independently by two reviewers (EP and VS/AC/CS/EdM/RB/DB/ST/HH/YZ). Quality assessment ratings were compared, and any inconsistencies observed were resolved through discussion and consensus.

Data Analysis
Forest plots of accuracy measures (sensitivity and specificity) were produced for all biomarkers investigated in three studies or more, using the mada package in R [21]. For the biomarkers with their performance reported in five or more studies, a meta-analysis was conducted to calculate the Hierarchical Summary Receiver Operating Characteristic (HSROC) curve using the bivariate random-effects approach developed by Reitsma et al. 2005 [22] with a linear mixed model [23].
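As an illustration of the inputs to such a bivariate analysis, the sketch below derives per-study sensitivity and specificity from a 2x2 table and places sensitivity and the false positive rate on the logit scale, which is the scale on which the Reitsma-type model is usually specified. It is not the mada implementation and does not fit the model itself; the function name, the continuity-correction rule and the example counts are assumptions made for illustration only.

```python
import math


def accuracy_with_logits(tp, fn, fp, tn, correction=0.5):
    """Return sensitivity, specificity and logit-transformed values for one study.

    Assumption: a continuity correction is added to every cell only when at
    least one cell is zero, so that the logits stay finite.
    """
    if min(tp, fn, fp, tn) == 0:
        tp, fn, fp, tn = (x + correction for x in (tp, fn, fp, tn))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = 1 - specificity

    def logit(p):
        return math.log(p / (1 - p))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "logit_sens": logit(sensitivity),
        "logit_fpr": logit(false_positive_rate),
    }


# Hypothetical study: 60 true positives, 40 false negatives,
# 20 false positives, 80 true negatives.
study = accuracy_with_logits(60, 40, 20, 80)
print(study)
```

In a full analysis, the pairs of logit values from all studies of one biomarker would be passed to a bivariate random-effects model; that step is left to dedicated software.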
Narrative synthesis, selected for its potential to assess and synthesize heterogeneous and complex evidence in a rigorous and replicable way [24][25][26][27], was also performed for the meta-analysis biomarkers. Reported authors' conclusions/recommendations were deductively synthesized, being thematically aligned to effects on patients and clinicians (focusing on acceptability, benefits, and harms), and effects on health care systems (focusing on referral patterns and costs), adapted from Phases 3 to 5 of the CanTest Framework (Figure S2 in Supplementary Materials). Narrative synthesis was also used to assess and synthesize diagnostic performance of biomarkers examined in fewer than three studies but with a reported high negative predictive value (NPV ≥ 90.0%). This additional analysis aimed to identify novel biomarkers for bladder cancer detection that might be suitable for use in the general population, as a high NPV can provide reassurance that cancer is unlikely to be present [28,29].
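Because NPV depends on disease prevalence as well as on sensitivity and specificity, an NPV reported in a referred cohort does not transfer directly to a lower-prevalence community population. The sketch below applies Bayes' rule to show this dependence; the sensitivity, specificity and prevalence values are hypothetical and are not taken from any included study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Compute PPV and NPV from test accuracy and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Hypothetical test (sensitivity 0.70, specificity 0.80) evaluated at
# three illustrative prevalence levels.
for prevalence in (0.01, 0.05, 0.20):
    values = predictive_values(0.70, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={values['ppv']:.3f}  NPV={values['npv']:.3f}")
```

For a fixed sensitivity and specificity, NPV rises as prevalence falls, while PPV falls.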

Selection Process
Electronic database searches retrieved 6638 records, of which, following deduplication, title and abstract screening, 98 were assessed in full text for eligibility. Fifty-five records were excluded for either being outside the focus of the review (N = 8) or reporting on: (i) discovery-only findings (N = 8); (ii) numbers of cancer/non-cancer cases that were inadequate, unclear, or not possible to calculate (N = 24); (iii) populations already diagnosed with bladder cancer or under surveillance for bladder cancer recurrence (N = 14). One study was excluded due to integrity concerns (N = 1). One additional record was added, following the manual searching of reference lists of included studies, leading to a final sample of 44 studies included in the review (Figure 2). No studies were excluded during quality assessment.
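The selection counts can be reconciled with a short tally. The sketch below assumes that the study excluded for integrity concerns is counted among the fifty-five exclusions, which is the reading under which the reported figures add up; the dictionary keys are shorthand labels rather than the authors' wording.

```python
full_text_assessed = 98

# Exclusion reasons and counts as reported in the text.
# Assumption: the integrity-concern exclusion is part of the 55.
exclusions = {
    "outside review focus": 8,
    "discovery-only findings": 8,
    "inadequate or unclear case numbers": 24,
    "already diagnosed or under surveillance": 14,
    "integrity concerns": 1,
}
added_from_reference_lists = 1

excluded_total = sum(exclusions.values())
included = full_text_assessed - excluded_total + added_from_reference_lists

print(excluded_total)  # 55
print(included)        # 44
```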

Study Characteristics
Included studies (N = 44 publications) originated from all continents (Table 1), predominantly Europe (N = 17) including Germany (N = 5), the UK (N = 4), Spain (N = 2), the Netherlands (N = 2), Belgium (N = 1), Denmark (N = 1), France (N = 1), and Sweden/Spain/Netherlands (N = 1). Ten studies (N = 10) originated from Africa, all of which were conducted in Egypt. Six studies (N = 6) were carried out in Asia, including five in China (N = 5) and one (N = 1) in Pakistan. There were four studies (N = 4) originating from the USA and two studies (N = 2) from Australia and/or New Zealand. Finally, for five studies (N = 5), information on country of research was not available. Studies were largely prospective (N = 34) and/or single-centered (N = 24), with different study designs including cohort (N = 15), trials (N = 3), cross-sectional (N = 2), case-control (N = 2), observational (N = 2), and evaluation (N = 1) studies. For nineteen studies (N = 19), there was no information on study design. The most common recruitment setting was in the hospital (N = 29) whereas four studies (N = 4) recruited from more than one setting (i.e., hospital and community). No information on recruitment setting was available for eleven studies (N = 11).

Risk of Bias
Potential sources of bias were identified in all four domains of QUADAS-2 during quality assessment. Flow and timing was the domain in which most studies were assessed as being at high risk (N = 33) for failing to include all recruited patients in their analysis or not specifying whether there was an appropriate interval between index test(s) (the diagnostic test(s) that is/are evaluated against the reference standard) and reference standards (the best available method of determining whether people have a condition). Key sources of bias identified in studies classified as high risk for index test (N = 30) and patient selection (N = 22) included failing to pre-define test thresholds and failing to avoid inappropriate exclusions or not specifying the type of sampling employed. In terms of reference standard, the risk of bias was assessed as being unclear (N = 25) in most studies due to no information being provided on whether the results of the reference standard were interpreted without knowledge of the results of the index test. Concerns over applicability were not identified for any of the included studies (Figure S3 in Supplementary Materials).

Population Characteristics
Included studies (N = 44) reported on 28,527 participants including 9780 patients with cancer and 18,747 non-cancer patients (Table 1). Based on information from studies in which gender (N = 34) and age (N = 34) were reported, participants were predominantly male (63%) and ranged from 18 to 110 years of age; two studies also included minors (one aged 14 and one aged 15) as outliers in large samples. Of the 18,747 non-cancer patients, 9611 were further specified to include 2285 normal/healthy patients and 7326 with non-malignant or pre-malignant conditions such as cystitis, urolithiasis, dysuria, urethral stricture, and prostate hyperplasia. No such information was available for the remaining 9136 non-cancer patients. Clinical signs and/or symptoms at first presentation included haematuria (either visible or non-visible) reported in twenty-seven (N = 27) studies, and non-malignant or pre-malignant conditions reported in twenty-three (N = 23) studies. Finally, risk factors, reported in eighteen (N = 18) studies, included smoking (N = 13), ethnicity/race (N = 4) and schistosomiasis, an acute and chronic parasitic disease associated with bladder cancer (N = 9). Four (N = 4) studies reported on non-cancer patients with benign bladder tumors, a history of bladder cancer or bladder cancer diagnosis (included as outliers in large samples according to the set inclusion/exclusion criteria), these being considered as risk factors for developing bladder cancer.
Table 1 notes. Abbreviations: C, cases; HC, healthy controls; Con, controls; NM, non-malignant; NA, not available; Y, yes; N, no; VH, visible haematuria; NVH, non-visible haematuria; UTIs, urinary tract infections. Footnotes: 1 mean (SD) reported where applicable, unless otherwise stated; 2 numbers reported only for cases; 3 malignancies/tumors identified following examination; 4 number of participants with prior cancer diagnosis (less than half of included sample); 5 number of all participants recruited (cases and controls reported only for the group of interest); 6 number of all participants enrolled (prior exclusion); 7 also referred to as ImmunoCyt+.

Biomarker Characteristics
Included studies (N = 44) reported on 112 biomarkers (37 individual, 34 multiple/panels and 41 combinations, the latter pairing biomarker(s) with cytology, base models, or imaging) (Table 2). Ninety-six of them (N = 96) were reported in only one study. In terms of category, 52 biomarkers were classified as proteins (including single proteins, combinations of proteins and combinations of proteins with prediction models, cytology, and other tests), 36 as DNAs and 18 as mRNAs (all following the same pattern as in proteins). There were also nine biomarkers combining proteins and mRNAs and six biomarkers combining proteins and DNAs. The discrepancy between the total number of biomarkers per category (N = 121) and the total number of biomarkers reported (N = 112) is due to different biomarker categories pertaining to the same biomarker being reported together (an example of this is Telomerase in Table 2). All biomarkers were sampled from urine, apart from one (CYFRA21-1), which was sampled from both urine and serum. Test platforms included Enzyme-Linked Immunosorbent Assay (ELISA), Fluorescence In Situ Hybridization (FISH), lateral flow test, and different types of Polymerase Chain Reaction (PCR).

Meta-Analyses

Assessing Heterogeneity
Forest plots for sensitivity and specificity were produced for all biomarkers reported in three or more studies (NMP-22, UroVysion, uCyt+ (also referred to as ImmunoCyt+), BTAstat and FGFR3) (Figure S4 in Supplementary Materials). Variation in the accuracy measures (sensitivity and specificity) between studies may be explained by either the use of different thresholds/cut-off values to distinguish between a positive and negative result or between-study heterogeneity (for example, differences in study setting or study design). The performance of the three biomarkers reported in five or more studies (NMP-22, UroVysion, uCyt+) was summarized by calculating the HSROC, which accounts for the variation in cut-off values (Figure 3).
The twelve studies reporting accuracy measures for NMP-22 used four different thresholds (ranging from 3.6-10 IU/mL), so the large range of sensitivities (0.27-0.90) and specificities (0.31-0.98) reported is unsurprising. However, most of the studies fall outside of the 95% confidence region of the HSROC model (Figure 3a), which suggests that other differences between the studies are also causing the variation in performance. This could be due to the diverse study populations in which NMP-22 was tested, including the following categories: country (five in Germany, four in the UK, one in Pakistan, and one in Australia/New Zealand; country of research in three studies was not provided), population age (with some studies enrolling participants of much wider age range than others), ethnicity, symptoms at presentation, and the extent to which risk factors such as smoking were addressed (Table 1).
The ten studies reporting performance for UroVysion all used the same platform (FISH), which indicates that the range of reported sensitivity (0.38-0.96) may be due to other differences between the studies (the range of reported specificities is, however, relatively small (0.76-0.99)). This is supported by the meta-analysis, which finds that several of the UroVysion studies fall outside the 95% confidence region of the HSROC model (Figure 3b). This could be due to variation in the study population between studies, in terms of the country of research (four in Germany, two in China, two in the USA, one in Belgium; country of research in one study was not provided), age range of included population and symptoms at presentation (Table 1).
The six studies reporting performance for uCyt+ all use the same threshold to determine positive test results ("at least one clear positive cell"); therefore, the relatively narrow range of sensitivities (0.62-0.92) and specificities (0.72-0.81) reported is unsurprising. In the HSROC model, most of the studies fall in the 95% prediction region (Figure 3c). This suggests that the studies are relatively homogeneous, likely carried out in similar settings and using comparable populations. This finding is supported by examination of the study characteristics (Table 1); all six studies were conducted in Europe (five in Germany and one in France), enrolled populations of similar age ranges (participants aged from 18 to 97) and patients had similar symptoms at first presentation.

Overall Performance and Sensitivity Analysis
The HSROC models for NMP-22, UroVysion and uCyt+ are compared in Figure 3d, and summary measures of discrimination (how well the test distinguishes between those with and without bladder cancer) are given. The estimated summary ROC curves show UroVysion has the best discrimination (AUC estimate: 0.876), slightly outperforming uCyt+ (AUC estimate: 0.827) and considerably outperforming NMP-22 (AUC estimate: 0.748). However, the adjusted partial AUC estimates (accounting for the observed ranges of accuracy measures and normalized) are similar for all three (0.650, 0.707 and 0.689 respectively). The overlap of the prediction regions (the 95% prediction region of the HSROC model estimates) further demonstrates that, in this meta-analysis, no significant differences in discrimination are found between these three biomarkers.
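To make the distinction between a full AUC and a normalized partial AUC concrete, the sketch below integrates a summary curve only over a restricted range of false positive rates and divides by the width of that range. This is one common way of normalizing a partial AUC and is assumed here for illustration; it is not claimed to reproduce the exact computation behind the reported estimates. The curve and the bounds are hypothetical.

```python
def normalised_partial_auc(curve, fpr_min, fpr_max, steps=1000):
    """Trapezoidal area under curve(fpr) on [fpr_min, fpr_max], divided by the width.

    Assumption: normalization divides the restricted area by the width of the
    false positive rate interval, so the result lies between 0 and 1.
    """
    width = fpr_max - fpr_min
    step = width / steps
    area = 0.0
    for i in range(steps):
        left = fpr_min + i * step
        right = left + step
        area += 0.5 * (curve(left) + curve(right)) * step
    return area / width


def hypothetical_sroc(fpr):
    """Illustrative summary curve: sensitivity as a function of false positive rate."""
    return fpr ** 0.25


full_auc = normalised_partial_auc(hypothetical_sroc, 0.0, 1.0)
partial_auc = normalised_partial_auc(hypothetical_sroc, 0.05, 0.30)
print(round(full_auc, 3), round(partial_auc, 3))
```

For this hypothetical curve the normalized partial AUC is lower than the full AUC, because the restricted range excludes the high false positive rate region where the curve sits close to a sensitivity of one.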
The performance of NMP-22 was reported for two different platforms, ELISA (N = 10) and BladderChek (N = 3). In a sensitivity analysis (Figure S5 in Supplementary Materials), the performance of NMP-22 across both platforms was compared to the performance for ELISA only (there were insufficient BladderChek studies (N < 5) for a separate HSROC analysis). Although the three BladderChek studies all report relatively high specificities (0.81-0.96) compared to the ELISA studies (0.34-0.88), the reported sensitivities are variable (0.26-0.76); the HSROC analysis finds only minimal differences in discrimination between the ELISA-only studies and all NMP-22 studies (adjusted partial AUC of 0.701 and 0.689 respectively).

For Biomarkers Reported in Three Studies or More
In terms of acceptability, two biomarkers (NMP-22, using the BladderChek platform, and BTAstat) were highlighted as operator-independent, simple, and fast to analyze during patient visits, and, therefore, suitable for use in the outpatient clinic [32,48,59]. However, acceptability could be compromised, as the diagnostic performance of all identified biomarkers was reported to depend heavily on the severity of haematuria as quantified by urine dipstick analysis (NMP-22 ELISA, UroVysion, uCyt+) [65], the presence or absence of haematuria (BTAstat) [32], and the presence of acute clot retention (FGFR3) [67].
Considering the benefits, all biomarkers were reported to either improve bladder cancer detection or reduce burden on patients and health care providers when used in conjunction with urine cytology [56,59,63]. FGFR3 was observed to efficiently detect bladder cancer in patients with low grade tumors [50] whereas NMP-22 (unspecified test platform) and BTAstat were shown to outperform cytology in detecting G3 tumors, with the former also showing significantly higher detection rates for G1 and G2 tumors [58]. As for harms, increased rates of false positive results were reported for four out of six biomarkers (including NMP-22 ELISA and BladderChek) in the following cases: (i) urinary tract inflammation and/or infection (NMP-22 ELISA [49,55,57] and BTAstat [32,57]); (ii) haematuria (NMP-22 ELISA, uCyt+) [65], or microscopic haematuria (UroVysion) [69]; (iii) atypical urinary cytology and other risk factors such as older age or significant tobacco use (NMP-22 ELISA, uCyt+, UroVysion) [49,69].
In terms of referral patterns, UroVysion and FGFR3 were reported to help with triaging rapid referrals for haematuria [50,61] or to reduce the frequency (uCyt+) [56] or the number of unnecessary cystoscopy/cytology tests (NMP-22 ELISA) [57] in the healthcare systems in which they were assessed. Further details of their specific use within the diagnostic pathway, however, were not provided. Finally, when it comes to costs, BTAstat was reported to have the lowest estimated cost compared to NMP-22 (unspecified test platform), cytology, and flexible and rigid cystoscopy [58] as opposed to UroVysion [69,72] and uCyt+ that were reported to come at increased cost [64].

For Biomarkers with High Negative Predictive Value
A summary of key findings pertaining to novel biomarkers that were investigated in fewer than three studies and which show potential for early detection of bladder cancer in the general population can be found in Table 3. Eight novel biomarkers/tests from all biomarker categories were purposively selected based on their reported high negative predictive value (NPV ≥ 90.0%), indicating their potential use in the general population for triaging patients for further investigations (see Table 3).

Summary of Main Findings
This systematic review identified 44 studies reporting on 112 different biomarkers and combinations for bladder cancer detection. Most of the biomarkers identified were only reported in one study, with only three biomarkers (NMP-22, UroVysion and uCyt+) in a sufficient number of studies (n ≥ 5) to be included in the HSROC calculations. These biomarkers showed similar discriminative ability (adjusted AUC estimates ranging from 0.650 to 0.707). Narrative synthesis revealed the potential of some of these biomarkers for use in the general population, based on their reported clinical utility including diagnostic performance and effects on clinicians, patients, and the healthcare system. Finally, several novel biomarkers showed high negative predictive value indicating their potential for use in the general population presenting in community settings.

Comparison with Existing Literature
The calculated adjusted HSROC revealed small variations in discrimination across the three biomarkers included in the meta-analysis, all of which are well-established FDA-approved biomarkers that have, either individually or comparatively, been explored in systematic reviews and/or meta-analyses before [13,[73][74][75][76]. Heterogeneity beyond variation in adopted thresholds was also confirmed for these three biomarkers. This reported variation can be largely associated with a range of confounding factors mainly pertaining to the heterogeneity of included studies; this has also been identified as the main limitation in most meta-analyses conducted to date [74,75]. In addition to different probability thresholds, a series of methodological factors including the diversity in study designs and population samples, and the extent to which risk factors were also addressed in the study could potentially be influencing performance variation. Considering population samples in more detail, variation persisted not only in the numbers of participants enrolled but also in the composition of the cohorts studied (e.g., non-cancer patients ranging from healthy general population participants to hospital urology patients with or without benign pathology) and the range of symptoms reported at first presentation. There is, therefore, a risk of spectrum bias given the observed variation in the population samples in which the biomarkers were tested [77,78]. This risk was particularly evident for some of the biomarkers included in the meta-analysis such as NMP-22, which included studies that enrolled heterogeneous populations (particularly in terms of age, ethnicity, symptomatology at presentation, and risk factors). Therefore, extrapolating results to reaffirm the potential applicability of the reported biomarkers in the general population is challenging.
The complementary narrative synthesis aimed to ascertain such applicability, by investigating further the population, contextual and implementation factors. Findings indicated variation similar to that of the meta-analysis and are in line with evidence reported in previous reviews [12,79] and meta-analyses [13,80] about the potential of these biomarkers to effectively supplement cytology in bladder cancer detection or help with appropriate rapid referrals and reduce the number of unnecessary cystoscopies in the studied populations. However, certain barriers, such as diagnostic performance measures being affected by the degree of haematuria [32,65,67] or the inability of certain biomarkers to detect low grade tumors [38,58], were also identified, compromising their utility in the general population. Hence, the reported promising value of these biomarkers needs to be treated with caution.
A number of novel biomarkers (such as ADXBLADDER, CxBladder Triage and Xpert) or combinations (FGFR3 + TERT + HRAS + OXT1 + ONECUT2 + TWIST) were also reported to have high negative predictive values, indicating potential utility in community settings, as reassurance can be provided that cancer is an unlikely outcome [28,29]. This potential was also reaffirmed by the narrative synthesis. However, considering that these biomarkers were tested in either one or two studies only, validation studies in the general population are still required.
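Because NPV depends on disease prevalence as well as on sensitivity and specificity, an NPV reported in a referred cohort does not transfer directly to a lower-prevalence community population. The following minimal Python sketch illustrates this dependence using Bayes' rule. The sensitivity, specificity and prevalence values are hypothetical and are not taken from any included study.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value derived from Bayes' rule."""
    # Proportion of the tested population that is disease-free and tests negative.
    true_negatives = specificity * (1.0 - prevalence)
    # Proportion of the tested population that has disease but tests negative.
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical test characteristics, chosen for illustration only.
SENSITIVITY = 0.70
SPECIFICITY = 0.80

# Hypothetical prevalences, from a low-prevalence to a higher-prevalence setting.
for prevalence in (0.01, 0.05, 0.20):
    value = npv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  NPV={value:.3f}")
```

With sensitivity and specificity held fixed, NPV rises as prevalence falls. A biomarker validated only in referred patients may therefore show a different NPV in primary care, and spectrum effects may additionally shift sensitivity and specificity themselves.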

Strengths and Limitations
Comprehensive literature searches were performed, strict eligibility criteria were set for study selection and explicit methods were employed for data extraction and data analysis. Heterogeneity, however, is the main limitation of this review, pertaining to various aspects of included studies such as study design, population samples, thresholds used and outcome measures. Such heterogeneity may distort the meta-analysis and, as a result, the reported results should be interpreted with caution. Another limitation that is relatively common to systematic reviews of biomarker performance is a lack of clarity or a low quality of reporting, with most included studies, when critically appraised, being assessed as either unclear or at high risk of bias in at least one domain. Finally, because the narrative synthesis was based on the original authors' conclusions, an impartial assessment of results reported as potentially promising might, to some extent, be compromised depending on how the positives and negatives of each biomarker were portrayed by authors in different studies.

Implications for Research and Practice
This review identified biomarkers that could potentially be beneficial for use in community settings based on their diagnostic performance. Similar to conclusions from previous reviews [12,79], while there are promising results (particularly regarding high NPVs) for some biomarkers, additional validation is still needed in the community setting. Although an attempt to limit heterogeneity was made by only including patients with a suspicion of cancer at the point of recruitment, it is likely that levels of cancer risk vary even for this group across different studies. Furthermore, the included studies were evaluated as being at a higher risk of bias in more than one QUADAS-2 domain. Therefore, caution is warranted when generalizing performance results.
It is also important to consider the role of the biomarker within the cancer diagnostic pathway. In the general population, there is a need for a test that better risk-stratifies patients with urological symptoms, to facilitate clinical decision-making regarding referral for subsequent cancer-specific investigations, similar to the use of fecal immunochemical testing for possible colorectal cancer [81-83]. We found no studies reporting the use of biomarkers for bladder cancer in this context. To assess the clinical utility of these biomarkers in the community, there is therefore a need to evaluate them in the general population, at the pre-referral stage of the diagnostic process.
Finally, evidence on the effects on patients, clinicians, and health care systems (summarized in the narrative synthesis) was not widely available across included studies. Therefore, despite years of biomarker development and testing, implementation and cost-effectiveness (Phases 3-5 in the CanTest framework) are still not often investigated [17]. No single biomarker with excellent diagnostic performance and corresponding implementation data was identified, and the current findings do not allow for firm recommendations of any of the identified biomarkers for use in the general population. Novel biomarkers showing promising results need to be further evaluated, preferably prospectively, with consistency regarding populations, care settings and thresholds/cut-off points used.

Conclusions
In conclusion, findings from this systematic review suggest that certain biomarkers show potential to complement or improve current bladder cancer diagnostic strategies.
Limited evidence on novel biomarkers shows that those with high NPV could be promising for use in community settings as a triage tool for appropriate and necessary referrals. More prospective studies are needed to further validate this promising evidence in the general population before establishing the exact place/role of these biomarkers within the diagnostic pathway.

Informed Consent Statement: Not applicable.
Data Availability Statement: The Search Strategy used in this systematic review has been provided in Supplementary Materials. The datasets generated during the study are available from the corresponding author on reasonable request.