Urinary Biomarkers for Diagnosis and Prediction of Acute Kidney Allograft Rejection: A Systematic Review

Noninvasive tools for diagnosis or prediction of acute kidney allograft rejection have been extensively investigated in recent years. Biochemical and molecular analyses of blood and urine provide a liquid biopsy that could offer new possibilities for rejection prevention, monitoring and, therefore, treatment. Nevertheless, these tools are not yet available for routine use in clinical practice. In this systematic review, MEDLINE was searched for articles assessing urinary biomarkers for diagnosis or prediction of acute kidney allograft rejection published in the last five years (from 1 January 2015 to 31 May 2020). This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Articles providing targeted or unbiased urine sample analysis for the diagnosis or prediction of both acute cellular and antibody-mediated kidney allograft rejection were included, analyzed, and graded for methodological quality with a particular focus on study design and diagnostic test accuracy measures. Urinary C-X-C motif chemokine ligands were the most promising and frequently studied biomarkers. The combination of a precise diagnostic reference in training sets with accurate validation in real-life cohorts provided the most relevant results and exciting groundwork for future studies.


Introduction
The growing call for precision medicine justifies a shift towards the implementation of new prognostic and diagnostic biomarkers in many fields of medicine. Progress in molecular and biomarker technology now makes it possible to tailor clinical and therapeutic approaches to the specific needs of a single patient, for a variety of medical conditions. Kidney diseases are no exception, with biomarkers constantly gaining ground in the management of acute kidney injury (AKI), glomerulopathies, and chronic kidney disease (CKD) [1][2][3]. In the setting of kidney transplantation, too, precision medicine is rapidly moving forward, with biomarkers a significant part of this trend. In kidney transplantation, biomarkers have been studied for early recognition and diagnosis of disease recurrence, delayed graft function (DGF), infections, and acute and chronic allograft rejection [4]. Since the 1970s, biomarkers have been studied for organ quality assessment [...]
The main reason for exclusion was the evaluation of a different outcome instead of AR (e.g., chronic or late rejection, graft dysfunction, graft failure). Experimental studies, one trial protocol, and one letter were also excluded. A total of 38 remaining articles, published between 1 January 2015 and 31 May 2020, were finally included. No additional articles were included from the reference lists. Table 1 summarizes the most relevant characteristics of each of the 38 included studies. Twenty-nine were single-center studies, while nine were multicenter collaborations.
Among the 32 studies assessing urinary biomarkers in the diagnosis of acute rejection, the majority (18/32) were designed as case-control studies, while 14/32 were cross-sectional studies. Among the six studies assessing the ability of urinary biomarkers to predict acute rejection at variable time-points before the episode, five studies analyzed prospectively collected data and one study analyzed retrospectively collected data. Population characteristics, patient selection protocols, inclusion and exclusion criteria, as well as sample size, were heterogeneous between studies. Sample size varied from 15 to 396 kidney transplant patients, and occasionally more than one urine sample was obtained from the same patient. Although all of the included studies were published after January 2015, some of them enrolled patients transplanted from the early 2000s. Table 1 also highlights the year of study population enrolment to help the identification of possibly overlapping populations and to assess the appropriateness of the Banff classification used in each study. The majority of the included studies applied an up-to-date Banff classification, with ten studies apparently using the 1997 version or not reporting the year. Using an outdated classification could be a source of bias mostly for studies assessing ABMR, as discussed later. For one study, the application of the Banff classification was not clearly stated. Studies were also heterogeneous in terms of the considered outcome. Among the 38 included studies, 15 specifically addressed TCMR only, whereas just two exclusively focused on ABMR. Sixteen studies assessed the combination of TCMR and ABMR, while five studies did not specify the characteristics of the observed acute rejection (Table 1).

Biomarkers
The included studies were split into diagnostic and prediction studies. To be considered a diagnostic study, the urine sample had to be collected on the day that AR was suspected or when per-protocol biopsies were planned. There were a few exceptions to this rule, where sample collection occasionally occurred up to seven days before biopsy. For prediction studies, urine samples were collected at any time point post-transplant and, in this analysis, these ranged from day one up to six months post-transplantation. Among the various techniques for targeted analysis of known urinary biomarkers, ELISA and RT-PCR were the most frequently utilized. Mass spectrometry, nuclear magnetic resonance spectroscopy, liquid chromatography, RNA expression, and transcriptome analysis by RNA-Seq were employed for unbiased metabolomics, proteomics, and genomic profiling and for detection and identification of urinary exosome proteins. All biomarkers are detailed per category in Table 2.

Reference Standard
The most frequent reasons for high risk of bias were the selection of the study population by case-control design, which was the case for the majority of the studies, the exclusion of the typical confounding of a real-life setting, and the absence of threshold definition and independent validation. For example, when control patients were selected among stable patients without performing allograft biopsy, or only among normal histology patients, and the obtained thresholds were not tested in a randomly selected validation group, the study was highlighted for high risk of bias in patient selection and index test (Table 3). This then raised the possibility of an increased risk of over-fitting association and unrealistic DTA performance and, therefore, concerns for applicability. The ideal control patients were randomly (or in a cross-sectional fashion) selected, all having had an allograft biopsy (per indication or per protocol) with various histological diagnoses (e.g., normal histology; acute tubular necrosis, ATN; interstitial fibrosis and tubular atrophy, IFTA; chronic allograft nephropathy, CAN; BK virus nephropathy, BKVN; recurrence of the primary disease on the allograft). Only 5/29 studies were found to have a low risk of bias in both patient selection and index test. Allograft histology, according to Banff classification, was the reference standard for AR diagnosis, with histology grading usually assigned in a blinded fashion with respect to the index test results. Since urinary samples were frequently obtained for all included patients prior to a diagnostic allograft biopsy, and all included patients were evaluated in the DTA analysis, a low risk of bias was frequently identified in the flow and timing domain. The QUADAS-2 tool does not include publication bias (PB) as one of the variables and, in the context of this review, it is difficult to formally assess PB.
Given the broad variety of different biomarkers that were assessed and the absence of a meta-analysis, performing a formal PB assessment such as Egger's test, Deeks' test, or the construction of a funnel plot was not possible. It is also recognized that the assessment of PB in the synthesis of DTA data is challenging and of limited reliability [55].

Summary of the Results
Tables 4-6 provide a detailed summary of each study's results. When DTA analysis was available (29 studies), the results are summarized in Table 4 for diagnostic studies (24/38) and Table 5 for prediction studies (5/38). Descriptive results from the remaining nine studies are briefly reported in Table 6. For each DTA study, the particular outcome of interest and characteristics of the control population are reported with the sample size for each group included in the final DTA analysis. The urinary biomarker of interest, thresholds (when available), and test design (training, validation, or particular comparisons between groups) are also detailed. For prediction studies, time from transplantation to urinary biomarker analysis is also reported (from 1 day to 6 months post-transplantation). Sensitivity, specificity, positive and negative predictive values (PPV, NPV), and area under the receiver operating characteristic curve (AUC) are reported as measures of diagnostic test accuracy when available (Tables 4 and 5), and results are in bold text when arising from validation cohorts. Confirmation of results in at least one validation cohort was available in less than one third of studies (7/29, 24%). Of these, two were case-control studies [25,33], while the others were the previously mentioned five cross-sectional studies with the lowest risk of bias score [16,17,21,41,45]. Sensitivity and specificity values were highly variable between studies, ranging from 9% to 100% and from 34% to 100%, respectively. PPV and NPV were also variable, ranging from 15% to 98% and from 32% to 100%, respectively.
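As a reminder of how the accuracy measures reported in Tables 4 and 5 relate to one another, the following minimal Python sketch derives sensitivity, specificity, PPV, and NPV from a 2x2 table of biomarker result against biopsy result. The class name, function name, and patient counts are hypothetical and chosen only for illustration; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of index test (urinary biomarker) vs reference standard (biopsy)."""
    tp: int  # biopsy-proven AR, biomarker positive
    fp: int  # no AR on biopsy, biomarker positive
    fn: int  # biopsy-proven AR, biomarker negative
    tn: int  # no AR on biopsy, biomarker negative


def dta_measures(c: ConfusionCounts) -> dict:
    """Return the four threshold-dependent diagnostic test accuracy measures."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of AR cases detected
        "specificity": c.tn / (c.tn + c.fp),  # share of non-AR correctly negative
        "ppv": c.tp / (c.tp + c.fp),          # probability of AR given a positive test
        "npv": c.tn / (c.tn + c.fn),          # probability of no AR given a negative test
    }


# Hypothetical cohort: 50 AR and 150 non-AR biopsies (illustrative numbers only).
example = ConfusionCounts(tp=40, fp=20, fn=10, tn=130)
measures = dta_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are computed within the AR and non-AR groups separately, whereas PPV and NPV mix the two groups and therefore depend on how many AR cases the cohort contains.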

Acute Rejection Diagnosis
Among studies with the lowest risk of bias, only three studies [16,17,45] yielded a very good (0.8-0.9) or excellent (> 0.9) performance as diagnostic AUC (Table 4). All of these studies provided diagnostic accuracy measures for the diagnosis of AR, considering both TCMR and ABMR as the outcome of interest. Tinel et al. found that the combination of urinary CXCL9 and CXCL10 could distinguish AR patients among almost three hundred heterogeneous patients with an AUC of 0.70 [16]. These results strengthened the good performance previously described, among dysfunctional allografts, separately for CXCL9 (AUC 0.71) and CXCL10 (AUC 0.74) by Rabant and colleagues [51]. Yang et al. separately validated the so-called Q score in two validation cohorts for the diagnosis of AR. A Q score ≥ 32 maintained an excellent diagnostic performance (AUC 0.96) also when validated in the entire study population (n = 364), with high PPV and NPV (87-98%) [17]. Banas et al., after identifying a urinary metabolite signature with good diagnostic performance for TCMR [27], validated it in a cohort of 109 patients for the diagnosis of AR with an AUC of 0.71 [21]. Through unbiased proteomics, Sigdel et al. identified a signature of eleven urinary peptides able to segregate AR patients from normal histology, chronic allograft nephropathy, and BK virus nephropathy patients with an excellent AUC of 0.94 in the validation cohort [45]. The same authors showed that a urine cell sediment gene expression-based score (uCRM score) was able to diagnose AR with 96.6% accuracy and potentially quantify the degree of injury [24].

T-Cell-Mediated Rejection Diagnosis
The previously mentioned study by Tinel et al. also provided separate outcome analysis for CXCL9 and CXCL10, with the best performance obtained for TCMR diagnosis (NPV of 98% and a very good AUC of 0.81) [16]. Also of note, CCL2, at a threshold level of 198 pg/mL, yielded very good performance (AUC 0.81) for TCMR identification among a population of 300 normal and dysfunctional grafts in the study by Raza et al. [43]. Urinary exosome proteins were investigated in two case-control studies for the diagnosis of TCMR. Lim et al. found significantly higher urinary tetraspanin-1 (TSPAN1) and hemopexin (HPX) expression levels in TCMR patients with good diagnostic performance (AUC 0.74) [28], while Park et al. reported the initial results of an optimized integrated kidney exosome analysis (iKEA) able to distinguish TCMR from normal histology patients, with a very good performance (AUC 0.84) in a small validation cohort [33].

Antibody-Mediated Rejection Diagnosis
The study by Blydt-Hansen et al. was the only one to specifically evaluate the diagnostic performance for ABMR diagnosis [41]. The authors tested and validated the use of the ABMR score, with good sensitivity (78%) and specificity (83%), an NPV of 96%, a good performance (AUC 0.76 in validation), and the ability to stratify patients as ABMR negative, indeterminate, or positive [41].

Acute Rejection, TCMR, and ABMR Prediction
Among prediction studies (Table 5), high risk of bias was often identified for patient selection and index test. However, good performances for AR prediction were obtained by three months post-transplant for CXCL9 and CXCL10 levels [26], and at seven days and one month post-transplant for TNF-alpha levels [32]. The well-conducted study by Rabant et al. found both urinary CXCL9 and CXCL10, adjusted for urinary creatinine concentration, to have high NPV (89 to 93%) for AR at one and three months post-transplantation. CXCL10 yielded the best predictive performance (AUC 0.72) at one month post-transplantation, at the threshold of 2.79 ng/mmol [50]. For TCMR prediction, post-transplant CXCL10 and miR-155-5p levels yielded positive results [34], while for ABMR prediction albuminuria at six months was investigated [42].
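Adjusting a urinary chemokine for urinary creatinine amounts to a simple ratio, which the following Python sketch makes explicit. It assumes the chemokine is measured in pg/mL (numerically equal to ng/L) and creatinine in mmol/L, so that the ratio is in ng/mmol. The function names and the sample values are hypothetical; only the 2.79 ng/mmol threshold comes from the study cited above [50], and it is used here purely as an illustration of the unit, not as a validated decision rule.

```python
CREATININE_MG_PER_MMOL = 113.12  # molar mass of creatinine in g/mol (= mg/mmol)


def creatinine_mg_dl_to_mmol_l(creatinine_mg_dl: float) -> float:
    """Convert urinary creatinine from mg/dL to mmol/L (1 dL = 0.1 L)."""
    return creatinine_mg_dl * 10.0 / CREATININE_MG_PER_MMOL


def normalise_to_creatinine(biomarker_pg_ml: float, creatinine_mmol_l: float) -> float:
    """Return the biomarker-to-creatinine ratio in ng/mmol.

    1 pg/mL equals 1 ng/L, so (ng/L) / (mmol/L) gives ng/mmol directly.
    """
    if creatinine_mmol_l <= 0:
        raise ValueError("urinary creatinine must be positive")
    return biomarker_pg_ml / creatinine_mmol_l


# Threshold reported for CXCL10 at one month post-transplantation [50].
CXCL10_THRESHOLD_NG_PER_MMOL = 2.79

# Hypothetical sample: CXCL10 45 pg/mL, urinary creatinine 9 mmol/L.
ratio = normalise_to_creatinine(45.0, 9.0)
exceeds_threshold = ratio > CXCL10_THRESHOLD_NG_PER_MMOL
print(f"CXCL10/creatinine = {ratio:.2f} ng/mmol, above threshold: {exceeds_threshold}")
```

Dividing by creatinine corrects for urine dilution, so that two samples from the same patient taken at different hydration states remain comparable.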

Discussion
With this systematic review, we critically summarize the results of the last five years of research and the latest advances, and highlight the most frequent limitations of studies assessing urinary biomarkers for the diagnosis or prediction of acute allograft rejection. We focused on study design, distinction between TCMR and ABMR setting, evaluation of confounding (e.g., DGF, infections, calcineurin inhibitor nephrotoxicity), comparison with the gold standard of diagnosis (both for cases and controls), and presence of estimates of the biomarker(s) performance in validation.
The main finding was the strengthening of the evidence for the clinical utility of urinary C-X-C motif chemokine ligands (in particular for the diagnosis of TCMR) alone or in combination with other biomarkers, as in the Q score (cell-free DNA, methylated cell-free DNA, clusterin, total protein, creatinine, and CXCL10) or in the CTOT-4 formula. CXCL9 and CXCL10 had AUC ranging from 0.67 to 0.88 with an NPV ranging from 84% to 98% for AR diagnosis, and AUC ranging from 0.50 to 0.97 with an NPV ranging from 71% to 96% for AR prediction. Signatures of urinary peptides and metabolites identified through unbiased proteomics and metabolomics, and a cluster of urinary cell pellet genes (uCRM score), were also established for the diagnosis of AR, although some limitations remain for their introduction in clinical practice. Confounding outcomes always need to be considered because of the potential overlap in diagnosis. For example, urinary chemokines are also elevated in allograft BK virus nephropathy (as discussed below), urinary NGAL was proposed as an early predictor of DGF [56] and as a biomarker of CNI toxicity [57], while urinary miRNA dysregulation has been linked to interstitial inflammation and tubular atrophy [58]. Tinel and colleagues demonstrated for the first time that considering (instead of excluding) potential confounding factors (i.e., urinary tract infection and BK virus reactivation) in a diagnostic multi-parametric model could optimize its performance [16]. A model combining eight parameters (recipient age, sex, eGFR, DSA presence, signs of urinary tract infection, BKV blood viral load, CXCL9, and CXCL10) could reach AR diagnosis with high accuracy (AUC: 0.85, 0.80-0.89), paving the way for new studies combining urinary biomarkers with clinical characteristics to reach the highest clinical relevance and provide targeted therapy for our patients.
Up to 2015, almost ninety non-redundant molecules were identified as urinary biomarkers of AR, participating in different pathways such as complement activation, antigen presentation, and inflammation signaling [15]. Urine was the most frequent matrix of choice for these analyses, and studies were often limited by small sample size and case-control design, no histology in the control cohorts, lack of confounding adjustment, lack of a validation set, and technical difficulties with procedure standardization and costs [15]. Although serum creatinine levels and proteinuria monitoring are well-established biomarkers used by transplant physicians to suspect AR, they lack both sensitivity and specificity, and they are of little help in the prediction phase, in detecting subclinical rejection, and in the differential diagnosis between AR, infections, drug toxicity, and acute tubular necrosis [14,59]. In a study of 281 consecutive biopsies, indicated by an increase in serum creatinine levels, only 27.8% revealed any sign of AR [51]. Conversely, subclinical rejection (i.e., rejection without clinical dysfunction) was found in over 40% of patients with normal renal function in the presence of anti-HLA de novo donor-specific antibodies (DSA) [60]. Proteinuria is common after kidney transplant and, although widely used as a biomarker of renal disease and despite its value as an independent predictor of long-term graft survival, it could also be a sign of post-transplant primary disease recurrence (e.g., focal-segmental glomerulosclerosis), infections (e.g., CMV), immunosuppressive medication toxicity, or systemic (e.g., new-onset diabetes) and urologic complications (e.g., ureteral stenosis) [59,61]. DSA monitoring is currently considered the primary biomarker for ABMR but, despite the increasing ability to detect low levels of DSA, its positive predictive value is low, so that up to 60% of patients showing de novo DSA do not show any sign of AR at biopsy [60].
Continuous advances in molecular techniques and the "-omics" sciences have helped to identify many potential new blood and urine biomarkers for the diagnosis and prediction of kidney allograft AR in the last two decades. Of note, elevated pretransplant serum CXCL9 and CXCL10 levels were found to be associated with an increased risk of early and severe AR and graft failure [62][63][64]. Subsequently, among urine-derived proteins, a 2012 study found CXCL9 and CXCL10 to be considerably elevated in patients experiencing either AR (clinical or subclinical) or BK virus infection (86% sensitivity and 80% specificity for CXCL9; 80% sensitivity and 76% specificity for CXCL10), but they were not able to distinguish between the two conditions [65]. These results were reinforced by the 2013 CTOT-1 study, which found that low urine CXCL9 measured at 6 months post-transplant identified a subset of patients at low risk for AR development (92% NPV for Banff ≥1A TCMR) and predicted allograft stability up to 24 months post-transplant (93-99% NPV) [66]. With the help of mass spectrometry, elevated beta2-microglobulin levels were identified as strongly correlated with AR (83% sensitivity, 80% specificity, 89% PPV, 71% NPV) and then validated by ELISA in the urine of AR patients [67]. Urinary mRNAs of the cytotoxic proteins perforin and granzyme B were proposed to noninvasively diagnose AR (with 83% sensitivity and 83% specificity, and 79% sensitivity and 77% specificity, respectively) [68], and the Treg marker FOXP3 was shown to predict reversal of AR (90% sensitivity, 73% specificity) [69]. T-cell immunoglobulin and mucin domain 3 (Tim-3, also known as hepatitis A virus cellular receptor 2) mRNA expression in urinary cells was found to discriminate AR from other causes of acute graft dysfunction (calcineurin inhibitor nephrotoxicity or interstitial fibrosis and tubular atrophy) with an AUC of 0.96, 89% PPV, and 94% NPV [70].
A 2013 multicenter study from the CTOT-4 study group later identified a 3-gene urinary mRNA signature (CD3ε mRNA, CXCL10 mRNA, 18S rRNA) able to discriminate acute TCMR from no rejection in indication biopsies, with an AUC of 0.74, 79% sensitivity, and 78% specificity in a validation set [54]. Also, noncoding miRNAs (e.g., miRNA-10a, miRNA-10b, miRNA-210), although limited by their susceptibility to degradation, proved to be detectable in the urine, and in particular low miRNA-210 levels discriminated patients affected by AR from stable control transplant patients (74% sensitivity, 52% specificity) [71].
Our systematic analysis of the more recent literature details the accuracy of a variety of urinary biomarkers for allograft AR, with the objective of allowing transplant physicians early diagnosis and prediction of rejection episodes, and differential diagnosis with other causes of allograft dysfunction. A correct histologic diagnosis of AR is essential during the validation of new biomarkers, and the Banff criteria are considered the gold standard for biopsy evaluation. The diagnostic criteria for TCMR have undergone essentially no major change in the last decade, with lymphocytic infiltrate of tubules (tubulitis) and larger vessels (vasculitis) being the main descriptive features. The severity of these lesions is graded according to the degree of lymphocytic infiltrate per high-powered field. On the other hand, ABMR criteria have continuously evolved in recent years with the recognition of its variable histologic presentation, which highlights the great importance of applying an up-to-date classification in this setting [72,73]. The original criteria, established in the 2000s, included active tissue injury, immunohistologic evidence of peritubular capillary complement split-product C4d deposition, and circulating DSA. Subsequent studies demonstrating the presence of ABMR also in biopsies lacking detectable C4d staining [74] pushed the Banff Working Group in 2013 to make a major change to the ABMR criteria, removing the requirement for C4d detection [75]. The most recent changes in 2017 included removing the requirement for documented circulating DSA in the setting of positive C4d staining and microvascular inflammation, and included the use of AMR-associated gene transcript panels [10].
The ideal biomarker should be readily available, accurate, inexpensive, standardized, repeatable, and noninvasive and would be useful to reduce the need for protocol biopsy and enable early targeted intervention. The chance of finding an ideal biomarker with high sensitivity, specificity, PPV and NPV is small. However, not all biomarkers need to be highly sensitive and highly specific at the same time, depending on the clinical question they are going to answer. Therefore, targeting specific populations and accepting lower predictive values in certain variables may be a better strategy. For example, to confirm the need for allograft biopsy in a population at high risk for AR (thus providing biopsy to the correct patients), a test with high sensitivity, and low false negative rate, would be the most useful. On the contrary, to propose diagnostic biopsies in a population at low risk for AR (thus avoiding unnecessary per-protocol biopsies), a test with high specificity, and low false positive rate, would be the test of choice. Also, TCMR and ABMR are different clinical entities and it is unrealistic, on current evidence, to hope for a biomarker that will accurately predict AR in both forms in a typical population of transplant patients with possible confounding.
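The dependence of predictive values on the baseline risk of the tested population can be made concrete with Bayes' rule. The following Python sketch takes a fixed sensitivity and specificity and recomputes PPV and NPV for two AR prevalences: a balanced case-control sample and a low-risk cohort. The function name and all numerical inputs are hypothetical and serve only to illustrate the direction of the effect.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population with the given AR prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical biomarker with 80% sensitivity and 80% specificity.
SENS, SPEC = 0.80, 0.80

# Balanced case-control sample (half of the patients have AR).
ppv_balanced, npv_balanced = predictive_values(SENS, SPEC, prevalence=0.50)

# Low-risk surveillance cohort (one patient in ten has AR).
ppv_low_risk, npv_low_risk = predictive_values(SENS, SPEC, prevalence=0.10)

print(f"prevalence 50%: PPV {ppv_balanced:.0%}, NPV {npv_balanced:.0%}")
print(f"prevalence 10%: PPV {ppv_low_risk:.0%}, NPV {npv_low_risk:.0%}")
```

With unchanged sensitivity and specificity, the PPV falls and the NPV rises as AR becomes rarer, which is one reason why predictive values obtained in case-control samples with an artificially high proportion of cases do not transfer directly to unselected transplant cohorts.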
Our systematic review has some limitations. In order to stay with the systematic review question, the heterogeneity of the included studies did not allow us to detail the many facets of individual study results, especially the more complex ones. Owing to space constraints, tables only report the major findings of each study, limited to urinary biomarkers. A narrative synthesis of the most promising results was applied to improve readability, and a meta-analysis could not be performed. From our work, overall good quality studies emerged, many with DTA analysis and some comprising a thorough validation process yielding a very good to excellent diagnostic performance. Although specific forms of bias were assessed using QUADAS-2, publication bias could not be formally assessed, and the authors acknowledge that this can overestimate the weight of positive results. Weaknesses of the included studies were often the use of small cohorts obtained by case-control selection yielding inflated predictive values, the exclusion of confounding, unclear or out-of-date Banff classification application, the absence of validation cohorts, and the lack of a hypothesis-driven approach. In fact, the biomarker discovery process should not only consist of a training phase (i.e., a case-control study), but also comprise independent validation in a prospective study and confrontation with the real-life clinical setting.

Literature Search
This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [76]. The objective of the study, search strategy, inclusion and exclusion criteria, and study evaluation method were planned in advance, refined, and approved by all authors. MEDLINE was searched from 1 January 2015 to 31 May 2020. Key terms like "kidney", "renal", "transplant/transplantation", "urine/urinary", "marker/biomarker", and "rejection" were combined in the search strategy. Additional relevant articles were sought by scanning the reference lists of included studies and added if not detected by the original literature search.
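The way such key terms are typically combined can be sketched as a Boolean query in which synonyms are joined with OR and the resulting concept groups with AND. The Python sketch below is only an assumption about the structure: the exact query string, field tags, and date filters used for this review are not reported here, and the names `CONCEPT_GROUPS` and `build_query` are hypothetical.

```python
# Concept groups built from the key terms listed above. How the terms were
# actually grouped and combined is not stated, so this layout is an assumption.
CONCEPT_GROUPS = [
    ["kidney", "renal"],
    ["transplant", "transplantation"],
    ["urine", "urinary"],
    ["marker", "biomarker"],
    ["rejection"],
]


def build_query(groups):
    """OR the synonyms inside each group, then AND the groups together."""
    clauses = []
    for terms in groups:
        joined = " OR ".join(f'"{term}"' for term in terms)
        # Parentheses are only needed when a group holds more than one term.
        clauses.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(clauses)


query = build_query(CONCEPT_GROUPS)
print(query)
```

The date window (1 January 2015 to 31 May 2020) would be applied as a separate publication-date filter in the database interface.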

Selection Process
In the eligibility process, the first screening by title and abstract was performed separately by two authors (F.G., L.C.). Original articles were selected if they assessed one, more, or a combination of urinary biomarker(s) and their performance in diagnosis or prediction of kidney allograft AR. Abstracts, reviews, studies assessing biomarkers from other matrices (e.g., blood samples or histology staining), and studies specifically evaluating different outcomes (e.g., chronic rejection, infection, or allograft survival) were excluded. In the inclusion process, selected articles were then independently full-text reviewed by two authors (F.G., L.C.). Any disagreement between the two investigators was discussed and resolved with the help of all authors.

Data Collection and Analysis
Data from each of the included studies were collected with the help of a pre-specified spreadsheet and extraction table refined by all authors. Study design, single or multicenter patient collection, sample size, years of enrollment, urinary biomarker(s) of interest (i.e., index test), the Banff classification used for histological AR diagnosis (i.e., reference standard), and the addressed outcome(s) were collected in a descriptive table. Studies were classified as diagnostic or predictive. Diagnostic studies usually collected urine samples on the day of the diagnostic biopsy, while predictive studies analyzed urine samples collected before AR development. Studies that reported DTA data, such as sensitivity, specificity, PPV, NPV, and AUC, were evaluated for risk of bias and applicability concerns using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2), a tool for quality evaluation of diagnostic accuracy studies [77]. The most important items for a positive evaluation included: a cross-sectional study design; avoiding patient selection bias and inappropriate exclusion; the definition of the index test (biomarker) threshold in a training set and its validation in a separate set of patients; and compliance with the correct histological definition of AR as the reference standard for all patients included in the analysis. Due to the great heterogeneity of the included studies, a meta-analysis was not performed, and a narrative synthesis of the results was preferred.
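The training and validation item above can be illustrated with a short Python sketch: a threshold is chosen in a training set, here by maximizing Youden's index (sensitivity + specificity - 1), and is then applied unchanged to a separate validation set. Youden's index is used only as one common choice; the included studies did not necessarily define their thresholds this way. All function names and biomarker values are hypothetical.

```python
def auc(cases, controls):
    """Probability that a random AR case scores above a random control (ties count 0.5)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in cases for y in controls)
    return wins / (len(cases) * len(controls))


def sens_spec(cases, controls, threshold):
    """Sensitivity and specificity when values >= threshold are called positive."""
    sensitivity = sum(x >= threshold for x in cases) / len(cases)
    specificity = sum(y < threshold for y in controls) / len(controls)
    return sensitivity, specificity


def youden_threshold(cases, controls):
    """Pick the observed value that maximizes sensitivity + specificity - 1."""
    def youden_j(threshold):
        sensitivity, specificity = sens_spec(cases, controls, threshold)
        return sensitivity + specificity - 1.0
    return max(sorted(set(cases) | set(controls)), key=youden_j)


# Hypothetical urinary biomarker levels (arbitrary units), illustration only.
train_cases = [5.1, 6.3, 7.8, 4.9, 8.2]
train_controls = [1.2, 2.4, 3.1, 4.0, 2.2]
valid_cases = [4.2, 6.0, 7.1, 3.8]
valid_controls = [1.9, 5.3, 2.7, 3.0, 4.5]

# Step 1: define the threshold in the training set only.
threshold = youden_threshold(train_cases, train_controls)

# Step 2: apply the frozen threshold to the independent validation set.
valid_sens, valid_spec = sens_spec(valid_cases, valid_controls, threshold)
train_auc = auc(train_cases, train_controls)
valid_auc = auc(valid_cases, valid_controls)

print(f"threshold from training set: {threshold}")
print(f"training AUC {train_auc:.2f}, validation AUC {valid_auc:.2f}")
print(f"validation sensitivity {valid_sens:.0%}, specificity {valid_spec:.0%}")
```

In this toy example the training set is perfectly separated while the validation set is not, which mirrors the over-fitting concern raised for studies that report only training performance.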

Conclusions
In recent years, numerous studies have joined the challenging quest for urinary biomarkers in the diagnosis and prediction of acute kidney allograft rejection. Authors face the difficult task of mediating between the need for a precise setting and reference standard diagnosis (to develop the most precise biomarkers) and the need for their validation in the most heterogeneous population of kidney allograft patients (to increase clinical utility). Urinary chemokines CXCL9 and CXCL10, alone or in combination with others, are the most frequently used and the most promising biomarkers, but multi-parametric clinical and laboratory models could represent the best strategy for future studies. Remarkable advances have been made towards a more precise allocation of resources, helping clinicians to move away from the standard protocol/indication biopsy dichotomy, to reduce unnecessary immunosuppression, and to improve long-term kidney allograft outcomes.
Author Contributions: Conceptualization, investigation, resources, F.G. and L.C.; methodology, data curation, formal analysis, visualization, F.G.; writing-original draft preparation, F.G. and L.C.; writing-review and editing, E.B., F.B., C.E., R.M.R., J.P.H.; supervision, J.P.H., P.R.; funding acquisition, P.R. All authors made a significant contribution to the content of this manuscript as per ICMJE recommendations. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Acknowledgments: We thank the Meyer Children's Hospital and the Meyer Children's Hospital Foundation.

Conflicts of Interest: The authors declare no conflict of interest.