Prediction Models of Primary Membranous Nephropathy: A Systematic Review and Meta-Analysis

Background: Several statistical models for predicting the prognosis of primary membranous nephropathy (PMN) have been proposed, but most have not been widely accepted in clinical practice. Methods: A systematic search was performed in MEDLINE and EMBASE. English-language studies that developed any prediction model including two or more predictive variables were eligible for inclusion. The study population was limited to adult patients with pathologically confirmed PMN. The outcomes in eligible studies had to be events relevant to the prognosis of PMN, either disease progression or response profile after treatment. The risk of bias was assessed according to the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Results: In all, eight studies with 1237 patients were included. The pooled AUC value of the six studies with renal function deterioration and/or ESRD as the predicted outcomes was 0.88 (95% CI: 0.85 to 0.90; I² = 77%, p = 0.006). The paired forest plots for sensitivity and specificity with corresponding 95% CIs for each of these six studies indicated that the combined sensitivity and specificity were 0.76 (95% CI: 0.64 to 0.85) and 0.84 (95% CI: 0.80 to 0.88), respectively. All six studies included in the meta-analysis were assessed as having a high risk of bias according to the PROBAST tool. Conclusions: The reported discrimination ability of the included models was good; however, the insufficient calibration assessment and lack of validation studies precluded drawing a definitive conclusion on the performance of these prediction models. High-grade evidence from well-designed studies is needed in this field.


Introduction
Primary membranous nephropathy (PMN) is a glomerulonephropathy that affects all ethnicities, all regions, and all ages [1]. It has a relatively "benign" presentation and was long thought to follow the rule of thirds, with a third of patients responding to treatment to variable extents, a third progressing to renal insufficiency, and a third undergoing spontaneous remission [2]. Despite substantial advances in research on the underlying mechanisms of PMN in the past two decades [3], treatment remains controversial [1]. The KDIGO 2021 guideline recommends treating PMN according to a risk classification that includes four risk categories [4]. Strategies that enable clinicians to identify patients who will benefit from treatment would be useful for providing individualized precision therapies while avoiding unnecessary adverse effects.
In response to these unmet needs, several statistical models for predicting the prognosis of PMN have been proposed. The Toronto risk score, first proposed in 1992, used kidney function and proteinuria variables to predict the risk of renal failure [5]. Other prediction models using renal function deterioration as the predicted outcome have also been proposed [6][7][8][9]; however, most of these prediction models have not been accepted as widely as the international risk-prediction tool in IgA nephropathy [10] due to the lack of clinical validation. In addition, the published prediction models mainly focused on forecasting renal function deterioration and gave insufficient attention to the disease remission profile after treatment, which is also essential for the management of PMN.
Therefore, we conducted this systematic review and meta-analysis to summarize current prediction models for PMN, aiming to understand the gap between ongoing studies and clinical needs and to provide clues for future investigations in this field.

Data Sources and Searches
A systematic search according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [11] was performed for eligible studies published up to 19 September 2022 in MEDLINE via PubMed (from 1946 through September 2022) and EMBASE (from 1980 through September 2022). The search terms included text words and medical subject headings relevant to prediction models and primary membranous nephropathy (see Supplementary Table S1). This study was registered on PROSPERO (identifier: CRD42022363539).

Study Selection
English-language studies that developed any prediction model including two or more predictive variables were eligible for inclusion. Prediction models could be presented in different forms, including risk scores, equations, and online calculators. The study population was limited to adult patients with pathologically confirmed PMN. No constraint was imposed on modeling algorithms or publication years.
Two reviewers (G.C.Y. and H.L.M.) independently conducted the review process. Titles and abstracts of all returned records were carefully reviewed. Duplicates, nonoriginal studies (e.g., reviews, editorial commentaries, and correspondence), studies that investigated risk factors instead of prediction models for poor prognosis of PMN, and studies irrelevant to PMN were excluded. Abstracts with sufficient information reported were considered eligible. Reference lists from full-text reviewed publications were also manually scanned to identify any relevant studies. Any discrepancy was adjudicated by a third reviewer (F.Y.L.).

Outcomes
The outcomes in eligible studies should be events relevant to the prognosis of PMN, either disease progression or remission profile after treatments.

Data Extraction and Quality Assessment
Two reviewers (G.C.Y. and H.L.M.) independently extracted data from included studies using a standardized sheet. Disagreements were resolved by the third reviewer (F.Y.L.). Data extracted included authors, publication year, geographical origin, study population, number of patients, age and composition of the study population, follow-up duration, predicted outcomes, number of predictive variables, statistical modeling approaches, discrimination performance, calibration performance, and reporting of internal and external validation. The discrimination indices included C-statistics, sensitivity (SEN), specificity (SPEC), positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), and negative predictive value (NPV). The calibration indices included results of the Hosmer-Lemeshow test and calibration plots.
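The discrimination indices listed above are all functions of a 2x2 table of predicted versus observed outcomes. The following is a minimal sketch of those standard definitions; the counts are hypothetical and were chosen only so that sensitivity and specificity equal 0.76 and 0.84, and the names `ConfusionCounts` and `discrimination_indices` are illustrative rather than taken from any included study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: predicted outcome versus observed outcome."""
    tp: int  # predicted positive, event observed
    fp: int  # predicted positive, no event
    fn: int  # predicted negative, event observed
    tn: int  # predicted negative, no event


def discrimination_indices(c: ConfusionCounts) -> Dict[str, float]:
    """Return SEN, SPEC, PLR, NLR, PPV and NPV from a 2x2 table.

    Assumes no marginal total is zero and that SPEC is neither 0 nor 1;
    otherwise a likelihood ratio is undefined and a ZeroDivisionError is raised.
    """
    sen = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    return {
        "SEN": sen,
        "SPEC": spec,
        "PLR": sen / (1.0 - spec),
        "NLR": (1.0 - sen) / spec,
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
    }


# Hypothetical cohort: 50 patients with the event, 200 without.
example = discrimination_indices(ConfusionCounts(tp=38, fp=32, fn=12, tn=168))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike SEN, SPEC, PLR, and NLR, the predictive values depend on the event rate in the cohort, so PPV and NPV from one study do not transfer directly to a population with a different prevalence.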

Critical Appraisal of Included Studies
Two reviewers (G.C.Y. and H.L.M.) independently assessed the risk of bias of the included studies based on the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [12], which contains four domains and twenty signaling questions. The overall risk of bias of an individual study was assessed as low only if all four domains were rated as having a low risk of bias.
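The overall rating rule can be written as a short function. This is a sketch only: the text above states the condition for an overall "low" rating, while the handling of "high" and "unclear" below (any high domain gives high; otherwise any unclear domain gives unclear) is an assumption based on the usual reading of the PROBAST guidance. The function and constant names are hypothetical.

```python
from typing import Mapping

# The four PROBAST domains.
DOMAINS = ("participants", "predictors", "outcome", "analysis")
ALLOWED = {"low", "high", "unclear"}


def overall_risk_of_bias(ratings: Mapping[str, str]) -> str:
    """Combine four domain-level ratings into one overall rating.

    "low" is returned only when every domain is rated "low".
    Assumption: any "high" domain makes the overall rating "high";
    otherwise any "unclear" domain makes it "unclear".
    """
    for domain in DOMAINS:
        if domain not in ratings:
            raise ValueError(f"missing rating for domain: {domain}")
        if ratings[domain] not in ALLOWED:
            raise ValueError(f"invalid rating for {domain}: {ratings[domain]}")
    values = [ratings[d] for d in DOMAINS]
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"


# Hypothetical study with a problem confined to the analysis domain.
print(overall_risk_of_bias({
    "participants": "low",
    "predictors": "low",
    "outcome": "low",
    "analysis": "high",
}))
```

Under this rule a single weak domain determines the overall rating, which is why shortcomings confined to the analysis domain are enough to classify a study as high risk.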

Data Synthesis and Analysis
The Review Manager (RevMan) (version 5.2, The Cochrane Collaboration, London, UK) and STATA 14.0 (Stata Corporation, College Station, TX, USA) software were used for data synthesis. All analysis procedures were double-checked by two reviewers (F.Y.L. and G.C.Y.) to avoid data entry errors. A Nightingale rose chart was generated using Microsoft Excel to illustrate the composition of predictive variables. For studies that provided only ROC curves without detailed values, discrimination results were extracted from the ROC curves using GetData Graph Digitizer (version 2.2.6) [13]. A summary ROC (sROC) curve with a 95% confidence interval (CI) was generated using the hierarchical summary receiver operating characteristic (HSROC) model [14] to evaluate the pooled discrimination ability of published prediction models for PMN prognosis. Statistical heterogeneity was estimated using the I² statistic [15] and assessed as low if I² was <25%, moderate if I² ranged from 26% to 75%, and high if I² was >75%. Statistical significance was set at p < 0.05. Pooled sensitivity and specificity with 95% CIs were also calculated from the HSROC model. The intraclass correlation coefficient (ICC) was used to assess interstudy variation in sensitivity and specificity. A Fagan diagram was used to estimate the post-test probability [16]. Sensitivity analyses and funnel plot analysis for publication bias were not applicable due to the limited number of studies included in the meta-analysis.
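To illustrate how the I² statistic relates to study-level estimates, the sketch below computes Cochran's Q and I² under inverse-variance weighting and applies the heterogeneity categories described above. It is not the HSROC procedure used in this review, the input values are hypothetical, and treating every value from 25% to 75% as "moderate" is a simplifying assumption to cover the boundaries.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(estimates: Sequence[float],
                     standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2 in percent) using inverse-variance weights."""
    if len(estimates) != len(standard_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I2 = (Q - df) / Q, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def classify_heterogeneity(i2_percent: float) -> str:
    """Map I2 to low / moderate / high (assumed boundaries: 25 and 75)."""
    if i2_percent < 25.0:
        return "low"
    if i2_percent <= 75.0:
        return "moderate"
    return "high"


# Hypothetical AUCs and standard errors from four studies.
q, i2 = cochran_q_and_i2([0.82, 0.90, 0.86, 0.93], [0.03, 0.02, 0.04, 0.02])
print(f"Q = {q:.2f}, I2 = {i2:.1f}% ({classify_heterogeneity(i2)})")
```

With these categories, the I² of 77% reported for the pooled AUC falls in the "high" band.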

Search Findings
A total of 1684 records were identified from the literature search after removing duplicates. After title and abstract screening, 33 citations were retained for full-text review, of which 10 were excluded because the full-text article was unavailable. A further fifteen articles were excluded because they did not report prediction models or included insufficient data for the meta-analysis, leaving eight studies in this systematic review (see Figure 1).

Study Characteristics
Among these eight studies involving 1237 patients, six reported prognostic prediction models for renal function deterioration [5][6][7][8][9][17], while the other two reported prediction models for proteinuria remission [18,19]. Five of the eight studies were retrospective and three were prospective. The number of patients in each study population varied substantially, from 57 to 439. The percentage of male patients in individual studies ranged from 44.9% to 66.7%. The shortest follow-up period was six months. Basic characteristics of the included studies are shown in Table 1.
The number of predictive variables in these eight models varied from two to six. The pooled frequencies of reported predictive variables indicated that the four most frequently reported predictive variables were, in descending order, proteinuria, serum anti-phospholipase A2 receptor (PLA2R) antibody, renal function, and age (Figure 2). All other predictive variables were reported only once. Notably, only two of the eight studies reported calibration performance assessments of their prediction models. The lack of internal and external validation was also prominent. A summary of prediction performance assessments is shown in Table 2.

Abbreviations: eGFR, estimated glomerular filtration rate; HDL-C, high-density lipoprotein cholesterol; PLA2R-Ab, PLA2R antibody; IgG, immunoglobulin G; α1m/Cr, α1-microglobulin corrected by creatinine; β2m, β2-microglobulin.

Pooled Discrimination Ability
Studies using proteinuria remission as the predicted outcome could not be meta-analyzed because there were too few of them. The pooled AUC value of the six studies with renal function deterioration and/or ESRD as the predicted outcomes was 0.88 (95% CI: 0.85 to 0.90; I² = 77%, p = 0.006) (Figure 3). The paired forest plots for sensitivity and specificity with corresponding 95% CIs for each of these six studies indicated that the combined sensitivity and specificity were 0.76 (95% CI: 0.64 to 0.85) and 0.84 (95% CI: 0.80 to 0.88), respectively (Figure 4). ICCs (95% CI) assessing interstudy variation in sensitivity and specificity were 0.12 (0.00 to 0.30) and 0.03 (0.00 to 0.08), respectively. Assuming a 20% prevalence of renal function deterioration in PMN, the Fagan nomogram showed that the post-test probability of renal function deterioration would be 55% if the predicted outcome was positive and 7% if the predicted outcome was negative (Figure S1).
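The arithmetic behind a Fagan nomogram is Bayes' theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The sketch below applies it to the rounded pooled sensitivity (0.76) and specificity (0.84) and the assumed 20% prevalence; the function name is illustrative. With these rounded inputs, the probability of renal function deterioration is roughly 54% after a positive prediction and roughly 7% after a negative one, close to the values read from the nomogram.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Rounded pooled estimates and the assumed prevalence from the text.
sensitivity, specificity, prevalence = 0.76, 0.84, 0.20

positive_lr = sensitivity / (1.0 - specificity)
negative_lr = (1.0 - sensitivity) / specificity

p_after_positive = post_test_probability(prevalence, positive_lr)
p_after_negative = post_test_probability(prevalence, negative_lr)

print(f"LR+ = {positive_lr:.2f}, LR- = {negative_lr:.2f}")
print(f"P(deterioration | positive prediction) = {p_after_positive:.1%}")
print(f"P(deterioration | negative prediction) = {p_after_negative:.1%}")
```

Both results are probabilities of deterioration; the probability that deterioration is absent after a negative prediction is the complement of the second value.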

Critical Appraisal
All six studies included in the meta-analysis were assessed as having a high risk of bias according to the PROBAST tool (Figure 5). The analysis domain was most often rated as having a high risk of bias because studies had an events per variable (EPV) ratio of <10, selected predictive variables through a "univariable first, then multivariable" regression procedure, or lacked internal validation.
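The EPV criterion is a simple ratio of the number of outcome events to the number of candidate predictors considered during model development. The following is a minimal sketch with hypothetical counts; the function names are illustrative, and the threshold of 10 is the one stated above.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Ratio of outcome events to candidate predictors considered in modeling."""
    if n_events < 0 or n_candidate_predictors <= 0:
        raise ValueError("events must be non-negative and predictors positive")
    return n_events / n_candidate_predictors


def has_low_epv(n_events: int, n_candidate_predictors: int,
                threshold: float = 10.0) -> bool:
    """True when the EPV falls below the threshold used in the appraisal."""
    return events_per_variable(n_events, n_candidate_predictors) < threshold


# Hypothetical cohort: 40 patients reached the endpoint, 6 candidate predictors.
epv = events_per_variable(40, 6)
print(f"EPV = {epv:.1f}, below threshold: {has_low_epv(40, 6)}")
```

The denominator counts every candidate predictor screened, not only those retained in the final model, so univariable pre-screening does not raise the EPV.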

Discussion
The findings of this systematic review and meta-analysis indicated the published prediction models for PMN were relatively few in number compared with those for other kidney diseases such as acute kidney injury. The pooled discrimination ability of included prediction models was good; however, the insufficient calibration assessments and lack of validation studies precluded drawing a definitive conclusion on the performance of these prediction models. In addition, all included studies suffered from high risk of bias.
Among all predictive variables reported in the included studies, proteinuria and renal function variables were the two most frequently used. This is consistent with literature indicating that serum creatinine and proteinuria are the oldest predictors of the risk of progressive kidney disease [1]. The majority of other predictive variables in this systematic review were laboratory findings from serum or urine samples. Serum anti-PLA2R antibody level was the third most frequently used predictive variable, even though it is still considered a non-validated yet clinically useful predictor [1]. A few included predictive variables are not widely used in clinical practice, such as urinary α1-microglobulin corrected by creatinine and neutrophil-to-lymphocyte ratio. It is worth noting that none of these prediction models included variables relevant to therapeutic regimens. Recently, therapy protocols with anti-CD20 monoclonal antibodies have revolutionized the guideline on the management of PMN [4][21][22][23][24]. Including variables relevant to treatment might help to improve the performance of prediction models; however, this hypothesis needs to be tested in well-designed studies with long-term observation.
Most published prediction models for PMN focused on forecasting so-called hard endpoints, including renal function deterioration, ESRD, and death. Only two studies in this systematic review used proteinuria remission as the predicted outcome [18,19]. Although the natural history of PMN was traditionally considered benign with respect to the risk of renal progression, this disease has remained one of the leading causes of renal failure among primary glomerulopathies in the US and Europe [4]. Long-term nephrotic proteinuria not only implies the absence of remission but also increases the risk of thromboembolic events [2]. Even for patients with proteinuria of less than 6 g/day, long-term nephrotic proteinuria may cause persistent hypoalbuminemia, which in turn induces overt edema. Therefore, treatment is often required, including both non-immunosuppressive supportive treatment and immunosuppressive therapies. Prediction models targeting disease remission may help to select patients who might benefit from treatment, customize therapeutic regimens, and avoid unnecessary adverse effects. For this reason, more accurate prediction models are needed to predict treatment efficacy in PMN and assist clinical decision making.
Although the overall discrimination performance of the included studies in this meta-analysis was good, reflected by a pooled C-statistic higher than 0.85 [6][7][8][18,19], most of these models had not undergone validation. Only two studies had been internally validated [6,8], and only one had been externally validated [5,20]. Model validation is a critical approach to confirming the robustness of a prediction model. Internal validation helps to verify the reproducibility of the modeling process and to detect overfitting that might result in overestimation of model performance [25]; it is therefore considered mandatory by the PROBAST tool. Four prediction models in this systematic review were rated as having a high risk of bias due to the lack of internal validation. External validation is not a requirement of the PROBAST tool. It is used to verify the consistency of model performance in different time periods, regions, or populations; however, predictive performance may worsen substantially on external validation [26]. It should be noted that the majority of included studies in this meta-analysis did not report calibration results. The lack of calibration assessment and validation prevented us from drawing definitive conclusions on the performance of these prediction models. This might be one of the reasons why current prognostic prediction models for PMN have not yet been widely used in clinical practice.
To the best of our knowledge, this is the first systematic review and meta-analysis of prognostic prediction models for PMN. A few limitations should be mentioned. First, the number of included studies was small, preventing us from conducting sensitivity analyses to explore the high heterogeneity or from analyzing publication bias. Second, part of the data used to generate the sROC curve was extracted with the GetData software, which might have introduced some errors. Third, the quality of every included study was rated as low, reflected by the high risk of bias assessed with the PROBAST tool. High-grade evidence from well-designed studies is needed in this field.

Conclusions
This systematic review and meta-analysis indicated the published prediction models for PMN were relatively few in number compared with those for other kidney diseases. The pooled discrimination ability of the included models was good; however, the insufficient calibration assessment and lack of validation studies precluded drawing a definitive conclusion on the performance of these prediction models. High-grade evidence from well-designed studies is needed in this field.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm12020559/s1, Table S1: Literature search strategies. Figure S1: Fagan nomogram of the six prediction models with renal function progression as predicted outcomes.

Data Availability Statement:
The data analyzed or generated during the study are available from the corresponding author on reasonable request.