Predicting Residual Function in Hemodialysis and Hemodiafiltration—A Population Kinetic, Decision Analytic Approach

In this study, we introduce a novel framework for the estimation of residual renal function (RRF), based on the population compartmental kinetic behavior of beta 2 microglobulin (B2M) and its dialytic removal. Using this model, we simulated a large cohort of patients with various levels of RRF receiving either conventional high-flux hemodialysis or on-line hemodiafiltration. These simulations were used to estimate a novel population kinetic (PK) equation for RRF (PK-RRF) that was validated in an external public dataset of real patients. We assessed the performance of the resulting equation(s) against their ability to estimate urea clearance using cross-validation. Our equations were derived entirely from computer simulations and advanced statistical modeling and had extremely high discrimination (Area Under the Curve, AUC 0.888–0.909) when applied to a human dataset of measurements of RRF. A clearance-based equation that utilized predialysis and postdialysis B2M measurements, patient weight, treatment duration and ultrafiltration had higher discrimination than an equation previously derived in humans. Furthermore, the derived equations appeared to have higher clinical usefulness as assessed by Decision Curve Analysis, potentially supporting decisions for individualizing dialysis prescriptions in patients with preserved RRF.


Introduction
Patients with End-Stage Renal Disease (ESRD) initiate dialysis when their intrinsic, residual renal function (RRF) is between 5 and 11 mL/min/1.73 m 2 [1,2]. Although RRF will be invariably lost within the first few years of dialysis initiation, sustaining this RRF is clinically important for the following reasons: low or declining RRF is associated with worse survival [3][4][5][6][7], worse phosphate control [8], increasing left ventricular hypertrophy [9], and poorer quality of life [3,10]. Hence existing dialysis guidelines [11,12] recommend the incorporation of RRF measurements into individualized patient prescriptions of treatment frequency and duration in order to achieve minimum dialysis adequacy targets.
In recent years, the paradigm of incremental dialysis [6,13,14], i.e., the gradual increase in frequency of dialysis to match the RRF has also been proposed. In this paradigm, patients with preserved RRF are initiated on twice-weekly dialysis, and then switched to thrice-weekly dialysis as RRF is lost. This practice has not been widely adopted even though it may be associated with neutral [15] or even improved survival, reduced hospitalizations and improved control of several biochemical parameters [16,17]. A major barrier to the safe implementation of incremental dialysis is the need to measure RRF in order to prevent underdialysis. This is a concern raised by regulators Figure 1. Overview of the approach adopted in this paper. We simulated the beta 2 microglobulin (B2M) kinetics in populations undergoing hemodialysis (HD) or hemodiafiltration (HDF). These simulations were used to derive Gaussian Process approximations for dialytic clearance and equations that related B2M to the known residual renal function (a novel population kinetic equation for residual renal function (PK-RRF)). Literature data were then used to relate the RRF to urea clearance (UrCl) and derive an equation that uses B2M and dialysis parameters to predict UrCl, a proxy for the unmeasured RRF. These in silico-derived equations were assessed in terms of discrimination, validation and calibration in real-world datasets.

Simulations of B2M Kinetics during Dialysis and Hemodiafiltration
We simulated 10,000 hemodialysis patients receiving either thrice-weekly high-flux HD or online hemodiafiltration (HDF) under the two-compartment, variable volume model for B2M kinetics [24,30] following the methodology previously reported by our group [24]. These simulations used the Population Kinetic (PopK) parameters for B2M and the range of the dialyzer B2M clearances [24,25] described in our previous meta-analysis. Dialysis-related parameters (dialysis duration, ultrafiltration volume, substitution flow rates) were based on the FHN [31,32], HEMO [33] trials for the HD simulations and on the three largest randomized controlled trials (Dutch CONTRAST [34], Spanish ESHOL [35], Turkish OL-HDF [36]) for HDF. Each patient was given a unique RRF value and the interdialytic/intradialytic changes in B2M concentration were simulated over a period of three months with the Livermore Solver for Ordinary Differential Equations with Automatic Stiffness detection (LSODA) [37]. The predialysis and postdialysis-simulated concentration of B2M at the first weekly session at the end of the three-month period was extracted from the simulation files and used for model development. This simulated dataset is available online as Table S1. This simulation strategy based on the double pool model is illustrated for a few illustrative cases in Figure 2. The output of the simulation is essentially a database of readily available observables, i.e., B2M Figure 1. Overview of the approach adopted in this paper. We simulated the beta 2 microglobulin (B2M) kinetics in populations undergoing hemodialysis (HD) or hemodiafiltration (HDF). These simulations were used to derive Gaussian Process approximations for dialytic clearance and equations that related B2M to the known residual renal function (a novel population kinetic equation for residual renal function (PK-RRF)). Literature data were then used to relate the RRF to urea clearance (UrCl) and derive an equation that uses B2M and dialysis parameters to predict UrCl, a proxy for the unmeasured RRF. These in silico-derived equations were assessed in terms of discrimination, validation and calibration in real-world datasets.

Simulations of B2M Kinetics during Dialysis and Hemodiafiltration
We simulated 10,000 hemodialysis patients receiving either thrice-weekly high-flux HD or on-line hemodiafiltration (HDF) under the two-compartment, variable volume model for B2M kinetics [24,30] following the methodology previously reported by our group [24]. These simulations used the Population Kinetic (PopK) parameters for B2M and the range of the dialyzer B2M clearances [24,25] described in our previous meta-analysis. Dialysis-related parameters (dialysis duration, ultrafiltration volume, substitution flow rates) were based on the FHN [31,32], HEMO [33] trials for the HD simulations and on the three largest randomized controlled trials (Dutch CONTRAST [34], Spanish ESHOL [35], Turkish OL-HDF [36]) for HDF. Each patient was given a unique RRF value and the interdialytic/intradialytic changes in B2M concentration were simulated over a period of three months with the Livermore Solver for Ordinary Differential Equations with Automatic Stiffness detection (LSODA) [37]. The predialysis and postdialysis-simulated concentration of B2M at the first weekly session at the end of the three-month period was extracted from the simulation files and used for model development. This simulated dataset is available online as Table S1. This simulation strategy based on the double pool model is illustrated for a few illustrative cases in Figure 2. The output of the simulation is essentially a database of readily available observables, i.e., B2M measurements, anthropometrics (patient weights), dialysis parameters (e.g., treatment duration, ultrafiltration volume) and the outcome of interest (the RRF). Such quantities could be used to derive an accurate equation for the RRF, given the necessary large, yet expensive and cumbersome clinical studies. The kinetic simulations allow one to create a synthetic dataset of such measurements on the cheap ("in silico") in a way that verifies the current understanding of the population distribution of the kinetic parameters of B2M. It was our hypothesis that this synthetic dataset could be used as the basis of development of an accurate RRF-estimating equation.

Generalized Additive Models for the PK-RRF
The predialysis B2M values in the simulated dataset (Table S1) were used to predict the RRF value in each artificial patient, via means of Generalized Additive Models (GAMs) [38,39]. GAM regressions allow the data-driven discovery of complex, possibly non-linear interactions between continuous covariates and either continuous (the RRF value) or discrete (RRF >2 mL/min) outcomes. GAMs achieve these goals by modeling the relationships between variables through smoothing functions, e.g., thin plate regression splines (TPRS) or Gaussian Processes (GPs). Contrary to other popular flexible models used in biomedicine, e.g., cubic splines, TPRS can model highly complex relationships and nonlinear interactions between covariates and without the need to specify knot locations, i.e., ranges of values in which the relationship between the covariate and outcome changes mathematical form. For models with more than two interacting variables measured in different scales, we also considered GPs when fitting the GAMs. GP models require a considerable amount of data to learn these relationships, and thus could only fit on the dataset of simulated patients, since previous studies fall short of the large sample sizes (hundreds to thousands of patients) needed to learn these relationships in a data driven fashion. In fitting these models, one must specify a covariance function that describes the correlation of the value of quantity modelled as GP at two different points of the input (e.g., B2M, dialyzer clearance, body weight, treatment duration) variables. For the purpose of this work, we used the Kamman and Wand version of the Matérn covariance function [40] in the mgcv GAM package in R. This choice offers a reasonable compromise between the ability to learn complex relations and the execution speed of the GP estimation algorithm.

In Silico Exploration of Factors
Affecting the Performance of the PK-RRF While exploring our artificial dataset, we developed a series of models that used the control parameters of the simulations; e.g., the dialytic clearance of B2M or treatment time in addition to B2M levels when predicting RRF. The purpose of this modeling was to understand the additional measurements or dialysis treatment parameters [41] that should be incorporated into a basic model of B2M vs. RRF in order to develop an estimating equation that can truly individualize RRF predictions. For these analyses, we adopted a hold-out validation strategy in which the simulated dataset was randomly split in a training/development (2/3 of all data) and a testing/validation subset. The simplified formulas were then developed in the training dataset and their performance was assessed in the development one. Dialytic clearance was calculated from treatment duration, ultrafiltration volume, body weight, predialysis and postdialysis B2M measurements via the Leypoldt formula [42]. This formula is based on a unicompartmental approximation to the full double pool, variable volume kinetic model of B2M. The inputs to this formula are the predialysis and postdialysis B2M levels, body weight and ultrafiltration rate. It is not entirely clear how accurate this equation is over the entire range of the input parameters. Hence, we used our kinetic simulations to compare this formula against a GP that utilized the same variables using a hold-out validation technique. Further technical details about the analyses in this section are provided in Appendix A. These analyses were carried out in Microsoft R Open v3.4.0-3.5.3 via the package mgcv. Overview of the simulation strategy approach adopted in this paper. We simulated the B2M kinetics in populations undergoing hemodialysis (HD) or hemodiafiltration (HDF) using the variable volume double pool model for B2M [24]. This system consists of a plasma/perfusing (P) and nonperfusing/non-plasma (NP) with additional material fluxes for patients during hemodialysis sessions (stippled shapes). The symbols V, Φ, C denote the absolute and fractional volume of each compartment and the concentration of B2M, respectively. Generation (G) takes place in both compartments, in direct proportion to their fractional volumes. KD, KER, KR are the dialyzer clearance and extrarenal and residual renal clearances, respectively. The dialytic clearance incorporated diffusive/adsorptive and convective components for patients on hemodiafiltration (not shown). A population of patients was simulated by first assuming distinct values for parameters of the compartmental system (the parameters for the first five patients are shown in the figure). Using these values, we undertook detailed simulations by solving the system of the differential equations (box in the middle) over a period of three months for each individual patient. Changes in the volumes of the two compartments occur during the simulated dialysis sessions (ultrafitration, rate QUF) and in the interdialytic period (fluid intake, with rate α) so that the relative volume (VP/VNP) stays constant both intra (Θ = 1) and inter-dialytically (Θ = 0). We extracted the predialysis and postdialysis B2M at the midweek session of the last week in the three-month period and used this database constructed purely from the simulator output, as input for the development of the PK-RRF equations as described in the text. Figure adapted from Figure S1 in [24], reused under the Creative Commons CC-BY license.

Figure 2.
Overview of the simulation strategy approach adopted in this paper. We simulated the B2M kinetics in populations undergoing hemodialysis (HD) or hemodiafiltration (HDF) using the variable volume double pool model for B2M [24]. This system consists of a plasma/perfusing (P) and non-perfusing/non-plasma (NP) with additional material fluxes for patients during hemodialysis sessions (stippled shapes). The symbols V, Φ, C denote the absolute and fractional volume of each compartment and the concentration of B2M, respectively. Generation (G) takes place in both compartments, in direct proportion to their fractional volumes. K D , K ER , K R are the dialyzer clearance and extrarenal and residual renal clearances, respectively. The dialytic clearance incorporated diffusive/adsorptive and convective components for patients on hemodiafiltration (not shown). A population of patients was simulated by first assuming distinct values for parameters of the compartmental system (the parameters for the first five patients are shown in the figure). Using these values, we undertook detailed simulations by solving the system of the differential equations (box in the middle) over a period of three months for each individual patient. Changes in the volumes of the two compartments occur during the simulated dialysis sessions (ultrafitration, rate Q UF ) and in the interdialytic period (fluid intake, with rate α) so that the relative volume (V P /V NP ) stays constant both intra (Θ = 1) and inter-dialytically (Θ = 0). We extracted the predialysis and postdialysis B2M at the midweek session of the last week in the three-month period and used this database constructed purely from the simulator output, as input for the development of the PK-RRF equations as described in the text. Figure adapted from Figure S1 in [24], reused under the Creative Commons CC-BY license.

Modeling the Relation of RRF to UrCl
A key question that has not been answered in the literature [14,41,43] concerns the quantitative relationship between the readily available clinical measurements of CrCl and UrCl and their relationship with the concept of the RRF. In the context of this work, the RRF maps to the glomerular filtration rate (GFR), as has been recently discussed [41,44,45]. Only a few landmark, experimental studies [46][47][48][49] have simultaneously measured the GFR using exogenous markers and timed urine collections, but the data from these studies have not been analyzed together. We thus undertook a quantitative synthesis of the relationship between the GFR measurements and the timed urine collections for the calculation of clearances of urea and creatinine. In the compartmental kinetic model for B2M, the RRF participates in its non-normalized to the Body Surface Area form-i.e., its units are expressed in mL/min rather than mL/min/1.73 m 2 and this is the form we analyzed in this paper. We extracted individual patient data (Table S2) from these publications from the relevant tables, or figures using digitizer software as previously described [25]. These studies used a variety of methods, e.g., iothalamate, inulin or Tc-99m diethylene-triamine-pentaacetate (DTPA) to measure the GFR. The UrCl and CrCl data from these studies were related to the RRF using the following hierarchical measurement error model: According to this model, the jth measurement of the GFR (mGFR i,j j = 1,2,3 corresponding to iothalamate, inulin or DTPA) in the ith patient is an unbiased, but noisy estimate of the (unobserved) RRF in that patient (RRF i ). The (observed) UrCl i and CrCl i in the same patient are linearly related to the RRF i , with the relationship incorporating linear intercept (α U and α C , respectively) and slope (β U and β C , respectively) terms. If the urine clearances are unbiased estimates of the RRF, the corresponding intercept terms should be zero; if the slope parameters are less (greater) than 1.0, the clearance measurements consistently under(over)estimate the RRF. The quantities (e i,j , U i and C i ) are random terms, quantifying measurement and formula error in the measured GFR, UrCl and CrCl, respectively. All random terms are modelled as zero mean, Gaussian (normal) random variables with standard deviations given by the sigma symbols in (1). Model fitting procedures for this hierarchical model data synthesis are discussed in Appendix B.

Measurements of B2M, RRF and Dialyzer Clearance in Patients
We utilized the publicly available dataset of RRF, CysC, B2M, urea, creatinine and dialysis parameters that accompanied the publication of Vilar et al. [21] to evaluate the PK-RRF equation performance. This dataset included the measurements of 391 individuals-of whom, nearly two-thirds were receiving HDF and the remaining were receiving conventional, high-flux HD. Thirty six and seventy per cent of the participants in this dataset had RRF of 0 and less than 2.0 mL/min, respectively, providing an opportunity to assess equation performance in this challenging subgroups of patients. For the purpose of this paper we used the 24 h urea clearance as an index of RRF. This is the approach taken by the bulk of the literature to date [17,19,21,23] in the field. Predialysis and postdialysis B2M, and thus dialytic clearance calculations were available in 300 patients in the Vilar cohort, thus allowing us to fit the dialytic clearance equations. The comparator (Shafi) equation was developed in a cohort of 44 patients with urine volume >250 mL undergoing HD and were validated in a cohort of 826 mixed cohort of patients undergoing either peritoneal dialysis or HD from the NECOSAD study.

Models and Outcomes for UrCl
We compared our PK-RRF equations against the Shafi equation for the a) primary outcome of having a UrCl >2 mL/min vs. ≤2 mL/min and b) the continuous prediction of UrCl. The rationale for developing these two separate equations was the following: (a) The logistic equation concerns the outcome that is specifically mentioned in the guidelines and regulatory documents. This is also the level of the residual renal function-below which, dialysis prescription and modality seem to have the largest impact on middle molecule exposure levels [24,50]. By directly modeling this outcome with the logistic approach, we are able thus able to derive a prediction rule that incorporates model uncertainty about this outcome being present.
This feature also allows us to explore different thresholds (assessments of risk of having a low RRF) for opting out of the current dialysis standard. (b) Analysis of the RRF as a continuous outcome allows a more direct comparison of the kinetic approach against a previously developed equation which all provide continuous estimates. In the publication describing the Shafi equation, the formula's continuous prediction was thresholded, in order to classify patients as having a clearance above or below the cutoff. Hence, this and similar equations implement a "hard" classification of patients as having "low" (less than 2 mL/min) or high (greater than 2 mL/min) UrCl, i.e., they assign probabilities of either 1 or zero without the nuance afforded by the logistic regression model. In particular, the later model assigns a continuous probability value in the range of zero to one when classifying patients as having lor or high clearance (a "soft" classification).
We considered two PK-RRF equations as comparators of the Shafi equation: (i) a basic equation that included only the predialysis B2M, and (ii) a clearance-based equation which used a GP to account for dialytic clearance as explained in Section 2.2.2.

Model Comparison: Discrimination, Calibration and Clinical Usefulness
We adopted the Area Under the Curve (AUC) for discrimination of UrCl ≥2 mL/min vs. UrCl <2 mL/min (the cutoff of incremental dialysis in the existing guidelines) as the primary metric for evaluating the performance of the logistic PK-RRFs against the Shafi equation. Other descriptive metrics (Appendix C) were also computed to provide a more complete evaluation for the continuous UrCl outcome: the proportion of predictions within a 0.5-2 mL/min of the true RRF, measures of bias (Mean Absolute Error and Median Error), variance (Interquartile Range) and the total Root Mean Square Error (RMSE) between the model predictions and the actual RRF measurements.
The basic method for assessing calibration was the linear calibration plot. In these analyses, the UrCl, or the linear predictor for the logistic PK-RRF was used as a covariate in linear regression analyses against the real-world measurements [51,52]. This linear regression yields an intercept and a slope which can be used to assess calibration. The intercept is an index of the formulas' ability to be systematically too high or low ("calibration-in-the-large") and should be close to zero for an unbiased model. The calibration slope should be ideally equal to unity, with values smaller than one reflect model overfit, with the departure from unity quantifying the effects of overfit.

Clinical Usefulness (Decision Curve Analysis)
We also assessed the clinical usefulness of the PK-RRF and Shafi models using Decision Curve Analysis (DCA). DCA essentially quantifies the net numbers of correct classifications gained by applying a predictive model or a simple clinical decision rule and the resulting clinical consequences of a treatment decision guided by the rule. DCA [28] assumes that the threshold probability of a clinical state decided by the prediction rule, at which one would opt in or out of treatment is informative of the cost benefit ratio of a false-positive and a false-negative prediction. This relationship is then used to calculate the Net Benefit (NB) over different threshold probabilities contrasting the benefit against treatment strategies of treating all or no patients at all. The NB is a quantity that assumes values between negative infinity up to the prevalence of the clinical state of interest; positive NB values indicate that the model had a beneficial impact on clinical decision-making and negative values indicate harm. The Standardized Net Benefit (SNB) indexes the NB to the prevalence of the state of interest (e.g., the high clearance state for the opt-out policy), to give a maximum potential value of one. Since there is not usually a single, universally acceptable probability threshold, one can plot SNB against threshold probability to obtain a "decision curve". DCA then identifies the magnitude of benefit and allows a direct comparison of several models against the range of threshold probabilities, or equivalently risk-benefit ratios. The DCA shows the prediction model with the highest utility [53] over the entire range of risk thresholds.
In our context, we applied DCA under the opt-out treatment policy framework [54]: the PK-RRF and the clinical rule based on the Shafi formula are used to identify low-risk patients (patients with preserved UrCl >2 mL/min) who could opt out of the (reference) treatment strategy of thrice-weekly dialysis for a more individualized regimen that may include shorter or less frequent treatments. The SNB directly incorporates two rates as a function of the specific risk threshold used to deem a certain individual as belonging to the high-("case", UrCl ≥2 mL/min) and low-("control", UrCl <2 mL/min) risk groups: 1.
The True Negative Rate (TNR, the rate of patients correctly classified as low risk, i.e., those with a clearance of >2 mL/min that the model labeled as having a clearance of >2 mL/min); 2.
False Negative Rate (FNR, the proportion of patients with a clearance of <2 mL/min that the model incorrectly classified as having a clearance of >2 mL/min).
In a decision analytic framework, one also needs to specify the B (enefit), i.e., the totality of good and bad effects of the treatment as experienced by the "cases" and the C (ost), i.e., the expected burden of the standard treatment for the "controls". Decision analysis states that given B and C, the optimal risk threshold (R) for deciding who to treat is simply: The SNB for the opt-out framework for a population with a given prevalence of cases and a given risk threshold is calculated as [54]: In this equation, the TNR and the FNR are functions of the assumed risk threshold and thus covary with it. Stated in other terms, the SNB is the True Negative Rate appropriately discounted by the prevalence at a fixed cost-benefit ratio. Under an opt-out policy, the standard of care has a TNR = FNR = SNB = 0. For the purpose of this paper, we constructed the DCA under the prevalence (70%) of the high-risk group in the Vilar dataset and assuming a wide variety of Benefit-Cost ratios ranging from 1:100 to 100:1 to provide an inclusive range of subjective valuations of more lengthy/frequent or less lengthy/infrequent dialysis. We also undertook a sensitivity analysis in which we computed SNBs for different prevalence rate of the cases. These calculations allowed us to explore clinical situations in which the formulas developed may achieve a higher SNB and thus clinical utility.

PK-RRF Recalibration and Inclusion of Multiple Biomarkers
Finally, we explored the potential of re-calibration to improve the PK-RRF performance; in particular, we carried out analyses of "internal-external" leave-one-out cross-validation [55] of the PK-RRF against the Vilar dataset. In these analyses, we sequentially set aside each of the original study participants and repeat the linear calibration regression analysis. After these models have been fit, we generate predictions for each patient held back and compare the model prediction against the actual RRF measurement. We also examined the possibility of linear calibration of the PK-RRF equations using multiple additional biomarkers (urea, creatinine and CysC). To do so, we applied the "simulation-calibration" framework proposed from the statistical literature about computer simulations [26,27,56]. This framework was put forward to calibrate simplified versions (our PK-RRF equations) of complex computer models (such as the output of our kinetic simulations) against real-world measurements of the phenomena (B2M levels-RRF). The statistical framework for the calibration analysis rests on Gaussian Process models, thus providing a unique methodological synthesis between the GAM models used to develop the PK-RRF and this calibration analysis. For the purpose of this work, we assessed whether prediction models that included statistical interactions among the measurements of multiple biomarkers (B2M, CysC, urea and creatinine) and the PK-RRF can be used to improve discrimination and calibration of the PK-RRF equation against the Vilar dataset. Whereas these models do include the same slope and intercept terms as simple linear calibration models, they differ by incorporating GPsamong the biomarkers that could predict RRF. These additional terms affect a non-linear correction of the B2M-based equations, as they allow the contribution of the B2M measurement to non-linearly co-vary along with measurements of other biomarkers of renal function when predicting UrCl.

Software Availability
Due to the complexity of the TPRS models, it is not easy to write down the equation in a mathematically simple form, like with other equations. We thus distribute the entire R code required to estimate the PK-RRF from the software repository https://bitbucket.org/chrisarg/pk-rrf/. This software repository includes all code required to fit the models considered in this paper. More importantly, it provides the software code that undertakes the complex calculations described in the text and makes them widely available in the open source R environment. For readers who would like to explore these calculations in batch, we have also included code to set up a shiny application to use these equations from within a web server environment. An instance of this server application has also been deployed at https://chrisarg.shinyapps.io/popk_rrf/ for readers who would like to evaluate these equations for education or research. This server version is only intended for demonstration purposes (e.g., due to the licensing of the shiny platform, it will only be running for 25 cumulative hours every month). A help file has also been included in the bitbucket repository that details the use of the R programs that fit the models described in the text. This file should allow readers to refit the models and possibly adapt their use outside the prediction servers set up by the authors. We provide the driver code for the kinetic modeling of B2M as a function of the C programming language in Appendix D.

Development of PK-RRF and Performance of GP for Dialytic Clearance
The use of a predialysis level B2M achieved very high discrimination for predicting ≥2 mL/min (AUC 0.896). The inclusion of postdialysis time and dialytic clearance improved discrimination (AUC 0.915) in the simulated dataset (Appendix A, Figure A1). When benchmarked against the true value, the dialytic clearance calculated on the basis of the Leypoldt formula had a median (mean) bias of −0.78 (−3.82) mL/min and highly variable performance: an IQR (standard deviation) of the difference between true and estimated values of 14.6 (22.50) mL/min. On the other hand, the dialytic clearance estimated by the GP had a minimal median (mean) bias of 0.18 (0.07) mL/min and much less variable performance: the IQR (standard deviations) of the difference between true and estimated value of 8.29 (7.21) mL/min (Appendix A, Figure A2). In summary, the analysis of the simulated dataset suggests that one should include not just B2M, but also GP-based estimates of dialytic clearance in order to derive a predictive equation for RRF that has high discrimination.

The Relationship between RRF, Urea and Creatinine Clearance
We fit the hierarchical model to the literature data synthesis as discussed in Appendix B. The synthesis of the literature data suggests that UrCl and CrCl are linearly related to the mGFR (Figure 3), while regression estimates are shown in Table A1. Urea clearance measurements had essentially zero bias in predicting the mGFR (the intercept of the linear regression was 0.007 mL/min), while underestimating the mGFR (the regression slope was 0.751). There was a small negative bias when CrCl is used to estimate the mGFR, yet CrCl systematically overestimated the mGFR (its slope was 1.242). These analyses suggest the following simplified equation relates to UrCl to RRF: In the latter equation, ignoring the intercept and the uncertainty in the parameter estimates is justified because of the small value of the former parameter and the extremely high signal-to-noise ratio in estimating the slope. In particular, the 95% credible interval for the slope is between 0.694 and 0.810, indicating that we are 95% certain that the true value of this parameter is contained in this narrow range. In all subsequent analyses, we multiplied the output of the PK-RRF-estimating equations for the continuous RRF outcome by 0.751 to convert it to an estimate for the UrCl. Furthermore, when deriving PK-RRF equations that predict the odds of having UrCl above a given cutoff, e.g., 2 mL/min, one has to use the equivalent RRF cutoff of 2/0.751 mL/min ~ 2.66 mL/min.

The Logistic PK-RRF Has High Discrimination and Calibration When Validated against a Patient Dataset
The PK-RRF equation developed in simulations, achieved greater discrimination (AUC 0.829-0.911) compared to the Shafi equation in the entire Vilar cohort, the subset of patients on HD or HDF (Table 1). Discrimination by either equation was lower when applied in the entire dataset, i.e., not excluding anuric patients.   Table A1.
Urea clearance measurements had essentially zero bias in predicting the mGFR (the intercept of the linear regression was 0.007 mL/min), while underestimating the mGFR (the regression slope was 0.751). There was a small negative bias when CrCl is used to estimate the mGFR, yet CrCl systematically overestimated the mGFR (its slope was 1.242). These analyses suggest the following simplified equation relates to UrCl to RRF: In the latter equation, ignoring the intercept and the uncertainty in the parameter estimates is justified because of the small value of the former parameter and the extremely high signal-to-noise ratio in estimating the slope. In particular, the 95% credible interval for the slope is between 0.694 and 0.810, indicating that we are 95% certain that the true value of this parameter is contained in this narrow range. In all subsequent analyses, we multiplied the output of the PK-RRF-estimating equations for the continuous RRF outcome by 0.751 to convert it to an estimate for the UrCl. Furthermore, when deriving PK-RRF equations that predict the odds of having UrCl above a given cutoff, e.g., 2 mL/min, one has to use the equivalent RRF cutoff of 2/0.751 mL/min~2.66 mL/min.

The Logistic PK-RRF Has High Discrimination and Calibration When Validated against a Patient Dataset
The PK-RRF equation developed in simulations, achieved greater discrimination (AUC 0.829-0.911) compared to the Shafi equation in the entire Vilar cohort, the subset of patients on HD or HDF (Table 1). Discrimination by either equation was lower when applied in the entire dataset, i.e., not excluding anuric patients. The inclusion of treatment duration, body weight, predialysis and postdialysis B2M as GP increased the performance of the PK-RRF equation (AUC 0.909 for all patients and 0.855 for the non-anuric ones). The p-values of the Spiegelhalter test for overall calibration accuracy of prediction probabilities were 0.594 (basic model), and 0.743 (clearance-based model). Examination of the calibration plots (Appendix C, Figure A3) showed that all these models systematically underestimated the risk of having UrCl ≤2 mL/min (intercept different from zero). Nevertheless, the clearance-based model had a calibration slope of 1.058, close to the ideal value of unity, indicating that the model's variables and their interconnection validly generalize from the simulated to the real-world dataset. Other metrics (e.g., Brier score, Somer's rank correlation, unreliability index) favored the clearance-based PK-RRF over the basic one (Appendix C, Figure A3). These analyses suggest that an "intercept update" [57,58] of the PK-RRF to improve "calibration-at-large" may allow the use of these models to human populations with different overall probabilities for the outcome of UrCl ≤2 mL/min. The superior calibration of the clearance-based PK-RRF was also shown when the RRF was examined as a continuous outcome (Appendix C, Table A2).

Re-Calibration of the PK-RRF and Incorporation of Multiple Biomarkers
We explored the possibility of improving the performance of the PK-RRF equations using recalibration and additional biomarkers. Towards that goal, we cross-validated the logistic and continuous RRF equations in all patients with CysC, predialysis urea and creatinine measurements in the Vilar dataset. Recalibration using multiple biomarkers improved the discrimination of the clearance-based RRF compared to a simple intercept/slope recalibration that did not use these markers (Table 2). Overall, the performance of the PK-RRF equation improved but only marginally (delta AUC was~0.05-0.1) with the inclusion of multiple biomarkers. Across the entire dataset of patients receiving either HD or HDF, the incorporation of the predialysis urea and creatinine had similar incremental improvement as the inclusion of CysC. Examination of the proportion of the predicted UrCl within a fixed amount (in mL/min) for the continuous UrCl (Figure 4) reproduced the patterns seen for the discrete outcome. Notably, these equations were much more likely to be within 0.5 and 2 mL/min in this validation dataset, than in the development dataset (Appendix A, Figure A1). Analysis of bias, variance and total error (Table 3) for the cystatin-C and the urea/creatinine-calibrated models are shown in Table 3. Overall, the incorporation of additional biomarkers appears to improve performance but only marginally.

Clinical Usefulness (Decision Curve Analysis)
We investigated the clinical usefulness of the PK-RRF equations in detecting patients with UrCl >2 mL/min ("low-risk group") in the Vilar dataset who could be offered infrequent or incremental dialysis. Decision curves were constructed for the entire range of threshold probabilities from 0 to 1, corresponding to increasing risk benefit ratio from 1:100 to 100:1. SNB were calculated for the base PK-RRF, the clearance-based PK-RRF and the Shafi clinical decision rule ( Figure 5, left). The default strategy of offering thrice-weekly dialysis has by default a SNB of zero irrespective of the probability threshold used to classify patients as low risk (gray horizontal line). The Shafi rule was dominated by both the base and the clearance-based PK-RRF when the underdialysis risks of offering patients to opt out of the default thrice-weekly strategy were much smaller than the perceived benefit. In particular, the SNB of both PK-RRF rules were less than −0.1 when the threshold for classifying Figure 4. Proportion of predicted UrCl within a certain fixed amount from the measured UrCl for calibrated models that included additional biomarkers.

Clinical Usefulness (Decision Curve Analysis)
We investigated the clinical usefulness of the PK-RRF equations in detecting patients with UrCl >2 mL/min ("low-risk group") in the Vilar dataset who could be offered infrequent or incremental dialysis. Decision curves were constructed for the entire range of threshold probabilities from 0 to 1, corresponding to increasing risk benefit ratio from 1:100 to 100:1. SNB were calculated for the base PK-RRF, the clearance-based PK-RRF and the Shafi clinical decision rule ( Figure 5, left). The default strategy of offering thrice-weekly dialysis has by default a SNB of zero irrespective of the probability threshold used to classify patients as low risk (gray horizontal line). The Shafi rule was dominated by both the base and the clearance-based PK-RRF when the underdialysis risks of offering patients to opt out of the default thrice-weekly strategy were much smaller than the perceived benefit. In particular, the SNB of both PK-RRF rules were less than −0.1 when the threshold for classifying patients as low risk were less than 0.28. The Shafi rule dominated the base PK-RRF between threshold values of 0.28-0.85 and dominated by it for higher thresholds, corresponding to perceived risk that was higher than the anticipated benefit. The clearance base PK-RRF had similar performance as the Shafi equation in the mid-range of threshold probabilities and dominated it at either low or high values. The PK-RRF models and the Shafi rule dominated a strategy of treating no patients with the default policy. Furthermore, we assessed the potential improvement in the clinical usefulness of the recalibrated base and clearance PK-RRF and the inclusion of additional biomarkers. These analyses were performed under a leave-one-out cross-validation framework and are shown in the right panel of Figure 5. As expected, linear re-calibration improved the performance of the base PK-RRF model much more than the clearance-based model alone. The inclusion of urea and creatinine did not appear to improve the clinical usefulness of the clearance-based PK-RRF model. On the other hand, the inclusion of CysC measurements resulted in models with higher clinical usefulness in the middle range of threshold probabilities. patients as low risk were less than 0.28. The Shafi rule dominated the base PK-RRF between threshold values of 0.28-0.85 and dominated by it for higher thresholds, corresponding to perceived risk that was higher than the anticipated benefit. The clearance base PK-RRF had similar performance as the Shafi equation in the mid-range of threshold probabilities and dominated it at either low or high values. The PK-RRF models and the Shafi rule dominated a strategy of treating no patients with the default policy. Furthermore, we assessed the potential improvement in the clinical usefulness of the recalibrated base and clearance PK-RRF and the inclusion of additional biomarkers. These analyses were performed under a leave-one-out cross-validation framework and are shown in the right panel of Figure 5. As expected, linear re-calibration improved the performance of the base PK-RRF model much more than the clearance-based model alone. The inclusion of urea and creatinine did not appear to improve the clinical usefulness of the clearance-based PK-RRF model. On the other hand, the inclusion of CysC measurements resulted in models with higher clinical usefulness in the middle range of threshold probabilities. Figure 5. Clinical usefulness (Decision Curve Analysis (DCA)) for predicting the binary outcome of UrCl >2 mL/min and to offer patients the option to opt-out from the thrice-weekly dialysis standard. Left graph: uncalibrated base PK-RRF model ("PK-RRF"), clearance-based PK-RRF model ("PK-RRF (B2Cl)"), Shafi clinical decision rule ("Shafi"), treat "all" with thrice-weekly dialysis, and treat "none" with thrice weekly. Right graph: recalibrated basic and clearance-based PK-RRF, clearance-time based PK-RRF incorporating urea and creatinine (interacting with the B2M measurement and gender) and the clearance-based PK-RRF model that incorporates the Cystatin C. DCA curves were smoothed using the non-parametric smoother "loess" prior to plotting.
In a final sensitivity analysis, we examined the impact of the prevalence rate of the low clearance rates on clinical utility. The prevalence of this state in the Vilar dataset was 70%, so we assumed the following prevalence rate in the sensitivity analysis: 0.1 (10%), 0.3 (30%), 0.5 (50%) and 0.9 (90%). These analyses are shown in Figure 6. There are several observations worth noting when Figures 5 and 6 are examined together. First, the PK-RRF had a positive net clinical utility for prevalence of the low clearance rate that <70% and across the range of the cost-benefit assessments (1:100 to 100:1) considered. The rule based on the Shafi equation appear to dominate the PK-RRF when the costbenefit ratio and the prevalence of the low clearance rates were low, but not when either was moderate in size. For cost-benefit ratios >1, and for prevalence up to 70%, the PK-RRF prediction Figure 5. Clinical usefulness (Decision Curve Analysis (DCA)) for predicting the binary outcome of UrCl >2 mL/min and to offer patients the option to opt-out from the thrice-weekly dialysis standard. Left graph: uncalibrated base PK-RRF model ("PK-RRF"), clearance-based PK-RRF model ("PK-RRF (B2Cl)"), Shafi clinical decision rule ("Shafi"), treat "all" with thrice-weekly dialysis, and treat "none" with thrice weekly. Right graph: recalibrated basic and clearance-based PK-RRF, clearance-time based PK-RRF incorporating urea and creatinine (interacting with the B2M measurement and gender) and the clearance-based PK-RRF model that incorporates the Cystatin C. DCA curves were smoothed using the non-parametric smoother "loess" prior to plotting.
In a final sensitivity analysis, we examined the impact of the prevalence rate of the low clearance rates on clinical utility. The prevalence of this state in the Vilar dataset was 70%, so we assumed the following prevalence rate in the sensitivity analysis: 0.1 (10%), 0.3 (30%), 0.5 (50%) and 0.9 (90%). These analyses are shown in Figure 6. There are several observations worth noting when Figures 5  and 6 are examined together. First, the PK-RRF had a positive net clinical utility for prevalence of the low clearance rate that <70% and across the range of the cost-benefit assessments (1:100 to 100:1) considered. The rule based on the Shafi equation appear to dominate the PK-RRF when the cost-benefit ratio and the prevalence of the low clearance rates were low, but not when either was moderate in size. For cost-benefit ratios >1, and for prevalence up to 70%, the PK-RRF prediction rules, dominated the Shafi rule. When the prevalence of the low clearance state was 90%, the Shafi and any of the calibrated PK-RRF rules had a higher clinical benefit than the thrice-weekly standard.
rules, dominated the Shafi rule. When the prevalence of the low clearance state was 90%, the Shafi and any of the calibrated PK-RRF rules had a higher clinical benefit than the thrice-weekly standard. Figure 6. Sensitivity analysis for the clinical usefulness of the rules for predicting the binary outcome of UrCl >2 mL/min, so as to offer patients the option opt-out from the thrice-weekly dialysis standard. Each panel assesses a different prevalence rate from 10% (top left) to 90% (bottom right) of the low clearance state. The rules considered were those based on the Shafi equation, the base and clearancebased PK-RRF, their calibrated ("cal") versions against the Vilar dataset, and two multibiomarker models that incorporated urea and creatinine, or cystatic C in addition to B2M. DCA curves were smoothed with the non-parametric smoother "loess" prior to plotting.

Discussion
In this paper, we utilized a novel population compartmental kinetic framework to derive a set of equations for the prediction of RRF in patients undergoing conventional high-flux HD or on-line HDF. Our equations were derived entirely by combining computer simulations with advanced statistical modeling and had extremely high discrimination when applied to a human dataset of measurements of RRF. A clearance-based equation that utilized predialysis and postdialysis B2M measurements, patient weight, treatment duration and ultrafiltration had higher discrimination than an equation previously derived in humans. Furthermore, the derived equations appear to have acceptable clinical usefulness for a wide range of likelihood of having low (<2 mL/min) residual urea clearance. Our analyses of clinical utility suggests that these formulas can support decisions about the dialysis prescription that depend on patients having preserved RRF, e.g., incremental dialysis for those with preserved RRF, or even be used to increase the frequency and time on dialysis in those with low RRF.
Compartmental models for biomarker kinetics are familiar to nephrologists, since they have been used to quantify dialysis dose for decades [59][60][61][62][63][64]. These models are mathematical descriptions of the processes of generation, distribution, and elimination that determine the concentration of the biomarker of interest. A population viewpoint extends the kinetic approach by allowing interindividual variation in these parameters. This interindividual variation allowed us to simulate the relation between B2M, RRF and the impact of the dialytic regimen in a manner that generalized Figure 6. Sensitivity analysis for the clinical usefulness of the rules for predicting the binary outcome of UrCl >2 mL/min, so as to offer patients the option opt-out from the thrice-weekly dialysis standard. Each panel assesses a different prevalence rate from 10% (top left) to 90% (bottom right) of the low clearance state. The rules considered were those based on the Shafi equation, the base and clearance-based PK-RRF, their calibrated ("cal") versions against the Vilar dataset, and two multibiomarker models that incorporated urea and creatinine, or cystatic C in addition to B2M. DCA curves were smoothed with the non-parametric smoother "loess" prior to plotting.

Discussion
In this paper, we utilized a novel population compartmental kinetic framework to derive a set of equations for the prediction of RRF in patients undergoing conventional high-flux HD or on-line HDF. Our equations were derived entirely by combining computer simulations with advanced statistical modeling and had extremely high discrimination when applied to a human dataset of measurements of RRF. A clearance-based equation that utilized predialysis and postdialysis B2M measurements, patient weight, treatment duration and ultrafiltration had higher discrimination than an equation previously derived in humans. Furthermore, the derived equations appear to have acceptable clinical usefulness for a wide range of likelihood of having low (<2 mL/min) residual urea clearance. Our analyses of clinical utility suggests that these formulas can support decisions about the dialysis prescription that depend on patients having preserved RRF, e.g., incremental dialysis for those with preserved RRF, or even be used to increase the frequency and time on dialysis in those with low RRF.
Compartmental models for biomarker kinetics are familiar to nephrologists, since they have been used to quantify dialysis dose for decades [59][60][61][62][63][64]. These models are mathematical descriptions of the processes of generation, distribution, and elimination that determine the concentration of the biomarker of interest. A population viewpoint extends the kinetic approach by allowing interindividual variation in these parameters. This interindividual variation allowed us to simulate the relation between B2M, RRF and the impact of the dialytic regimen in a manner that generalized from the simulated patients to the real-world clinical data. The resulting equations, which were entirely derived in artificial datasets, thus, had high discrimination when applied to data from actual patients. In fact, the performance of the PK-RRF equations rivalled the performance of equations derived entirely in human populations, e.g., AUC of 0.91 [21] and 0.84 [23]. Overall, our paper adds to the expanding literature, showing that plasma levels of middle molecules in general and B2M in particular [19,21,23] can predict the regulatory relevant threshold of RRF >2 mL/min on a par with the performance of the clinically accepted, validated troponin assays in the diagnosis of acute coronary syndromes (AUC: 0.84-0.94) [65].
The high performance of the PK-RRF equations must be attributed to the validity of the constructs used to derive them from first principles. These constructs include the bi-compartmental kinetics of the B2M, the population distribution [24] of the kinetic parameters of B2M, the effects of dialytic clearance [25] and finally the relation between the RRF and the clinical measurement (UrCl) used as its proxy. Despite the high theoretical validity of our approach, translation of the derived equations to the real world should be expected to not be entirely free of complications. For example, the equations appear to have low precision when applied to the development-simulated dataset (e.g., fewer than 15% of predicted clearances are within 0.5 mL/min of their simulated values). In the simulations, this low performance is driven mainly by combinations of kinetic parameters, e.g., "large" dialysis patients given "short" treatment times that are unlikely to be observed in the real world. These unlikely combinations were an unavoidable consequence of relying on group average values for the various kinetic parameters of the dialysis regimens as reported in the papers of the clinical trials we used. Despite this possibly large amount of noise present in the source dataset, the overall pattern between residual renal function and B2M was learned successfully by the modeling methods, leading to a real-world performance that was substantially better than the one noted in the simulations (e.g., the proportion of predicted UrClr that is within 0.5 and 2.0 mL/min of the actually measured one is in the order of 40% and 90%, respectively). Notwithstanding these encouraging observations, we feel there is still the need for calibration of the base and the clearance-based PK-RRF formulas. The degree of calibration required to predict RRF in a new dataset appears to be smaller than that required to adapt the high quality Shafi equation to the same external dataset. Recalibration of the PK-RRF equations did not materially affect their extremely high discrimination but did seem to have a positive impact on the clinical usefulness as assessed by DCA at least for the relatively high prevalence of patients with UrCl <2 mL/min in the Vilar dataset. Consequently, we feel that the calibrated version of the base PK-RRF equation should be used over the uncalibrated version. However, the uncalibrated clearance-based PK-RRF equation appears to perform equally well to the calibrated and multibiomarker equations when the population level prevalence of having low UrCl is less than 50% and either the calibrated or the non-calibrated version can be applied to future clinical studies. It should be noted that the Shafi rule also performed well in this setting, especially when the population prevalence of UrCl <2 mL/min was less than 30%.
Despite the success of middle molecules in predicting RRF, a puzzling feature of the literature to date [19,21,23] concerned the marginal success of multi-biomarker equations in the field. This was also noted in our study, which showed unimpressive improvements in discrimination and precision when CysC or simultaneous urea and creatinine measurements were used to recalibrate the B2M-based PK-RRF. Although there have been concerns that the involvement of B2M in the inflammatory response may confound the relationship between RRF and B2M [41,66], our approach to consider variable rates of generation in our large scale simulations probably allowed us to derive PK-RRF equations that are largely insensitive to variations in the generation rate. Hence, our formulas are unlikely to require additional biomarker measurements for robust performance. Despite these observations, we have noticed that some improvement in clinical utility may be derived by considering additional, readily available measurements, e.g., urea, creatinine or CysC. We also provided the analytical methodology to incorporate these biomarkers, as flexible GP embedded in the recalibration framework. This opens the possibility of incorporating additional, promising biomarkers, e.g., BTP [41,42] in datasets that have simultaneously measured all the aforementioned biomarkers.
In developing our equations, we were motivated by the unmet need for measurements that can facilitate further research in the area of RRF preservation and/or implementation of incremental, individualized forms of dialysis in practice [13,43]. Research in both areas is impeded by the lack of alternative techniques to measure RRF that do not require urine collections [19,21,23,41]. Prospective observational [67], and retrospective propensity score-matched studies [15] have shown that an incremental approach to dialysis frequency is not inferior with respect to mortality and may be associated with improved quality of life. Considering the direct treatment cost differential, i.e., biweekly dialysis has 2/3 dialytic costs than thrice-weekly dialysis, a formula that can identify patients with relative preserved RRF could have direct implications for both research and practice of "personalized dialysis". However, this research should take place within the boundaries of existing regulations for the dialysis industry. In fact, one of the barriers in practice incremental dialysis in the United States is a concern raised by regulators: the Center for Medicare Services explicitly considers twice-weekly dialysis to be inadequate in patients with RRF lower than 2 mL/min. By providing a PK-RRF equation that can predict this threshold with high discrimination and positive Net Benefit across the entire spectrum of the risk-benefit assessments and a prevalence of low UrCl, we feel that research in this space can proceed in an ethical and regulatory compliant manner.
Conversely, the proposed PK-RRF formulas could be used to identify individuals who would benefit from opting out the thrice-weekly standard, for longer or more frequent, nocturnal or quotidian dialysis. If dialysis frequency is a major driver of the cost-benefit ratio in a clinical usefulness analysis, a six time per week regimen would have a ratio that lies in the opposite direction from that of an incremental dialysis approach. Hence, our sensitivity analyses that show that the PK-RRF formulas have positive SNB across the entire range of the cost-benefit ratios and for a wide range of population prevalence of a low residual renal clearance, suggest that these formulas should be preferred when selecting patients for more frequent dialysis. In the latter situation, the cost-benefit ratio is not driven solely by the increased financial and time burden of treatment frequency, but also by the potential negative effects on the preservation of residual renal function [68]. Hence, it becomes critical that the prediction tool used to identify such patients, achieves a consistent performance across the entire cost-benefit ratio for small to medium prevalence of the low UrCl state. In the later situation, most of the misguided predictions of the UrCl prediction rule will be among the patients with preserved renal function, who may be likely to lose this function faster if they are dialyzed more frequently according to analyses in the Frequent Hemodialysis Network clinical trial dataset.
A few limitations of the developed PK-RRF equations should be kept in mind. First, the analytical complexity makes it unwieldly to write them down in the closed form, of the simpler equations they outperform. This is an unavoidable price to pay for the high discrimination of the PK-RRF equation. Nevertheless, we provide these equations as software programs in the open source R programming language and a web server in order to allow other investigators to replicate our results and to deploy them in other settings. Since these formulas were developed in simulations, the entire set of programs used to derive them can be (and has been) shared in its entirety to allow for transparent verification and subsequent replication. Second, the formulas were validated only in cross-sectional assessments and their use in repeated evaluations of the RRF of the same patient remain untested. This is an area of exploration in future studies. Third, the marginal improvement of the discrimination of the multi-biomarker models may reflect deficiencies in the biomarkers available for inclusion. In particular, urea, creatinine and cystatin C, the conventional serum biomarkers of estimating renal function in patients not on dialysis, exhibit large interdialytic variation in levels and thus may not provide the optimal additional biomarkers for patients receiving hemodialysis. Fourth, the source data did not include a subset of measurements in which the RRF rather than the UrCl was measured with gold standard techniques, i.e., iothalamate or iohexol clearance. Consequently, an important component of our modeling, i.e., the relation of B2M to the RRF could only be validated indirectly, by comparing the model's predictions against the proxy of the RRF, i.e., the UrCl. We do not view this as major limitation of any future applications, since a gold standard measurement of RRF is done very infrequently, if ever, in dialysis clinical practice and nearly all research to date in the space utilizes the UrCl. Fifth, our analyses also suggest the possibility of further improvement in the performance of these equations by incorporating measurements or proxies for the non-dialytic renal clearance or the generation rate. As of the present time, proxies for these processes remain largely unknown, so that the analyses in the appendix that use these values constitute a provocative thought experiment about the ultimate potential of the population kinetic approach. Finally, the recalibrated PK-RRF equations have an intermediate validation status since we did not have an additional dataset to test their performance.
This limitation does not extend to the uncalibrated version whose discrimination and clinical usefulness can be externally validated.
In summary, we have used computer simulations, the population kinetic approach and advanced statistical modeling to develop equations that can predict RRF (as assessed by UrCl) in patients undergoing maintenance HD or on-line HDF. These equations exhibit high discrimination and clinical usefulness when validated against an external, public clinical dataset. Recalibrated versions of these equations were developed in a cross-validation setting and are available for clinical use as well. Future studies should validate these equations in repeated assessments of the same patients and explore the utility of the PK-RRF equations as research tools in the areas of preservation of RRF and incremental, personalized dialysis.
Supplementary Materials: The following are available online at http://www.mdpi.com/2077-0383/8/12/2080/s1, Table S1: Kinetic parameters and B2 pre and post-dialysis levels in the simulated dataset. Table S2: Individual patient data from previous studies linking residual renal function to urea and creatinine urinary clearances. Funding: This study received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A Development of PK-RRF Equations and Assessment of Dialyzer Clearance
For these analyses, we split the simulation dataset into a development dataset (n = 13,333) and a validation (n = 6667) cohort. The basic PK-RRF emulator that incorporated only the B2M concentration achieved a moderate R2 and a high AUC (0.896) for classifying "patients" as having a RF >2 mL/min ( Figure A1, panel "B2M Only"). There was substantial variation in predicting the RRF of individual "patients" as evidenced by an IQR of 4.1 mL/min, while precision was also small, since only 12.9% and 49% of predictions were within 0.5 and 2 mL/min, respectively, of the predicted values. The incorporation of the postdialysis dialysis level and its interaction with the predialysis level (panel "B2M + Post + B2M * Post") did not materially improve precision or discrimination. Adding treatment time as a covariate (panel "B2M + Post + B2M * Post + T") increased AUC to >0.90. Nevertheless, the inclusion of an index of dialyzer clearance and treatment time did improve the classification performance and increased the AUC to 0.915 (panel "B2M + Post + B2M * Post + T + Kd + T * Kd").
In these analyses, the value of the dialytic clearance was assumed to be known, a condition that does not apply in real-world practice in which measurement of clearance does not take place. Hence, we assessed the performance of the Leypoldt equation against a flexible, GP equation that used the same variables (body weight, ultrafiltration, predialysis and postdialysis B2M). These analyses are shown in Figure A2. While both approaches were associated with small (median) bias in the overall population, the performance of the GP was much less variable and separated the clearances between HD and HDF. only 12.9% and 49% of predictions were within 0.5 and 2 mL/min, respectively, of the predicted values. The incorporation of the postdialysis dialysis level and its interaction with the predialysis level (panel "B2M + Post + B2M * Post") did not materially improve precision or discrimination. Adding treatment time as a covariate (panel "B2M + Post + B2M * Post + T") increased AUC to >0.90. Nevertheless, the inclusion of an index of dialyzer clearance and treatment time did improve the classification performance and increased the AUC to 0.915 (panel "B2M + Post + B2M * Post + T + Kd + T * Kd"). Figure A1. Indices of precision (P05/P02: probability of a prediction within 0.5 and 2 mL/min of the simulated RRF, IQR: interquartile range), model explanatory power (R2) and classification accuracy (AUC: Area Under the ROC curve) for predicting a RRF >2 mL/min across a range of models that included only the predialysis B2M concentration, the postdialysis level (Post), Dialysis Treatment Time (T), Dialysis Clearance (Kd). In models that incorporated statistical interactions between any two parameters, the interacting covariates are joined by the multiplication ("*") symbols. Blue curve: smoothed average prediction, orange line: the line of identity, i.e., the ideal situation in which the prediction (y-axis) is identical to the simulated value (x-axis). These analyses were carried out in simulated datasets of 10,000 patients receiving hemodialysis and 10,000 receiving on-line hemodiafiltration.
In these analyses, the value of the dialytic clearance was assumed to be known, a condition that does not apply in real-world practice in which measurement of clearance does not take place. Hence, we assessed the performance of the Leypoldt equation against a flexible, GP equation that used the same variables (body weight, ultrafiltration, predialysis and postdialysis B2M). These analyses are Figure A1. Indices of precision (P05/P02: probability of a prediction within 0.5 and 2 mL/min of the simulated RRF, IQR: interquartile range), model explanatory power (R2) and classification accuracy (AUC: Area Under the ROC curve) for predicting a RRF >2 mL/min across a range of models that included only the predialysis B2M concentration, the postdialysis level (Post), Dialysis Treatment Time (T), Dialysis Clearance (Kd). In models that incorporated statistical interactions between any two parameters, the interacting covariates are joined by the multiplication ("*") symbols. Blue curve: smoothed average prediction, orange line: the line of identity, i.e., the ideal situation in which the prediction (y-axis) is identical to the simulated value (x-axis). These analyses were carried out in simulated datasets of 10,000 patients receiving hemodialysis and 10,000 receiving on-line hemodiafiltration. shown in Figure A2. While both approaches were associated with small (median) bias in the overall population, the performance of the GP was much less variable and separated the clearances between HD and HDF.

Appendix B. Fitting Literature RRF and Urea/Creatinine Clearance Data
We fit the model of (1) to the literature data from a Bayesian perspective while assuming typical non-informative priors for all the unobserved quantities in the model. For the fitting, we utilized long Markov Chain Monte Carlo (MCMC) simulations which were carried out with the JAGS software [69,70]. Technical parameters of MCMC runs were as follows: the Mersenne-Twister [71] was used as

Appendix B Fitting Literature RRF and Urea/Creatinine Clearance Data
We fit the model of (1) to the literature data from a Bayesian perspective while assuming typical non-informative priors for all the unobserved quantities in the model. For the fitting, we utilized long Markov Chain Monte Carlo (MCMC) simulations which were carried out with the JAGS software [69,70]. Technical parameters of MCMC runs were as follows: the Mersenne-Twister [71] was used as the underlying random number generator, number of independent chains (n = 5), number of samples in the adaptive phase (n = 100,000) discarded (burnin) samples (n = 100,000), post-burnin samples (n = 100,000), thinning interval (n = 50) leading to 10,000 independent samples for each quantity of interest. Convergence of the MCMC was assessed by visual (trace) plots and the Gelman-Rubin diagnostic for multiple chains [72,73]. Although an overkill, this long MCMC simulation ensured that the Potential Scale Reduction factors in the Gelman-Rubin diagnostic were indistinguishable from one (indicating successful convergence), and Monte Carlo (numerical) error was at most 0.001 for all parameters of interest. Means and standard deviations were used to summarize samples from the posterior simulations for the slope and intercept parameters, since their non-parametric kernel density estimates were visually perceived to be symmetric, gaussian-like densities. R/JAGS code and data are available in the online repository set up for this project. Model estimates are shown in Table A1.

Appendix C External Validation of the PK-RRF in the Vilar Dataset via Linear Calibration Plots
A calibration analysis of the Shafi and clearance-based PK-RRF equation against the measured RRF is summarized in Table A2. Calibration analysis was conducted on the entire Vilar cohort of anuric and non-anuric patients. The PK-RRF demonstrated a small and statistically non-significant amount of bias in both HD and HDF. The Shafi equation had considerable bias (0.79 mL/min) in the HD subgroup. Calibration slopes for both models were different from the ideal value of one, yet they were closer to unity for the PK-RRF model. These analyses indicate the need for model re-calibration for both equations.   Dashed curve: non-parametric estimate of the calibration relationship between actual and predicted probability; grey line: ideal relationship (intercept of zero and slope of one). Dxy: Somer's rank correlation; C (ROC): AUC for discrimination; R2: Nagelkerke-Cox-Snell-Maddala-Magee R-squared index; D: discrimination index; U: unreliability index; Q: quality index; Brier: Brier score (average squared difference in predicted and actual probabilities); Emax/E90/Eavg: Maximum/90th quantile, average absolute difference in predicted and smoothed calibrated probabilities; S:z/S:p the z and two-sided p-value of the Spiegelhalter test for calibration accuracy. Graphs generated by the "val. prob" function in the RMS R-package. Due to minor differences in algorithms, there is a difference in the third significant decimal from the AUC values reported in Table 1 of the main text.

Appendix D Simulation Modeling of Beta 2 Microglobulin Kinetics during Hemodialysis
The kinetics of B2M during hemodialysis were based on the variable volume double pool model originally presented by Ward et al. [30] (Figure 2 for the differential equations). This system consists of a plasma/perfusing (P) and non-perfusing/non-plasma (NP). Generation (G) takes place in both compartments, in direct proportion to their fractional volumes. KD, KER, KR are the dialyzer clearance and extrarenal and residual renal clearances, respectively. The dialytic clearance incorporated diffusive/adsorptive and convective components for patients on hemodiafiltration. A population of patients was simulated by first assuming distinct values for parameters of the compartmental system for each individual patient. Using these values, we coded the system of the differential equations ( Figure 2) in the C programming language. The code (shown in Code D1) maps one to one to the corresponding equations ( Figure 2). This function was compiled into a shared library that was dynamically linked into R in order to undertake the simulations (one for each of the thousand distinct combination of parameters used in this paper). In particular, the b2mDialWard driver function was repeatedly called by the LSODA ordinary differential equation solver, in order to calculate the trajectories of B2M over a three-month period, allowing time to be discretized up to 0.5 s. population of patients was simulated by first assuming distinct values for parameters of the compartmental system for each individual patient. Using these values, we coded the system of the differential equations (Figure 2) in the C programming language. The code (shown in Figure A4) maps one to one to the corresponding equations ( Figure 2). This function was compiled into a shared library that was dynamically linked into R in order to undertake the simulations (one for each of the thousand distinct combination of parameters used in this paper). In particular, the b2mDialWard driver function was repeatedly called by the LSODA ordinary differential equation solver, in order to calculate the trajectories of B2M over a three-month period, allowing time to be discretized up to 0.5 s.