Prediction of Acute Kidney Injury after Liver Transplantation: Machine Learning Approaches vs. Logistic Regression Model

Acute kidney injury (AKI) after liver transplantation has been reported to be associated with increased mortality. Recently, machine learning approaches were reported to have better predictive ability than classical statistical analysis. We compared the performance of machine learning approaches with that of logistic regression analysis to predict AKI after liver transplantation. We reviewed 1211 patients and obtained preoperative and intraoperative anesthesia- and surgery-related variables. The primary outcome was postoperative AKI defined by Acute Kidney Injury Network criteria. The following machine learning techniques were used: decision tree, random forest, gradient boosting machine, support vector machine, naïve Bayes, multilayer perceptron, and deep belief networks. These techniques were compared with logistic regression analysis regarding the area under the receiver-operating characteristic curve (AUROC). AKI developed in 365 patients (30.1%). The gradient boosting machine showed the best performance among all analyses in predicting AKI of all stages (AUROC 0.90, 95% confidence interval [CI] 0.86–0.93) and stage 2 or 3 AKI. The AUROC of logistic regression analysis was 0.61 (95% CI 0.56–0.66). The decision tree and random forest techniques showed moderate performance (AUROC 0.86 and 0.85, respectively). The AUROCs of the support vector machine, naïve Bayes, multilayer perceptron, and deep belief network were smaller than those of the other models. In our comparison of seven machine learning approaches with logistic regression analysis, the gradient boosting machine showed the best performance with the highest AUROC. An internet-based risk estimator was developed based on our gradient boosting model. However, prospective studies are required to validate our results.


Introduction
Analytics for predicting postoperative morbidity have been limited to classical statistical techniques, such as logistic regression analysis and the Cox proportional hazards model. However, these models require the statistical assumptions of independence and a linear relationship between explanatory and outcome variables.
Furthermore, the overfitting and multicollinearity limitations of regression analysis preclude the analysis of a large number of variables. These limitations have forced prediction models to rely on a small number of variables that are known to be clinically relevant.
Recently, novel machine learning techniques have demonstrated improved predictive performance compared to classical statistical methods such as logistic regression. For example, machine learning techniques have been used to predict postoperative clinical outcomes, including specific morbidities and in-hospital mortality [1][2][3]. Compared to logistic regression or the Cox proportional hazards model, machine learning techniques have shown lower prediction error. For acute kidney injury, previous studies demonstrated that machine learning techniques have excellent performance, or better performance than logistic regression analysis, in hospitalized patients [4] and in patients undergoing major surgery [5]. However, although previous studies used different machine learning techniques, including neural networks [1,2], random forest [3], support vector machine [5], and gradient boosting machine [4], a performance comparison among these specific techniques has rarely been conducted.
Postoperative acute kidney injury (AKI) is an important complication after liver transplantation that is associated with poor graft survival and increased mortality [6][7][8][9]. Many studies have used classical regression methods to identify risk factors and develop risk prediction models [6,[10][11][12][13][14][15]. Although several risk factors have been identified [6][7][8][9][10][11][12][13][14][15], the performance of the resulting models was rarely reported [6] in terms of the area under the receiver-operating characteristic curve (AUROC) [13], which is the primary measure of a prediction model [16]. Furthermore, previous studies did not include a sufficient number of variables because of multicollinearity, and possible non-linear relationships between explanatory variables and the outcome variable could not be fully considered. Machine learning techniques are relatively free of these limitations of conventional statistical analysis and may therefore demonstrate better performance. If machine learning techniques predict AKI better, risk prediction at the end of surgery could be performed with readily available patient data from electronic medical records. Therefore, we first attempted to compare the performance of machine learning techniques with that of multivariable logistic regression in predicting acute kidney injury after liver transplantation. We hypothesized that machine learning techniques may perform better than logistic regression. Second, we sought to compare the performance of the different machine learning techniques used in previous studies on the same dataset: gradient boosting machine, random forest, decision tree, support vector machine, naïve Bayes, neural network, and deep belief network.
Third, we planned to build a risk estimator, based on our best machine learning prediction model, that could be used in daily practice.

Study Design
This retrospective observational study was approved by the institutional review board of Seoul National University Hospital (1805-137-948). We retrospectively reviewed the electronic medical records of 1398 consecutive patients who underwent living donor liver transplantation (LDLT) or deceased donor liver transplantation (DDLT) at our institution between November 2004 and December 2015. Informed consent was waived because of the study's retrospective design. Pediatric cases (n = 152) and those with missing baseline serum creatinine (n = 35) were excluded, and the remaining 1211 cases were analyzed.

Anesthesia and Surgical Techniques
Anesthesia was induced with propofol and maintained with sevoflurane, remifentanil, and rocuronium. Volume-controlled ventilation was maintained with a tidal volume of 6-8 mL/kg. Arterial catheters were inserted into the radial and femoral arteries. A pulmonary artery catheter was inserted routinely, and continuous cardiac index and right ventricle-associated variables were monitored. Ephedrine and continuous infusion of dopamine and/or norepinephrine and/or epinephrine were used to treat hypotension according to the monitored cardiac index, mixed venous oxygen saturation (SvO2), and systemic vascular resistance (SVR).
Donor grafts were prepared with a histidine-tryptophan-ketoglutarate solution. The piggyback technique was used to anastomose the graft and donor vessels. End-to-end anastomosis of the hepatic artery and duct-to-duct anastomosis of the bile duct were performed in succession. During surgery, immunosuppression was induced with basiliximab 20 mg (Simulect, Novartis Pharma B.V., Arnhem, The Netherlands) and methylprednisolone 500 mg (Solumedrol, Pfizer, Ballerup, Denmark). Postoperative immunosuppression was initiated with a calcineurin inhibitor of tacrolimus with mycophenolate mofetil on the first postoperative day.

Data Collection
Based on the previous literature, demographic and perioperative variables known to be related to postoperative renal dysfunction were collected from the institutional electronic medical record (Table 1) [6,7,10,13,17]. Preoperatively, the Model for End-stage Liver Disease (MELD) score, the Child-Turcotte-Pugh (CTP) score, and the Child-Pugh classification were determined for all patients [18]. Data on patient demographics, history of hypertension, diabetes mellitus, other baseline medical history, ABO blood type incompatibility, and baseline laboratory findings including preoperative serum albumin were collected. Data on surgery- and anesthesia-related variables were also collected, including warm and cold ischemic time, graft-recipient body-weight ratio (GRWR), intraoperative estimated blood loss, intraoperative transfusion, and the amounts of infused crystalloid, hydroxyethyl starch, and albumin. Data on intraoperative laboratory and hemodynamic variables were collected. Hemodynamic variables included mean arterial pressure, mean pulmonary arterial pressure, central venous pressure, SvO2, cardiac index, and SVR. These hemodynamic variables were collected at the following eight time points: after anesthesia induction, 1 h after the end of anesthesia induction when the pulmonary artery catheter was inserted, 10 min after the beginning of the anhepatic phase, 5 min before and after graft reperfusion, 20 min after reperfusion, 5 min after the completion of biliary reconstruction, and at the end of surgery. These hemodynamic variables were then averaged and entered into the analysis.

The primary outcome variable was postoperative AKI defined by the Acute Kidney Injury Network (AKIN) criteria. Postoperative AKI was determined based on the maximal change in serum creatinine (sCr) level during the first two postoperative days [19]. The most recent sCr measured before surgery was used as the baseline. The AKIN serum creatinine criteria are shown in Supplemental Table S1.
Urine output criteria were not used because a different cutoff for oliguria may be required for AKI after surgery [20,21]. Stage 2 or 3 AKI was used as a secondary outcome because higher stages of AKI are more strongly associated with patient mortality, and stage 1 AKI may be only functional and transient [22].
A total of 72 explanatory variables, including the variables in Table 1, were used for machine learning. Before developing the prediction models, our data were divided into a training dataset (70% of cases) and a test dataset (30% of cases). The training dataset was used to develop the machine learning and logistic regression models, and the test dataset was used to validate and compare the performance of the models developed on the training dataset. Each machine learning method had its own hyperparameters, such as the number of layers in a neural network or the number of trees in a random forest. To find the optimal hyperparameters, 10-fold cross-validation was used. This cross-validation was used only for model development; the performance of the developed final models was evaluated on the test dataset. All possible combinations of hyperparameters were investigated by grid search (a list of the investigated hyperparameters is provided in Supplemental Text S1). The hyperparameters with the highest average validation AUROC were considered optimal. The final model of each technique was then re-fitted with the optimal hyperparameters on the entire training dataset. The test dataset was used only for testing the final models' performance.
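The tuning workflow above can be sketched as follows. This is an illustrative sketch using scikit-learn and synthetic data, not the authors' actual code (their code is in Supplemental Text S2, and their actual hyperparameter grids are in Supplemental Text S1); the grid values here are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the study data: 1211 cases, 72 variables, ~30% positives.
X, y = make_classification(n_samples=1211, n_features=72,
                           weights=[0.7, 0.3], random_state=0)

# 70% training / 30% test split, as in the study design.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical grid; each combination is scored by 10-fold cross-validated AUROC
# on the training data only.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3],
              "learning_rate": [0.1]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=10, scoring="roc_auc", refit=True)
search.fit(X_train, y_train)  # refit=True re-fits the best model on all training data

# The held-out test set is touched only once, for the final performance estimate.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
```

Keeping the test set outside the cross-validation loop, as done here and in the study, prevents the hyperparameter search from leaking information into the reported AUROC.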
In the neural network model and the support vector machine, the values of each variable were normalized using the mean and standard deviation of the training dataset. The maximum and minimum values of the training dataset were used for scaling values between 0 and 1 in the deep belief network. In the logistic regression, naïve Bayes and tree models, normalization was not performed because it was not necessary.
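A minimal sketch of this scaling step (synthetic data, not the study's variables): scaling statistics are computed on the training set only and then applied to the test set, so no test-set information leaks into model development.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(50, 10, (800, 5))   # toy training data
X_test = rng.normal(50, 10, (400, 5))    # toy test data

# z-normalization (training-set mean and SD) for the neural network and SVM
z = StandardScaler().fit(X_train)
X_train_z, X_test_z = z.transform(X_train), z.transform(X_test)

# scaling to [0, 1] (training-set min and max) for the deep belief network
mm = MinMaxScaler().fit(X_train)
X_train_mm, X_test_mm = mm.transform(X_train), mm.transform(X_test)
```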
A multivariable logistic regression analysis including the variables in Table 1 was performed to identify independent predictors and establish a multivariable prediction model. Closely correlated variables (e.g., MELD score and preoperative serum albumin level) were excluded before the multivariable analysis to avoid multicollinearity. Backward stepwise variable selection was performed with a cutoff of p < 0.10. As a sensitivity analysis, logistic regression analysis was performed without stepwise variable selection.
Missing data were present in <5% of records. Missing values were imputed according to the proportion of missingness. If <1% of values were missing, the missing value was substituted by the mean for continuous variables and by the mode for categorical variables. Missing values of variables with a missing proportion between 1% and 5% were replaced by hot-deck imputation, in which a missing value is imputed from a randomly selected observed value of the same variable. Our primary analysis compared the predictive ability of the machine learning approaches with that of the logistic regression model in terms of AUROC [25]. We also compared accuracy, defined as the number of cases with true positive and true negative results divided by the total number of cases in the test set.
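The two imputation rules can be sketched on a toy DataFrame (the column names are illustrative, not the study's variables; for brevity each rule is simply applied to one column rather than chosen by the actual missing proportion):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "albumin": [3.1, np.nan, 2.8, 3.5, 3.0],            # continuous
    "diabetes": [0, 1, np.nan, 0, 0],                    # categorical
    "cold_ischemic_time": [45, 120, np.nan, np.nan, 300]
})

# Rule for <1% missingness: mean for continuous, mode for categorical.
df["albumin"] = df["albumin"].fillna(df["albumin"].mean())
df["diabetes"] = df["diabetes"].fillna(df["diabetes"].mode()[0])

# Rule for 1-5% missingness: hot-deck imputation, i.e. each missing value
# is replaced by a randomly drawn observed value of the same variable.
observed = df["cold_ischemic_time"].dropna().to_numpy()
mask = df["cold_ischemic_time"].isna()
df.loc[mask, "cold_ischemic_time"] = rng.choice(observed, size=mask.sum())
```

Hot-deck imputation preserves the observed distribution of a variable, whereas mean substitution shrinks its variance; this is a reasonable trade-off when only a small fraction of values is missing.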
Finally, a risk estimator was developed based on our best-performing model (https://vitaldb.net/aki_liver). This estimator calculates the risk of developing AKI after liver transplantation (as a value from 0 to 1) and classifies the risk into three classes: low, moderate, and high risk of AKI.

Results
A total of 1211 cases including 367 (30.3%) deceased donor and 844 (69.7%) living donor liver transplantation were included in our analysis. During the first two postoperative days, AKI, as determined by AKIN criteria, was observed in 365 patients (30.1%), and stage 2 or 3 AKI developed in 76 patients (6.3%). The incidence of AKI was 26.1% (220/844) for LDLT patients and 39.5% (145/367) for DDLT patients. The incidence of stage 2 or 3 AKI was 5.3% (45/844) for LDLT and 8.4% (31/367) for DDLT. Patient demographics and surgery and anesthesia-related variables in both training and test set are presented in Table 1.
The optimal hyperparameters found by 10-fold cross-validation and the accuracy and AUROCs on the test set are shown in Table 2. The gradient boosting machine showed the largest test AUROC (0.90, 95% confidence interval [CI] 0.86-0.93) and the highest accuracy (84%). The deep belief network classifier showed the smallest test AUROC (0.59) and the lowest accuracy (65%). The results of logistic regression analysis with and without stepwise variable selection are shown in Table 3 and Table S2. Seven variables were selected as independent predictors. The AUROC of the multivariable logistic prediction model for the test dataset was 0.61 (95% CI 0.56 to 0.66). Multivariable logistic regression analysis was performed using all variables with p < 0.2 in univariate logistic analysis.
Stepwise backward variable selection was used for this analysis with a p-value cutoff of less than 0.10. The Nagelkerke R2 statistic was 0.163, and the Hosmer-Lemeshow goodness-of-fit test was not significant at the 5% level (p = 0.701). GRWR = graft-recipient body-weight ratio, SvO2 = mixed venous oxygen saturation.

Table 3 and Figure 1 show the comparison of test AUROCs for predicting AKI of all stages across models. The AUROC of the gradient boosting machine was significantly larger than that of all other models (AUROC 0.90, 95% CI 0.86 to 0.93, all p < 0.001). The decision tree and random forest techniques showed better performance than the other models but worse performance than the gradient boosting machine. The AUROCs of the support vector machine, naïve Bayes, neural network, and deep belief network were similar to that of the logistic regression model but significantly smaller than those of the other machine learning models (p < 0.001). Supplemental Table S3 shows the comparison of the AUROCs of our models for predicting stage 2 or 3 AKI. The AUROC of the gradient boosting machine was also significantly larger than the AUROCs of all the other models, except the random forest. The importance matrix plot for the gradient boosting machine is shown in Figure 2; cold ischemic time and intraoperative mean SvO2 ranked first and second. The stepwise binary classification criteria of the best decision tree model, with a depth of 5, are shown in Figure 3.

Discussion
We compared the predictive abilities of seven machine learning models and a logistic regression model for predicting AKI after liver transplantation. The gradient boosting machine had the largest AUROC and the highest accuracy for predicting both AKI of all stages and stage 2 or 3 AKI. The authors previously reported superior predictive ability of gradient boosting for AKI after cardiac surgery [26], and our results are consistent with that study despite the different surgical population. We also developed an internet-based risk estimator based on our gradient boosting model. This estimator should be prospectively validated before clinical use to determine the risk of AKI at the end of liver transplantation surgery. Further prospective multicenter trials are required to validate the better performance of gradient boosting.

For the prediction of a non-linear relationship, the gradient boosting machine builds a sequential series of decision trees, in which each tree corrects the residuals of the predictions made by the previous trees. After each boosting step, the algorithm scales the newly added weights, which balances the influence of each tree. In addition, the gradient boosting machine reduces overfitting by using only a random subset of predictors when building each tree [23]. The impact of the gradient boosting machine has been recognized in a number of machine learning and data mining challenges. Our results also demonstrated that the gradient boosting machine appears to be a very effective and efficient machine learning method.
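The sequential residual-fitting idea can be illustrated with a toy from-scratch boosting loop (not the study's model): each small tree is fitted to the residuals of the current ensemble, and its contribution is shrunk by a learning rate before being added.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)  # non-linear target with noise

pred = np.zeros_like(y)      # start from a trivial (all-zero) prediction
learning_rate = 0.1          # the scaling that balances each tree's influence
for _ in range(100):
    residual = y - pred                           # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)       # add the shrunken correction

mse = np.mean((y - pred) ** 2)                    # far below the variance of y
```

Each weak tree alone fits the non-linear target poorly, but the shrunken sequence of corrections drives the training error down toward the noise level, which is the essence of gradient boosting.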
Although its predictive ability was less than that of the gradient boosting machine, the performance of the random forest was also better than that of logistic regression in our dataset. Random forest is an extension of traditional decision tree classifiers that uses an ensemble technique [27]. Each tree is constructed using a random subset of the original training data and a random subset of the explanatory variables. Random forests can minimize overfitting by making the final decision through the voting of these randomly generated trees [28]. However, this advantage seems much more effective when many variables and a large dataset are used for learning; in the present study, the random forest showed no significant performance gain over the simple decision tree model.
A simple decision tree analysis also showed good performance in our study. A decision tree is a hierarchical model that recursively splits the dataset based on the Gini impurity or entropy [29][30][31]. A decision tree can have better predictive ability than a logistic regression model under certain circumstances because it uses a different variable and threshold at every branch. The greatest advantage of the decision tree model is that it yields interpretable decision rules after training. However, the deeper the tree, the more difficult it is to interpret, and the risk of overfitting also increases.
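The Gini impurity a decision tree minimizes at each split is Gini = 1 - sum of squared class proportions; a minimal sketch:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2 over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

pure = gini([1, 1, 1, 1])    # a pure node has impurity 0.0
mixed = gini([0, 0, 1, 1])   # a 50/50 binary node has impurity 0.5
```

At each branch the tree chooses the variable and threshold whose split yields the largest decrease in this impurity, which is what produces the readable decision rules noted above.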
Although the multilayer perceptron showed good performance in predicting in-hospital mortality in a previous study [1], the performance of the neural network and deep belief network in our study was inferior to that of all other machine learning techniques. This may be explained by the largely non-linear relationship between the explanatory variables and the outcome variable. Although a multilayer perceptron is able to approximate any non-linear function, a large amount of training data is required; our dataset might not be large enough to train the multilayer perceptron [32].
There have been reports that the performance of machine learning techniques is not superior to that of conventional risk scores or the logistic regression model for predicting mortality [1,33]. However, these studies compared performance in predicting in-hospital mortality in samples with a low incidence (<1%) [1,33], so a difference might not have been statistically demonstrable. In our study, the comparison was more feasible because of the higher incidence of the outcome variable (30.1%).
Cold ischemic time and intraoperative mean SvO2 were the most important variables for classifying the development of AKI by the gradient boosting machine and the decision tree [34,35]. This may be attributed to the fact that our study sample consisted of a mixed population of DDLT and LDLT patients. The wide variability in cold ischemic time between deceased and living donors, and the higher incidence of AKI in DDLT, may have produced the high discriminative power of cold ischemic time. Cold ischemic time may therefore be less important for the prediction of AKI within either DDLT or LDLT alone. A low SvO2 suggests poor oxygen delivery to the major organs, including the kidney [36]. During liver transplantation, clamping of the inferior vena cava and intraoperative bleeding decrease cardiac output and major organ perfusion, resulting in poor oxygen delivery, especially during graft reperfusion [37]. SvO2 may therefore be an important hemodynamic goal. Further prospective trials may evaluate the effect of optimizing intraoperative SvO2 on the risk of AKI after liver transplantation.
Our study has several limitations. First, our analysis used a relatively small number of cases from a single Asian center with mixed living and deceased donor transplantation. The performance of the machine learning techniques might differ when they are applied to a sample from a different institution with a different distribution of covariates. Race might be a significant predictor of AKI, but it could not be evaluated in our dataset. The external validity of our prediction model may therefore be limited. However, with a machine learning approach using an institution's own dataset for training, each institution could obtain a model best fitted to its own population. We uploaded the source code for training the gradient boosting model used in this study as Supplemental Text S2. Each institution may therefore develop its own prediction model from its historical electronic medical record data and update the model periodically; real-time processing of patient data would then yield a risk prediction for each patient after surgery. Second, the results from our machine learning models are more difficult to interpret than those from the logistic regression model [3]. However, the gradient boosting machine and decision tree provide modest interpretability through the variable importance plot and inspection of the decision rules in tree nodes. Third, it is not certain that our results would translate into improved clinical outcomes for patients undergoing liver transplantation. Many of our important variables are not clinically modifiable, and accurate risk prediction may not be followed by improved patient outcomes. However, a further prospective trial should evaluate whether the adjustment of hemodynamic variables including SvO2 could decrease the incidence of AKI.

Conclusions
In conclusion, our study demonstrated that machine learning models (gradient boosting machine, random forest, and decision tree) showed better performance than the traditional logistic regression model in predicting AKI after liver transplantation. Among these, the gradient boosting machine showed the best performance with the highest AUROC. An internet-based risk estimator was developed based on our gradient boosting model, which could help clinicians predict AKI at the end of surgery. Since the gradient boosting machine can be used for real-time prediction, further studies are required to prospectively validate our results and improve clinical outcomes after liver transplantation.
Supplementary Materials: The following are available online at http://www.mdpi.com/2077-0383/7/11/428/s1, Supplemental Text S1: Investigated hyperparameter in each model; Supplemental Text S2: Python source code for learning the gradient boosting model used in our study; Supplemental Table S1: AKIN (acute kidney injury network) serum creatinine diagnostic criteria of acute kidney injury; Supplemental Table S2: Results of multivariable logistic regression analysis for acute kidney injury without stepwise variable selection; Supplemental Table S3: Comparison of area under receiver-operating characteristic curve among the different models for predicting stage 2 or 3 acute kidney injury.