Artificial Intelligence—A Tool for Risk Assessment of Delayed-Graft Function in Kidney Transplant

Delayed-graft function (DGF) might be responsible for shorter graft survival. Therefore, a clinical tool predicting its occurrence is vital for the risk assessment of transplant outcomes. In a single-center study, we conducted data mining and machine learning experiments, resulting in DGF predictive models based on random forest classifiers (RF) and an artificial neural network called multi-layer perceptron (MLP). All designed models had four common input parameters, determining the best accuracy and discriminant ability: donor’s eGFR, recipient’s BMI, donor’s BMI, and recipient–donor weight difference. RF and MLP designs, using these parameters, achieved an accuracy of 84.38% and an area under curve (AUC) 0.84. The model additionally implementing a donor’s age, gender, and Kidney Donor Profile Index (KDPI) accomplished an accuracy of 93.75% and an AUC of 0.91. The other configuration with the estimated post-transplant survival (EPTS) and the kidney donor risk profile (KDRI) achieved an accuracy of 93.75% and an AUC of 0.92. Using machine learning, we were able to assess the risk of DGF in recipients after kidney transplant from a deceased donor. Our solution is scalable and can be improved during subsequent transplants. Based on the new data, the models can achieve better outcomes.


Introduction
Delayed-graft function (DGF) after a kidney transplantation (KTx) refers to an acute kidney injury (AKI) requiring at least one dialysis session within the first week after surgery. It is associated with prolonged hospitalization, higher rates of acute rejection, and, therefore, shorter graft survival [1][2][3]. The incidence of DGF is rising due to an increasing employment of kidneys procured from extended criteria donors, caused by organ shortages [4,5]. Based on US data, its prevalence is about 30.8% among deceased donors and it is significantly higher in procurement after circulatory death (DCD), at around 45-55.1% [3].
The ability to predict DGF is crucial in decision-making processes at the time of transplantation, including declining the offer, selecting a recipient with a lower DGF risk, implementing efforts to shorten the cold ischemia time (CIT), or postponing an initiation of treatment with calcineurin inhibitors (CNIs). A clinical tool allowing for the anticipation of DGF might be vital for the outcome. The most widely used tool, with about 70% accuracy, is the Irish risk calculator, which is based on 20 recipient-and donor-derived risk factors; it can identify the most significant aspects such as CIT, a donor's terminal creatinine concentration, a donor's body mass index (BMI), procurement after DCD, and a donor's age [6].
In 1959, Arthur Samuel defined the term "machine learning" as a field of study that gives computers the ability to learn without being explicitly programmed [7]. The premise of machine learning (ML) is to mimic human brain activity, including decision-making processes, such as recognizing or classifying objects. This might be accomplished by optimization and generalization. Optimization means the search for model parameters, and a loss function is as small as these parameters can be. The aim is to keep this value as low as permissible on new data. The model is designed to formulate conclusions from the training data, so that it can generate effective results from the test data. The program gains experience by learning new information and, at the same time, improves its parameters to adapt to the new knowledge. ML algorithms also need relevant data to learn from.
In a classical statistical analysis, independent variables are examined. ML may be more effective when introduced variables are calculated from other variables, such as BMI, which is derived from height and weight, as well as partially dependent parameters. Results received from other parameters may facilitate the finding of an optimal model with a sufficiently discriminant power. However, their use increases the complexity and computational cost of the algorithm [8].
In our work, we verified the factors that may be responsible for the occurrence of DGF with the use of information technology, particularly the branch of artificial intelligence called machine learning.

Data Collection
The study was conducted in a single transplant center. Retrospective patient data, both donors and recipients, who underwent KTx between 2012 and 2018, were included in the study. DGF was defined as the need for at least one dialysis during the first week after transplantation.
The following donor-derived data were collected: age, gender, weight, height, serum creatinine concentration (sCr) directly before procurement, minimal sCr during hospitalization in an intensive care unit (ICU), presence of diabetes (DM) or arterial hypertension (HTN) in medical history, cause of death, the length of stay in ICU, and the need for catecholamine use. We considered donors' minimal sCr (sCr min) and sCr before procurement as indicators of organ damage [9]. The estimated glomerular filtration rate (eGFR) was determined by the modification of diet in renal disease (MDRD) formula [10]: Accordingly, minimal eGFR and eGFR before procurement, which were sCr derived according to the above formula, served as alternative indicators of damage to the transplanted organ.
Based on donor-derived data, the kidney donor risk index (KDRI) and the kidney donor profile index (KDPI), at the time of procurement, were assessed using the organ procurement and transplantation network (OPTN) online calculator.
KDRI is an indicator that measures the risk of failure after a KTx. The parameter combines 10 donor-derived variables: age, height, weight, ethnicity/race, presence of HTN, DM, cause of death, sCr, HCV status, and donation after circulatory death [11,12]. KDPI stands for cumulative risk of kidney donor failure and allows one to effectively assess the risk of DGF; it is a good indicator of transplant and recipient survival, and it also discriminates against cardiac causes of death with cerebrovascular death [12,13].
Recipient data were also included: age, gender, height, weight, BMI, residual diuresis, presence of DM or HTN, type of renal replacement therapy (RTT) before transplantation and its duration, the use of basiliximab as an induction therapy, the KTx number, length of stay (LOS), and sCr at discharge. The estimated post-transplant survival (EPTS) for each recipient was calculated, using 4 variable parameters: age, duration of RTT, presence of DM, and history of previous solid organ transplantations (kidney, pancreas, liver, lung, heart, intestine) [14].

Statistical Scoring
Our task was to find a model that, using the available database, had the best performance. Models were evaluated by 10-fold cross validation against accuracy. We based the final assessment of effectiveness on AUROC, accuracy, precision, recall, and F1-Score. Derivations are counted for each of the classes (0 or 1) and further expressed as arithmetic mean and weighted mean.

Machine Learning Models
In our previous work, we analyzed 5 types of classifiers. This time, we focused on the two most effective ones: ANN and random forest classifiers [15].
The random forest (RF) classifier creates decision trees for the training set samples and finally averages the results. This helps to avoid overfitting, i.e., too rigid an adjustment of the added training sessions and a drop in efficiency in the test data. A practical feature of this method enables the visualization of data in the form of a decision tree, which might sometimes be very complex. It may be clearly seen how the decision-making process was carried out by the algorithm.
An artificial neural network is a collection of units, called neurons, that are connected to each other in such a way as to form input, hidden layers, and output layers. The way in which neurons communicate mimics the connection of neurons in the brain. Activation function for the hidden layer is ReLu, the rectified linear unit function, which returns 0 for negative values and input value for non-negative: f (x) = max (0, x). Neural networks are more complex and have more coefficients inscribed in the relevant elements of their structure. Each neuron in a single MLP layer stores the values of the connection weights with the perceptrons of the previous layer, i.e., it has connections with each neuron of the previous layer. The advantage of neural networks is that they may be learned from new data without starting from scratch. By partial fitting of new data, the existing neural network can be overwritten with new weights. At the same time, complexity and sensitivity to data normalization are serious drawbacks of neural networks. Random forest can be built faster than scratch on new data, and it is not sensitive to data scaling and normalization.
The complexity of algorithms significantly affects their practical application, especially in the process of building predictive models. The notation with a big O is used, which means that the complexity of the model is not greater than a certain mathematical function multiplied by a positive real number. Complexity may refer to the time required to build the model, the computer memory consumed, or the amount of time a program runs until a result is obtained. The complexity of training the random forest classifier is O (M·m·n·log(n)), where M is the number of decision trees in the random forest, m is the number of variables, and n is the number of samples in the training set [16]. This means that reducing the number of input parameters will shorten training time by half. Doubling the number of samples in the 1000 patient database will cause the algorithm to train for approximately 2.2 times longer. An MLP neural network has complexity O (n·m·o·i·h1·h2), where n is the number of samples in the training set, m is the number of input features, and "o" is the number of predicted classes, e.g., absence or presence of DGF. The sizes of the hidden layers are h1 and h2, respectively, and they denote the number of iterations leading to the best model [17,18]. This means that scaling a model from 25 neurons in 2 hidden layers to 125 in each of them increases the training complexity 25-fold.
The initial database was randomly divided into two sets: training and testing, in a ratio of 80:20. At each step of the algorithm, the program constructed a subset regarding the analyzed variables in a recursive manner. The original number of variables was recursively reduced towards the optimal subset. In each algorithm loop, the program built a model based on training data and checked its effectiveness. The training set was used to find the best model hyperparameters using 10-fold cross validation against accuracy. The model was selected based on cross validation, and statistics against test data, including AUROC, accuracy, precision, recall, and F1-Score, were saved to program memory. After checking all subsets, the program sorted the results and selected the best models. Figure 1 illustrates the block diagram of the recursive algorithm allowing the development of the models. number of variables, and n is the number of samples in the training set [16]. This mean that reducing the number of input parameters will shorten training time by half. Doublin the number of samples in the 1000 patient database will cause the algorithm to train fo approximately 2.2 times longer. An MLP neural network has complexity O (n·m·o·i·h1·h2 where n is the number of samples in the training set, m is the number of input feature and "o" is the number of predicted classes, e. g., absence or presence of DGF. The sizes o the hidden layers are h1 and h2, respectively, and they denote the number of iteration leading to the best model [17,18]. This means that scaling a model from 25 neurons in hidden layers to 125 in each of them increases the training complexity 25-fold. The initial database was randomly divided into two sets: training and testing, in ratio of 80:20. At each step of the algorithm, the program constructed a subset regardin the analyzed variables in a recursive manner. The original number of variables was recu sively reduced towards the optimal subset. In each algorithm loop, the program built model based on training data and checked its effectiveness. The training set was used t find the best model hyperparameters using 10-fold cross validation against accuracy. Th model was selected based on cross validation, and statistics against test data, includin AUROC, accuracy, precision, recall, and F1-Score, were saved to program memory. Afte checking all subsets, the program sorted the results and selected the best models. Figur 1 illustrates the block diagram of the recursive algorithm allowing the development of th models. Figure 1. The database is divided into a training and testing set. The training set is the initial database from which we recursively removed the columns, thus changing the size of the input data vector (the number of input features). The model was then built and hyperparameters were adjusted. The performance was saved to a file for later analysis. The analyzed subsets have at least 2 input variables. The database is divided into a training and testing set. The training set is the initial database from which we recursively removed the columns, thus changing the size of the input data vector (the number of input features). The model was then built and hyperparameters were adjusted. The performance was saved to a file for later analysis. The analyzed subsets have at least 2 input variables.

Study Population Baseline Characteristics
In our study, we included data of 157 organ recipients aged 19-72 years (50.55 ± 13.08) and 88 donors aged 18-69 years (46.38 ± 14.02). From each of the 69 donors, two kidneys were procured, and one kidney was procured from 19 donors.
All evaluated parameters and variables are listed in Table 1. The input database was randomly divided into two cohorts: training and testing. Most donors (69 out of 88) provided a total of 138 records for each transplant procedure and 19 donors provided individual records. In total, the database contains 157 records with data. The characteristics of the divided groups are described in Table 2.  All possible subsets of the input parameters were recursively selected, with a minimum size of two parameters. The best performance was achieved by models based on at least four key parameters: donor's BMI, recipient's BMI, recipient-donor weight difference, and donor's eGFR before procurement using random forest classifier and MLP with an accuracy of 84.38%. Models with fewer input variables were completely ineffective. The best models we found required the above mentioned four variables plus EPTS, KDRI, KDPI, recipient's gender, or recipient's age, and the result was the random forest and MLP models, which are summarized in Figure 2.
Characteristically, the top models have a similar set of input features, and differences in performance are similar. The best random forest classifier models require the following input features to achieve the discriminant power of a given AUC of 0.91 (AUC 0.92): donor's BM, recipient's BMI, recipient-donor weight difference, and donor's eGFR, as well as additional variables: a pair of EPTS% and KDRI or a triplet of KDPI recipient's gender, recipient's age.
The first model is a classifier based on the random forest method. Input parameters included: donor's BMI, recipient's age, recipient's gender, donor's eGFR before procurement, KDPI, recipient-donor weight difference, and recipient's BMI, with an accuracy of 93.75%, precision of 0.9375 and recall of 0.9375. Figure 3 illustrates the decision tree of this model. mum size of two parameters. The best performance was achieved by models based on at least four key parameters: donor's BMI, recipient's BMI, recipient-donor weight difference, and donor's eGFR before procurement using random forest classifier and MLP with an accuracy of 84.38%. Models with fewer input variables were completely ineffective. The best models we found required the above mentioned four variables plus EPTS, KDRI, KDPI, recipient's gender, or recipient's age, and the result was the random forest and MLP models, which are summarized in Figure 2. Characteristically, the top models have a similar set of input features, and differences in performance are similar. The best random forest classifier models require the following input features to achieve the discriminant power of a given AUC of 0.91 (AUC 0.92): donor's BM, recipient's BMI, recipient-donor weight difference, and donor's eGFR, as well as additional variables: a pair of EPTS% and KDRI or a triplet of KDPI recipient's gender, recipient's age. as additional variables: a pair of EPTS% and KDRI or a triplet of KDPI recipient's gender, recipient's age. The first model is a classifier based on the random forest method. Input parameters included: donor's BMI, recipient's age, recipient's gender, donor's eGFR before procurement, KDPI, recipient-donor weight difference, and recipient's BMI, with an accuracy of 93.75%, precision of 0.9375 and recall of 0.9375. Figure 3 illustrates the decision tree of this model. Figure 3. Random forest classifier illustrated with a decision tree graph. Each node has a condition; if the condition is met, it goes to the child branch on the left, otherwise to the right branch. The more uniform the color, the clearer the node is in relation to the samples it contains. Input features include donor's BMI, recipient's age, recipient's gender, donor's eGFR before procurement, KDPI, recipientdonor weight difference, recipient's BMI.
The nodes contain conditions, the fulfillment of which means moving to the left child branch in the decision tree. Otherwise, the right child node is selected. The intensity of the color means that the knot is class-uniform. End nodes uniquely defining y one of the end labels, i.e., 0 or 1, are completely homogeneous. Each node is a data break point. The functional composition of such divisions is the fundamental of the classifier's operation on Figure 3. Random forest classifier illustrated with a decision tree graph. Each node has a condition; if the condition is met, it goes to the child branch on the left, otherwise to the right branch. The more uniform the color, the clearer the node is in relation to the samples it contains. Input features include donor's BMI, recipient's age, recipient's gender, donor's eGFR before procurement, KDPI, recipient-donor weight difference, recipient's BMI.
The nodes contain conditions, the fulfillment of which means moving to the left child branch in the decision tree. Otherwise, the right child node is selected. The intensity of the color means that the knot is class-uniform. End nodes uniquely defining y one of the end labels, i.e., 0 or 1, are completely homogeneous. Each node is a data break point. The functional composition of such divisions is the fundamental of the classifier's operation on data. For example, in a first step, the condition is checked: if KDPI is less or equal 15.50, then the model judges that no DGF will occur; otherwise, the cascade of conditions leading to the corresponding end states is checked. This model achieved an AUC of 0.91, showed in  Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, EPTS, KDRI, recipient-donor weight difference, and recipient's BMI. The second model with the highest scores is shown in Figure 5.  Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, EPTS, KDRI, recipient-donor weight difference, and recipient's BMI. The second model with the highest scores is shown in Figure 5.  This model achieved an accuracy of 93.75%, precision of 0.9375, and recall of 0.9375 and achieved slightly better discriminant power than the previous one (presented in Figure 6).  This classifier achieved a slightly worse discriminating power than the previous ones, the performance is summarized in Figure 8.  This classifier achieved a slightly worse discriminating power than the previous ones, the performance is summarized in Figure 8. This classifier achieved a slightly worse discriminating power than the previous ones, the performance is summarized in Figure 8. Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%, precision of 0.8514 and recall of 0.8438. The classifier is illustrated by the decision graph in Figure 9.  Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%, precision of 0.8514 and recall of 0.8438. The classifier is illustrated by the decision graph in Figure 9. Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%, precision of 0.8514 and recall of 0.8438. The classifier is illustrated by the decision graph in Figure 9. Figure 9. Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI. Figure 9. Random forest classifier with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI.
The performance of the model is summarized in Figure 10. The performance of the model is summarized in Figure 10. Figure 10. This classifier has a lower discriminant power but better DGF prediction sensitivity than the previous model.
MLP with 6 neurons in first hidden layer and 37 neurons in the second, with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%, precision of 0.8734 and recall of 0.8438. The performance of the artificial neural network is summarized in Figure 11. The matrix in Figure 12 shows the accuracy of ANN with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%. The performance of the model is summarized in Figure 10. Figure 10. This classifier has a lower discriminant power but better DGF prediction sensitivity than the previous model.
MLP with 6 neurons in first hidden layer and 37 neurons in the second, with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%, precision of 0.8734 and recall of 0.8438. The performance of the artificial neural network is summarized in Figure 11. The matrix in Figure 12 shows the accuracy of ANN with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%. The matrix in Figure 12 shows the accuracy of ANN with input features: donor's BMI, donor's eGFR before procurement, recipient-donor weight difference, recipient's BMI, with an accuracy of 84.38%. For a randomly selected testing subset, depending on the selection of hyperparameters, the number of neurons in the first layer is on the vertical axis and the number of neurons in the second layer is on the horizontal axis. Higher accuracy is marked in green, while worse is yellow and red. The best results are for an ANN with fewer neurons in the first layer and more in the second, and vice versa. On the other hand, equal-dimensional layers have a worse performance in this representation. However, this is not the subject of this analysis but instead an observation worthy of further interest.

Discussion
Different logistical solutions in the allocation of permanent organs, variability of donors (e.g., accepting expanded criteria), and recipient conditions in a given geographical region are sufficient reasons to assess the risk of DGF in a single transplant center. To assess the risk of DGF, we employed data mining and machine learning experiments. Random forest classifiers and artificial neural networks called multi-layer perceptron showed an accuracy of 84.38% and an AUC 0.84. The model additionally implementing the donor's age, gender, and the kidney donor profile index accomplished an accuracy of 93.75% and an AUC of 0.91. The other configuration with the estimated post-transplant survival (EPTS) and the kidney donor risk profile (KDRI) achieved an accuracy of 93.75% and an For a randomly selected testing subset, depending on the selection of hyperparameters, the number of neurons in the first layer is on the vertical axis and the number of neurons in the second layer is on the horizontal axis. Higher accuracy is marked in green, while worse is yellow and red. The best results are for an ANN with fewer neurons in the first layer and more in the second, and vice versa. On the other hand, equal-dimensional layers have a worse performance in this representation. However, this is not the subject of this analysis but instead an observation worthy of further interest.

Discussion
Different logistical solutions in the allocation of permanent organs, variability of donors (e.g., accepting expanded criteria), and recipient conditions in a given geographical region are sufficient reasons to assess the risk of DGF in a single transplant center. To assess the risk of DGF, we employed data mining and machine learning experiments. Random forest classifiers and artificial neural networks called multi-layer perceptron showed an accuracy of 84.38% and an AUC 0.84. The model additionally implementing the donor's age, gender, and the kidney donor profile index accomplished an accuracy of 93.75% and an AUC of 0.91. The other configuration with the estimated post-transplant survival (EPTS) and the kidney donor risk profile (KDRI) achieved an accuracy of 93.75% and an AUC of 0.92.
The growing shortage of organs for transplantation requires procurement from expanded criteria donors (ECD). ECD might be described for any donor over the age of 60, or donors over the age of 50, meeting two out of three criteria: history of hypertension, final sCr > 1.5 mg/dL, or death due to cerebrovascular disease [5,19]. Kidney transplantation from ECD carries additional risk factors for graft failure, including DGF [5]. Risk modeling using artificial intelligence opens an area for the use of organs from ECD in the face of increased organ demand and allows use to manage the risk of failure. Non-linear identification of risk factors and their multidimensional analysis may improve the survival of grafts in the future and reduce the risk of graft failure after transplantation from a deceased donor.
Machine learning predictive models allow for the effective differentiation of end states and practical application of discriminant power in the clinic. In our work, we checked two machine learning techniques through by random forest and artificial neural network models. Similar models were described by other authors [20][21][22][23]. Neural networks aimed at assessing the occurrence of DGF achieved various performances, but clearly indicated the practicality and clinical usefulness of modern computer techniques. Donor BMI and recipient BMI are among the most important predictors of the neural network model with AUROC 0.886 [20]. Additionally, clinical trials based on classical analysis prove their significance [2,4,13]. Success in creating computer models requires two keys: the first is good quality data and the second is an optimized model suitable for the data [24]. Modern computers make it possible to check many models, with different parameters and structures in a short time. There are many machine learning techniques, and models that are theoretically simpler and less complex can perform significantly better than advanced ones. Artificial neural networks can achieve good performance with AUROC 0.732 or AUROC 0.7595 [21,23], but a simpler model based on linear SVM achieves an AUROC of 0.843 [22]. The question is whether it is worth investing in the complexity of the model. Our previous study showed that this depends on the choice of input data, both the data type and the combination of the input parameters [15]. Naturally, neural networks are more complex and can offer better performance, but simpler solutions can be just as effective. The choice of method can be left to individual experience because a modern personal computer is able to check thousands of models quickly, with neural networks being checked much slower and requiring more training time.
In our paper, we refer to the commonly accepted variables that are indicated as risk factors of DGF. The size of the original database, considering prognostic factors, was so large that it provided great opportunities for analyzing data, using machine learning, and selecting clinically significant factors. We checked many models with different settings and input features, and the best models did not implement the CIT variable; although, according to this publication, it is a crucial factor for the risk assessment of DGF. [25][26][27] None of the models using CIT as a variable achieved an accuracy of more than 81.25%. The top model that applied CIT at the input reached an AUROC of 0.78, an accuracy of 81.25%, precision of 0.8318 (0.91 for non-DGF prediction and 0.60 for DGF presence), and recall of 0.8125 (0.83 and 0.75, respectively). We explain this discrepancy by the difference in methodology between classical odds ratios and random forest algorithms. Nevertheless, other key parameters are confirmed both in our work and in others.
Our study was retrospective and limited due to the size of the population. Models were prepared in a single-center study. The advantage of this is that the methods used allow for the scaling of machine learning techniques to another population, one that might be diverse in terms of ethnicity. Neural networks are so complex that their use outside of a computer program is a futile endeavor. It is a kind of black box into which we put a patient's characteristics and draw a ready conclusion. Random forest classifiers can be illustrated graphically and, despite their complex form, can be used with paper and pencil.
Of course, this is only to illustrate the clarity of this technique compared to artificial neural networks.

Conclusions
Using machine learning, we were able to assess the risk of DGF in recipients after kidney transplant from a deceased donor. Our solution is scalable and can be improved during subsequent transplants. Based on the new data, the models can achieve better outcomes in terms of what is feasible in even single transplant center.