Artificial Intelligence for Risk Prediction of Rehospitalization with Acute Kidney Injury in Sepsis Survivors

Sepsis survivors have a higher risk of long-term complications, and acute kidney injury (AKI) may remain common after discharge. We therefore used a machine learning approach to predict the risk of rehospitalization with AKI among sepsis survivors identified between 1 January 2008 and 31 December 2018. We included a total of 23,761 patients aged ≥ 20 years who were admitted due to sepsis and survived to discharge. We compared models based on logistic regression, random forest, extra tree classifier, gradient boosting decision tree (GBDT), extreme gradient boosting, and light gradient boosting machine (LGBM). The LGBM model exhibited the highest area under the receiver operating characteristic curve (AUC), 0.816, for predicting rehospitalization with AKI in sepsis survivors, followed by the GBDT model with an AUC of 0.813. The top five most important features in the LGBM model were C-reactive protein, white blood cell counts, use of inotropes, blood urea nitrogen, and use of diuretics. We established machine learning models for predicting the risk of rehospitalization with AKI in sepsis survivors, and such models may set the stage for the broader use of clinical features in healthcare.


Introduction
Sepsis is estimated to affect 19.4 million patients, with an annual sepsis-related mortality of approximately 5.3 million cases [1]. Sepsis, a life-threatening organ dysfunction caused by a dysregulated host response to infection, is therefore a major public health concern and a common cause of death in hospitalized patients [2,3]. As medical progress has significantly decreased mortality and morbidity after sepsis, attention to complications after discharge in sepsis survivors has become more important [4][5][6][7].
Acute kidney injury (AKI) frequently occurs with sepsis due to pathologic interactions of multiple organ dysfunction, systemic hypotension, inflammatory cytokine storms, and nephrotoxic drugs, all of which contribute directly or indirectly to renal injury [8][9][10]. Previous studies have found that 40% to 50% of patients with AKI had sepsis [8,11], and approximately 11% to 42% of patients with sepsis developed AKI [12][13][14]. Unplanned rehospitalization is associated with worse patient outcomes and increased treatment costs [15,16]. Although AKI is a common complication in sepsis, the risk of rehospitalization with AKI in sepsis survivors remains unknown. Therefore, the development of a prediction model for rehospitalization with AKI has become an important goal in the management of sepsis survivors.
To appropriately manage rehospitalization with AKI in sepsis survivors, a precise prediction model for identifying high-risk patients is required to optimize the treatment strategy. Such a model is important not only to allow a more comprehensive prognostication of patients' well-being but also to reduce the financial burden on healthcare. Machine learning models have already been applied in many fields, such as outcome prediction [17][18][19], and these models may potentially be used to identify high-risk patients. Machine learning models have mostly been used to predict the occurrence of AKI during sepsis [20][21][22]. However, no study has evaluated their performance in predicting rehospitalization with AKI among patients who survived to discharge from sepsis. To address this gap, we conducted a large-scale cohort study of sepsis survivors and compared the predictive ability of several machine learning models to select the optimal one.

Study Design and Data Source
We established a database of detailed information on sepsis survivors extracted from the Big Data Center of Taipei Veterans General Hospital between 1 January 2008 and 31 December 2018, which included comprehensive medical records from the inpatient, outpatient, and emergency departments [23]. Detailed patient demographic, clinical, and diagnostic/procedural information, drug prescriptions, procedural codes, and laboratory data were included in our analysis. To identify the sepsis survivors, we included all patients who were discharged alive during the study period with discharge codes based on the International Classification of Diseases, Ninth and Tenth Revisions, Clinical Modification (ICD-9-CM and ICD-10-CM) for sepsis (ICD codes 038, 995.91, A40 and A41), severe sepsis (ICD codes 995.92 and R65.20), or septic shock (ICD codes 785.52 and R65.21) [24]. We excluded patients who had pre-existing end-stage kidney disease maintained with dialysis or a kidney transplant before discharge, who were younger than 20 years old, or who died during admission.

Class Definition
The class was labeled as 1 if there was rehospitalization with AKI during the follow-up period and as 0 otherwise. AKI was defined as an increase in serum creatinine of ≥0.3 mg/dL within 48 h or of ≥50% within 7 days from the baseline creatinine, based on the Kidney Disease: Improving Global Outcomes (KDIGO) definition [25]. We included only the first rehospitalization with AKI because counting multiple admissions may introduce a bias favoring survivors.
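The labeling rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function names, the `(hours, creatinine)` data layout, and the choice to measure both time windows from the start of the readmission are assumptions made for the example.

```python
from typing import List, Tuple

# Thresholds taken from the definition quoted in the text.
ABS_RISE_MG_DL = 0.3    # absolute rise from baseline within 48 h
REL_RISE_FACTOR = 1.5   # 50% rise from baseline within 7 days
EPS = 1e-9              # guards against floating-point round-off


def meets_aki_criteria(baseline: float,
                       measurements: List[Tuple[float, float]]) -> bool:
    """Return True if any measurement satisfies either creatinine criterion.

    `measurements` holds (hours_since_readmission, creatinine_mg_dl) pairs.
    This layout is hypothetical, not the study's data format.
    """
    for hours, value in measurements:
        # Criterion 1: rise of at least 0.3 mg/dL within 48 h.
        if hours <= 48 and value - baseline >= ABS_RISE_MG_DL - EPS:
            return True
        # Criterion 2: rise of at least 50% within 7 days.
        if hours <= 7 * 24 and value >= REL_RISE_FACTOR * baseline - EPS:
            return True
    return False


def class_label(rehospitalized: bool, baseline: float,
                measurements: List[Tuple[float, float]]) -> int:
    """Class 1 = rehospitalization with AKI, class 0 = otherwise."""
    return int(rehospitalized and meets_aki_criteria(baseline, measurements))
```

For example, a readmitted patient with a baseline of 1.1 mg/dL and a value of 1.5 mg/dL at 24 h would be labeled 1, whereas a patient whose creatinine rises from 1.0 to 1.4 mg/dL only after 100 h would be labeled 0 under these assumptions.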

Machine Learning Algorithm and Statistical Analysis
Continuous data are presented as the median (interquartile range (IQR)), and categorical data are presented as numbers (proportions). Before the machine learning processes, the missing values of the clinical variables were imputed using the k-nearest neighbors (KNN) algorithm [26,27]. The whole dataset was then randomly split into a training dataset and a validation dataset at a ratio of 70:30. We used several machine learning methods to predict the risk of rehospitalization with AKI: logistic regression, random forest [28], extra tree classifier [29], extreme gradient boosting (XGBoost) [30], light gradient boosting machine (LGBM) [31], and gradient boosting decision tree (GBDT) [32]. The predictive ability of each model was examined based on the area under the receiver operating characteristic curve (AUC) and the precision-recall curve. Because the way machine learning models arrive at their predictions is often opaque, we used SHapley Additive exPlanation (SHAP) values to provide attribution values for each clinical feature in our prediction model [33][34][35]. The data were analyzed using Python (Python Software Foundation version 3.7.6, available at http://www.python.org, accessed on 1 November 2021). All tests were two-tailed, and a p value < 0.05 was considered statistically significant.
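To make the imputation step concrete, the following is a minimal pure-Python sketch of KNN imputation. The text does not report the number of neighbors, the distance metric, or any feature scaling, so `k = 2`, the root-mean-squared distance over shared features, and the unscaled toy values are all assumptions made for illustration. In practice a library implementation would normally be used.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def _distance(a: Row, b: Row) -> float:
    """Root mean squared difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the mean of that feature among the
    k nearest rows in which the feature is observed."""
    completed = [list(r) for r in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # otherwise the value stays missing
                completed[i][j] = sum(nearest) / len(nearest)
    return completed
```

With the toy rows `[[1.0, 10.0], [1.2, 12.0], [5.0, 50.0], [1.1, None]]`, the missing value in the last row is filled from its two closest neighbors (the first two rows), giving 11.0.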

Study Population
During the study period, 23,761 sepsis survivors were included in our final cohort, and the detailed patient demographic data are presented in Table 1. Sepsis survivors were predominantly female; 55.7% of the patients had hypertension, 32.8% had diabetes mellitus, and 44.3% used calcium channel blockers (CCBs). The patients had a baseline creatinine level of 1.1 mg/dL. We randomly divided the sepsis survivors into two groups, allocating 70% of them to the training set and the remaining 30% to the test set. Overall, 8756 (36.9%) sepsis survivors were rehospitalized with AKI, with a median interval from discharge to rehospitalization of 8.6 months. There were 6076 (36.5%) and 2680 (37.6%) episodes of rehospitalization with AKI in the training and testing datasets, respectively.
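A random 70:30 split of this kind, together with a check of the event rate in each part, might look like the sketch below. The helper names and the fixed seed are hypothetical; the text does not state the seed or whether the split was stratified by outcome, so a plain shuffle is assumed.

```python
import random
from typing import List, Sequence, Tuple


def split_70_30(n_patients: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) from a random 70:30 shuffle."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = round(0.7 * n_patients)
    return indices[:cut], indices[cut:]


def event_rate(labels: Sequence[int], indices: Sequence[int]) -> float:
    """Proportion of class-1 patients among the given indices."""
    return sum(labels[i] for i in indices) / len(indices)


# Consistency check on the counts reported in the text:
# events in the training and test sets add up to the cohort total,
# and the overall event rate rounds to 36.9%.
assert 6076 + 2680 == 8756
assert round(100 * 8756 / 23761, 1) == 36.9
```

Because the split is random rather than stratified in this sketch, the two event rates will generally differ slightly, as the 36.5% and 37.6% reported above do.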

Ranks of Feature Importance and SHAP Value in the Machine Learning Models
To identify important features in the LGBM model, we generated a feature importance plot based on SHAP values, with the features listed in descending order of importance. The top five features were C-reactive protein, white blood cell counts, use of inotropes, blood urea nitrogen, and use of diuretics, which contributed more to the predictions than the lower-ranked features (Figure 2A). The local bar plot for one sepsis survivor showed how the SHAP values of individual features affected the model prediction (Figure 2B): red SHAP values increased the prediction, and blue values decreased it. The SHAP heatmap plot is shown in Figure 3A, with higher SHAP values highlighted in redder boxes. The dependence plots revealed the interaction effects between C-reactive protein (the most important feature), white blood cell counts, and blood urea nitrogen in our LGBM model (Figure 3B-D).
Abbreviations: LGBM, light gradient boosting machine; CRP, C-reactive protein; WBC, white blood cell counts; BUN, blood urea nitrogen; CCB, calcium channel blocker; INR, international normalized ratio; Hgb, hemoglobin; Na, sodium; NSAID, non-steroidal anti-inflammatory drug; aPTT, activated partial thromboplastin time; AST, aspartate aminotransferase.
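To illustrate what a SHAP value represents, the sketch below computes exact Shapley values by brute force for a deliberately tiny, made-up risk score. It is not the study's model or its SHAP tooling: `toy_risk`, its coefficients, and the baseline (reference) patient are hypothetical, and "absent" features are simply set to the baseline value. The point is the additivity property: the attributions sum to the difference between the prediction for the patient and the prediction for the baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def toy_risk(f: Dict[str, float]) -> float:
    """Hypothetical risk score with one interaction term (illustration only)."""
    return 0.01 * f["crp"] + 0.02 * f["bun"] + 0.05 * f["crp"] * f["inotrope"]


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values of `model` at `x`; absent features take `baseline`."""
    names = list(x)
    n = len(names)

    def value(subset: set) -> float:
        # Features in `subset` take the patient's values, the rest the baseline.
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi
```

A positive value (shown in red in the plots described above) pushes the prediction above the baseline, and a negative value (blue) pushes it below. The brute-force enumeration grows exponentially with the number of features, so it is only practical for toy examples like this one.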

Discussion
In our cohort of 23,761 sepsis survivors, 8756 (36.9%) were rehospitalized with AKI after discharge. We developed machine learning algorithms using 84 clinical features to predict rehospitalization with AKI and compared the AUCs of the different machine learning models. We found that the LGBM model had the highest AUC (0.816) among the machine learning models. Our study suggests that AKI might still be an unrecognized outcome after discharge from sepsis, and the use of machine learning models may help to predict rehospitalization with AKI.
AKI is frequently observed in patients with sepsis. A study including 2871 patients from a critical care database developed a risk-prediction nomogram for AKI with a C-index of 0.75 [20]. Another study including 15,726 patients with sepsis from the same critical care database established a prediction model using logistic regression with a C-index of 0.71 [21]. The prediction models established by these studies were limited by using only logistic regression rather than other machine learning models that might improve the predictive ability. Moreover, a study including 5984 septic patients with AKI established five prediction models (logistic regression, random forest, support vector machine, artificial neural network, and extreme gradient boosting) to predict persistent AKI [22]. The artificial neural network and logistic regression models achieved the highest AUC of 0.76. However, none of the studies carried out so far have considered whether sepsis survivors remain at greater risk of AKI after discharge.
Sepsis survivors have been found to be at higher risk of adverse short- and long-term outcomes after discharge from sepsis [7,[36][37][38]. However, the rates of rehospitalization with AKI have never been explored, and an understanding of this complication is required for physicians to initiate early treatment and follow-up strategies. In our study, the incidence of rehospitalization with AKI was high, affecting approximately 36.9% of sepsis survivors after discharge. Our study is the first in the literature to use a machine learning approach to predict the risk of rehospitalization with AKI, and the best AUC, 0.816, was achieved by the LGBM model. The performance of the LGBM model was higher than that of the traditional logistic regression model (AUC: 0.683) for the prediction of rehospitalization with AKI.
In addition, the feature importance plot using SHAP values in our LGBM model identified some important predictors of rehospitalization with AKI, some of which were consistent with traditional risk factors. Important predictors such as C-reactive protein, white blood cell counts, and the use of inotropes may be associated with the infectious status before discharge. In addition, blood urea nitrogen levels and the use of diuretics may reflect the fluid status and have traditionally been associated with future risks of rehospitalization with AKI. Therefore, our machine learning model may help identify high-risk sepsis survivors who are prone to rehospitalization with AKI by considering clinical features related to their infection conditions or fluid status.
Our study has several strengths. First, to our knowledge, it is the first to predict the risk of rehospitalization with AKI after discharge from sepsis in a large number of sepsis survivors. Second, our study evaluated laboratory data, and we included sepsis survivors who had more than two serum creatinine measurements. Therefore, we were able to identify outcomes such as rehospitalization with AKI from laboratory values, which may reduce the underreporting or misclassification of AKI that can occur when International Statistical Classification of Diseases and Related Health Problems (ICD) coding is used, as in other studies that extracted data from administrative datasets. Finally, we established predictive models based on machine learning algorithms that might be important to apply in clinical practice.
Our study has several limitations that should be noted. First, because of the nature of observational studies, the causal inference of rehospitalization with AKI might be confounded by unmeasured factors. Second, because our analysis was based on data from a single tertiary medical center, the cohort has particular age and disease characteristics, including old age (median, 76.4 years) and a high cancer incidence (48.8%), which may bias the findings on renal function or rehospitalization with AKI in the analyzed subjects. Third, the machine learning algorithm learned only from the input clinical features, so relationships involving features that the physicians did not include remain unknown.
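For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient who was rehospitalized with AKI receives a higher predicted risk than a randomly chosen patient who was not. The sketch below computes it directly from that pairwise definition. It is an illustration with made-up scores, not the evaluation code used in the study, and the quadratic loop is only suitable for small examples.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, with labels `[1, 0, 1, 0]` and predicted risks `[0.9, 0.1, 0.4, 0.6]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75. A value of 0.5 corresponds to ranking no better than chance.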

Conclusions
Our study established a machine learning algorithm for predicting rehospitalization with AKI in sepsis survivors. Our findings support the use of machine learning algorithms to assess the risk of rehospitalization with AKI. Owing to the age distribution, disease particularities, and single-center character of our study, external validation is required to evaluate generalizability.
(FPC-109-002). The funders did not play any role in the study design, data collection or analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement:
This study was carried out in accordance with the Declaration of Helsinki, with appropriate approvals obtained from the institutional review board of Taipei Veterans General Hospital (2017-09-002BC).

Informed Consent Statement:
The requirement for informed consent was waived because the data were de-identified.

Data Availability Statement:
The data analyzed in this study are not publicly available because individual privacy may be compromised. Interested parties may contact Shuo-Ming Ou at okokyytt@gmail.com to request permission to access these datasets.