Exploiting Machine Learning Technologies to Study the Compound Effects of Serum Creatinine and Electrolytes on the Risk of Acute Kidney Injury in Intensive Care Units

Assessing the risk of acute kidney injury (AKI) has been a challenging issue for clinicians in intensive care units (ICUs). In recent years, a number of studies have been conducted to investigate the associations between several serum electrolytes and AKI. Nevertheless, the compound effects of serum creatinine, blood urea nitrogen (BUN), and clinically relevant serum electrolytes have yet to be comprehensively investigated. Accordingly, we initiated this study aiming to develop machine learning models that illustrate how these factors interact with each other. In particular, we focused on ICU patients without a prior history of AKI or AKI-related comorbidities. With this practice, we were able to examine the associations between the levels of serum electrolytes and renal function in a more controlled manner. Our analyses revealed that the levels of serum creatinine, chloride, and magnesium were the three major factors to be monitored for this group of patients. In summary, our results can provide valuable insights for developing early intervention and effective management strategies as well as crucial clues for future investigations of the pathophysiological mechanisms that are involved. In future studies, subgroup analyses based on different causes of AKI should be conducted to further enhance our understanding of AKI.


Introduction
Acute kidney injury (AKI) is a condition frequently encountered in medical care [1]. The underlying pathophysiological processes of AKI ultimately lead to a decline in renal function. As a result, the patients suffer from the accumulation of waste products, an imbalance of electrolytes, and a widespread inflammatory response that affects organs beyond the kidneys [2]. According to a recent study, 20% to 50% of the patients in an intensive care unit (ICU) suffered from AKI [3]. Therefore, how to assess the risk of AKI is a critical issue for clinicians in an ICU [4]. However, several early signs of AKI, including edema, hypertension, and oliguria, are non-specific. Therefore, the current practice only monitors the level of serum creatinine and the volume of urine output in order to assess the risk of AKI [5,6].
Due to the observation above, scientists have been investigating the physiological signs that may be associated with the development of AKI. Leaf et al. conducted a review of the pathophysiology of dysregulated mineral metabolism, specifically focusing on calcium, phosphate, parathyroid hormone, and vitamin D metabolites in the context of [...]
Diagnostics 2023, 13, 2551
[...] from the root node that matches the condition of the case. The path ends at one of the leaf nodes at the bottom level of the tree. The "n+" and "n−" symbols at each node denote the number of positive cases and the number of negative cases, respectively, in our study cohort that met the criteria specified along the path from the root node to this particular node. If a path ends at a red node, the prediction is positive. If a path ends at a green node, the prediction is negative. Based on these interpretable decision rules, physicians can have a comprehensive understanding of how these key factors interact with each other and develop new clinical guidelines accordingly. In contrast, due to the non-linear transformations and the large number of coefficients involved in the prediction process, it is essentially impossible for a user to interpret the mathematical equations that an SVM or DNN model follows to make a prediction.
Figure 1. A DT structure that summarizes the main results of this study. The root node is colored yellow.

Study Cohort
Our study cohort was extracted from the Medical Information Mart for Intensive Care (MIMIC)-IV database, version 1.0, published in March 2021 [33,34]. The MIMIC database has been carefully de-identified to protect patient privacy. Its use for research purposes has been approved by the institutional review boards of the Massachusetts Institute of Technology (Protocol No. 0403000206) and Beth Israel Deaconess Medical Center (Protocol No. 2001-P-001699/14). These approvals indicate that the appropriate ethical considerations have been taken into account to ensure the responsible and lawful use of the database for research purposes. Figure 2 shows the flow that we followed to generate our study cohort. Initially, the dataset contained 256,878 clinical records collected at the emergency department and the intensive care unit between 2008 and 2019. According to the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) recommendation statements [35–38], AKI is defined by any of the following criteria: (1) an increase in the level of serum creatinine by 0.3 mg/dL (26.5 µmol/L) or more within 48 h or (2) an increase in the level of serum creatinine to 1.5 times the baseline level within 7 days. As the guideline requires two readings of the serum creatinine level and our study focused on patients in ICUs, 205,482 records in the database were excluded due to a lack of required information after admission into ICUs. As a result, only 51,396 records, all of which corresponded to the first available data after ICU admission, were included for subsequent analyses.
Figure 2 (notes). Table 1 lists the ICD-9 and ICD-10 codes employed to exclude the cases with AKI-related comorbidities/diseases. Criterion (1): (i) For a patient who had suffered from AKI, we included only the record corresponding to his/her stay in the ICU during which the patient suffered from AKI the first time. (ii) For a patient who had never suffered from AKI, we included only the record corresponding to his/her first stay in the ICU. Criterion (2): (i) the record of the case did not include all the readings listed in Table 2; (ii) one or more readings in the record were in the highest 0.1% or the lowest 0.1% of the distributions; or (iii) one or more readings in the record were not measured within 168 h of admission.
Since one patient could be admitted into the ICU more than once, for a patient who had suffered from AKI, we included only the record corresponding to his/her stay in the ICU during which the patient suffered from AKI the first time. On the other hand, for a patient who had never suffered from AKI, we included only the record corresponding to his/her first stay in the ICU. As a result, only 41,878 records corresponding to 41,878 individual cases remained. In the next step, we employed the criteria provided in Table 1 to exclude those patients whose medical records showed AKI-related comorbidities [39] so that interference from other factors such as renal impairment, cardiac failure, diabetes, and electrolyte imbalances would be avoided. After this step, only 17,085 cases remained in the dataset.
Finally, we applied the following exclusion criteria to further screen the dataset: (1) the record of the case did not include all the readings listed in Table 2; (2) one or more readings in the record were in the highest 0.1% or the lowest 0.1% of the distributions; or (3) one or more readings for the case were not made within 168 h of admission. In the end, our study cohort contained 550 AKI-positive cases and 12,152 AKI-negative cases. A demographic analysis of the study cohort is presented in Table 2.
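The two serum creatinine criteria quoted earlier in this section can be sketched in a few lines of Python. This is only an illustration and not the procedure used to label the study cohort: the function name, the `baseline` argument supplied by the caller, the choice to count the 7-day window from the earliest reading, and the small floating-point tolerance are all assumptions introduced here.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# One reading: (measurement time, serum creatinine in mg/dL).
Reading = Tuple[datetime, float]

EPS = 1e-9  # tolerance for floating-point comparisons (an assumption)


def meets_creatinine_criteria(readings: List[Reading], baseline: float) -> bool:
    """Check the two serum creatinine criteria quoted in the text.

    Assumptions (not from the source): `baseline` is supplied by the caller,
    and the 7-day window is counted from the earliest reading.
    """
    ordered = sorted(readings)
    if not ordered:
        return False
    first_time = ordered[0][0]

    # Criterion (2): creatinine reaches 1.5 times the baseline within 7 days.
    for time, value in ordered:
        within_week = time - first_time <= timedelta(days=7)
        if within_week and value >= 1.5 * baseline - EPS:
            return True

    # Criterion (1): a rise of 0.3 mg/dL or more between two readings
    # taken at most 48 h apart.
    for i, (t0, v0) in enumerate(ordered):
        for t1, v1 in ordered[i + 1:]:
            if t1 - t0 > timedelta(hours=48):
                break  # later readings are even further away in time
            if v1 - v0 >= 0.3 - EPS:
                return True
    return False
```

For example, readings of 0.8 and 1.2 mg/dL taken 24 h apart satisfy criterion (1), whereas readings of 1.0 and 1.4 mg/dL taken 96 h apart with a baseline of 1.0 mg/dL satisfy neither criterion.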
Etiologically, the causes of AKI can be classified into three broad categories: pre-renal azotemia, intrinsic renal parenchymal damage, and post-renal obstruction. Tailoring treatment plans according to the specific causes of renal injury is crucial for improving patient outcomes. For instance, hypovolemia, often diagnosed by assessing a fluid status imbalance, insufficient renal perfusion, or inferior vena cava collapse, is a common clinical presentation associated with pre-renal azotemia. On the other hand, post-renal injury occurs when the urinary tract is partially or completely blocked due to functional or structural derangements anywhere from the renal pelvis to the tip of the urethra. Since the treatment plans for post-renal AKI patients are significantly different from the plans for non-post-renal AKI patients, we classified the AKI patients in our study cohort into two categories: post-renal AKI and non-post-renal AKI. According to several previous studies, post-renal AKI accounted for less than 5% of all AKI cases [1,40,41]. In our study cohort, 24 out of 550 AKI cases, i.e., 4.4%, were post-renal, and the percentage was in line with the previous studies. Supplementary Table S1 shows the ICD-9 and ICD-10 codes employed to identify post-renal AKI cases. Table 3 shows the statistics of the post-renal AKI patients and non-post-renal AKI patients with respect to the features listed in Table 2.

Machine Learning Models
As mentioned earlier, we used DT and RF models in order to investigate the compound impacts of two or more factors and provide a clear picture of how these factors interact with each other. In particular, we focused on the compound effects of serum creatinine, BUN, and the 6 serum electrolytes listed in Table 2. Serum creatinine and BUN were included because, in medical practice, the levels of serum creatinine and BUN as well as the BUN-to-creatinine ratio are measured to distinguish different types of renal function impairment, including pre-renal azotemia, intrinsic renal parenchymal disease, and post-renal obstruction. The 6 serum electrolytes listed in Table 2 were included because previous studies had reported their associations with the development of AKI.
In order to address the needs in different clinical scenarios, we generated prediction models with varying levels of sensitivity and examined the prediction rules embedded in these models. In this respect, we set the parameters of the machine learning packages to various combinations and then employed a 5-fold cross-validation [22] to evaluate the levels of sensitivity delivered by the prediction models generated with these alternative parameter settings. In the 5-fold cross-validation process, the study cohort was randomly and evenly partitioned into 5 subsets. For each combination of parameter settings, every subset was employed to evaluate the prediction models generated with the other 4 subsets. Then, the evaluation results of these 5 subsets were collected to calculate the performance data, i.e., the sensitivity, specificity, positive predictive value (PPV), etc., corresponding to this particular parameter combination. Supplementary Table S2 shows the software packages employed to generate the DT and RF models as well as the alternative parameter settings employed to generate the prediction models in the 5-fold cross-validation process. In this respect, we tried a large number of possible parameter combinations in order to generate prediction models that delivered sensitivity at the levels of 0.95 and 0.80. Furthermore, as we had only 550 positive cases in our study cohort, we employed the 5-fold cross-validation process instead of the 10-fold cross-validation process, which may be more commonly used in machine learning research, so that each partition would contain a good number of positive cases.
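The 5-fold cross-validation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the code used in the study: the function names are hypothetical, and `fit_threshold_stub` is a deliberately trivial stand-in for a real DT or RF fit (a single cut between the class means of one feature). The sketch pools the predictions of the 5 held-out subsets before computing sensitivity, specificity, and PPV, which is one plausible reading of how the evaluation results were collected.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence

Predictor = Callable[[float], int]
Fitter = Callable[[List[float], List[int]], Predictor]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Randomly split indices 0..n-1 into k folds whose sizes differ by at most 1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def pooled_cv_metrics(x: Sequence[float], y: Sequence[int], fit: Fitter,
                      k: int = 5, seed: int = 0) -> Dict[str, float]:
    """Evaluate each fold with a model fitted on the other k-1 folds, then pool."""
    tp = fp = tn = fn = 0
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predicted = predict(x[i])
            if predicted == 1 and y[i] == 1:
                tp += 1
            elif predicted == 1 and y[i] == 0:
                fp += 1
            elif predicted == 0 and y[i] == 1:
                fn += 1
            else:
                tn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


def fit_threshold_stub(train_x: List[float], train_y: List[int]) -> Predictor:
    """Hypothetical stand-in for a real model: one cut between the class means.

    Assumes both classes are present in the training data.
    """
    pos = [v for v, label in zip(train_x, train_y) if label == 1]
    neg = [v for v, label in zip(train_x, train_y) if label == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda value: int(value > cut)
```

In a parameter search of the kind described above, `pooled_cv_metrics` would be called once per parameter combination, and the combinations whose pooled sensitivity is closest to the target levels (0.95 or 0.80) would be retained.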

Results
As mentioned above, in order to address the needs in different clinical scenarios, we generated prediction models with varying levels of sensitivity. In the subsequent discussions, we will focus on the prediction models with sensitivity at the levels of 0.95 and 0.80. Table 4 summarizes the performances of the DT, RF, and LR models observed during the 5-fold cross-validation procedure. The performances of the LR models were included to provide a reference because LR models are widely employed in biomedical research communities. Detailed performance data are presented in Supplementary Table S3. The performance data in Table 4 reveal that with respect to the specificity, the positive predictive value (PPV), the relative risk, and the area under the receiver operating characteristic curve (AUC), the DT model that delivered sensitivity at the level of 0.95 performed significantly better than the RF model that delivered the same level of sensitivity. It was also observed that the RF model that delivered sensitivity at the level of 0.80 performed marginally better than the rival DT model in terms of specificity, PPV, and relative risk but worse than the rival DT model in terms of AUC. Based on these observations, we concluded that the overall performance of the DT models was superior to that of the RF models. Therefore, in the subsequent discussions, we will focus on the DT models and the decision rules embedded in the models. Figure 3a,b show the DT models generated by feeding the entire study cohort into the decision tree package with the combinations of parameters cp and prior set to (0.005 and 0.5835) and (0.01 and 0.744), respectively. According to the 5-fold cross-validation described above, with cp and prior set to these two combinations, the generated DT models should deliver sensitivity at the levels of 0.80 and 0.95, respectively.
One interesting observation regarding the DT model shown in Figure 3a is that the model predicts a patient with a serum creatinine level higher than 1.25 mg/dL to be at high risk. This prediction rule comes very close to the serum creatinine level of 1.3 mg/dL commonly used by physicians to determine whether a patient is at high risk of progression to AKI. It is also observed that the DT model shown in Figure 3b predicts a patient with a serum creatinine level higher than 0.95 mg/dL to be at high risk. This observation implies that 0.95 mg/dL can be employed as an alternative threshold if the physician wants to increase the sensitivity of his/her medical judgment.
The DT model shown in Figure 3a further reveals that for a patient with a serum creatinine level between 0.95 and 1.25 mg/dL, his/her level of serum magnesium can be used as a warning sign. If the reading is higher than 2.45 mg/dL, the patient is at high risk. If not, we should further examine his/her level of serum chloride. If the patient's level of serum chloride is over 106.5 mEq/L, the patient is at high risk.
The blue polygons in Figure 3a,b encircle the structure shared by these two DT models. According to the shared structure, for a patient with a serum creatinine level between 0.75 and 0.95 mg/dL, we should further examine his/her levels of serum magnesium and chloride. A patient is at high risk if (1) his/her level of serum chloride is higher than 113.5 mEq/L or (2) his/her level of serum chloride is between 105.5 and 113.5 mEq/L and his/her level of serum magnesium is higher than 2.35 mg/dL. Finally, since only a very limited number of positive cases in our study cohort met the criteria defined by the lower right parts of the tree structures in Figure 3a,b, we should be able to ignore the corresponding decision rules. In summary, the structures of the two DT models shown in Figure 3 illustrate that the levels of serum creatinine, chloride, and magnesium are the three major factors associated with the development of AKI. Though the level of serum phosphorus is present in these DT models, the nodes corresponding to the level of serum phosphorus are located in the lower parts of the structures, which implies that these nodes play less significant roles in the decision rules.
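The decision rules quoted in the preceding paragraphs can be written down as a short function. This sketch encodes only the thresholds stated in the text and is not a reproduction of the full trees in Figure 3: the function name and the `high_sensitivity` switch are hypothetical, the handling of values exactly on a threshold is an assumption, and the lower-level branches (including the serum phosphorus nodes) are omitted. A return value of False therefore means "not flagged by the quoted rules", not "low risk according to the full model". It is an illustration of the rule structure, not a clinical tool.

```python
def flagged_by_quoted_rules(creatinine: float, chloride: float, magnesium: float,
                            high_sensitivity: bool = False) -> bool:
    """Apply the rules quoted in the text.

    Units: creatinine and magnesium in mg/dL, chloride in mEq/L.
    high_sensitivity=False follows the rules described for Figure 3a;
    high_sensitivity=True follows the rule described for Figure 3b.
    """
    if high_sensitivity:
        # Figure 3b: creatinine above 0.95 mg/dL is predicted to be high risk.
        if creatinine > 0.95:
            return True
    else:
        # Figure 3a: creatinine above 1.25 mg/dL is predicted to be high risk.
        if creatinine > 1.25:
            return True
        # Figure 3a: between 0.95 and 1.25 mg/dL, check magnesium, then chloride.
        if creatinine > 0.95:
            return magnesium > 2.45 or chloride > 106.5

    # Shared structure: creatinine between 0.75 and 0.95 mg/dL.
    if creatinine > 0.75:
        return chloride > 113.5 or (chloride > 105.5 and magnesium > 2.35)

    # Branches not described in the text are not modelled here.
    return False
```

For instance, a patient with a serum creatinine of 1.0 mg/dL, chloride of 100 mEq/L, and magnesium of 2.0 mg/dL is not flagged under the Figure 3a rules but is flagged under the Figure 3b rule, which reflects the trade-off between the two sensitivity levels.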

Discussion
As of today, the clinical practice to assess the risk of AKI is based on the 2012 KDIGO Clinical Practice Guideline for Acute Kidney Injury, which monitors only the level of serum creatinine and the volume of urine output. Since AKI could lead to many complications and even fatality, identifying the risk factors of AKI and exploiting machine learning technologies to predict AKI incidences have attracted a lot of attention in biomedical research communities. In this respect, several serum electrolytes have been reported to be associated with the development of AKI. Nevertheless, the compound effects of serum creatinine, BUN, and clinically relevant serum electrolytes have yet to be thoroughly investigated. With this observation, we initiated this study aiming not only to illustrate how these factors interact with each other but also to provide new insights for developing new clinical practices for assessing AKI risk. In particular, we focused on ICU patients who had no prior history of AKI and were free of AKI-related comorbidities. By focusing on this specific group of patients, we were able to eliminate the confounding influences of these conditions and examine the associations between the levels of serum electrolytes and renal function in a more controlled manner. Furthermore, our results can provide valuable insights for developing early intervention and effective management strategies as well as for investigating the pathophysiology of AKI.
The performance data in Table 4 show that for those patients without a prior history of AKI or AKI-related comorbidities, the relative risks with these alternative prediction models were fairly high, ranging from 9.84 to 16.89. This implies that the group of patients predicted to be positive suffered significantly higher risk than the groups of patients predicted to be negative. However, the low PPVs suggest that there would be a large number of false positives if these prediction models were put into practical use. Nevertheless, according to the numbers shown in Figure 3a, this particular DT model, if put into practical use, should predict around 57% of the patients to be negative and deliver a sensitivity around 80%. Meanwhile, according to the numbers shown in Figure 3b, this particular DT model, if put into practical use, should predict around 51% of the patients to be negative and deliver a sensitivity around 95%. Therefore, a physician who employs the DT models developed in this study to assess the risks of AKI for his/her patients only needs to focus on about 50% of the patients, while the physician can expect this group of patients to suffer about 10 times the risk of the group of patients predicted to be at low risk.
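The coexistence of a low PPV and a high relative risk discussed above follows directly from the definitions, as the following sketch shows. The function name is hypothetical and the counts in the example are invented round numbers chosen only to illustrate the arithmetic; they are not the confusion matrices behind Table 4 or Figure 3. The relative risk is taken here as the AKI rate among patients predicted to be positive divided by the AKI rate among patients predicted to be negative.

```python
from typing import Dict


def screening_summary(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Summarize a 2x2 confusion matrix; assumes all denominators are non-zero."""
    predicted_pos = tp + fp
    predicted_neg = fn + tn
    risk_in_predicted_pos = tp / predicted_pos  # this is the PPV
    risk_in_predicted_neg = fn / predicted_neg
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": risk_in_predicted_pos,
        "fraction_predicted_negative": predicted_neg / (predicted_pos + predicted_neg),
        "relative_risk": risk_in_predicted_pos / risk_in_predicted_neg,
    }


# Hypothetical counts: 100 true cases among 2000 patients.
example = screening_summary(tp=90, fp=900, fn=10, tn=1000)
```

With these invented counts, roughly half of the patients are predicted to be negative and fewer than 1 in 10 of the predicted positives is a true case, yet the predicted-positive group carries about 9 times the risk of the predicted-negative group, because the event is rare in the group predicted to be negative.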
Among the 10 variables listed in Table 2, only serum creatinine, chloride, magnesium, and phosphorus are present in the DT models shown in Figure 3a,b. It must be noted that this observation does not imply that serum potassium, sodium, and non-ionized calcium are not associated with the development of AKI. In fact, as mentioned earlier, previous studies have reported that serum potassium, sodium, and non-ionized calcium are all associated with the development of AKI. The most likely explanation is that, when building the prediction model, the DT algorithm found that the levels of serum chloride, magnesium, and phosphorus provided more information than the levels of serum potassium, sodium, and non-ionized calcium, and that the additional information provided by the latter three after the former three had been incorporated was insignificant.
The DT models shown in Figure 3a,b identify the levels of serum creatinine, chloride, and magnesium as the three major factors associated with the development of AKI. Though the level of serum phosphorus is present in these two figures, all three nodes corresponding to the level of serum phosphorus are located in the lower levels of the structures. Furthermore, only a very limited number of positive cases in our study cohort met the criteria defined by these low-level structures. Therefore, in practice, we can ignore the role of serum phosphorus.
Since the level of serum creatinine is one of the major factors monitored in the current clinical practice, our study suggests that for those patients without a prior history of AKI or AKI-related comorbidities, the levels of serum chloride and magnesium should be taken into consideration in order to enhance the clinical guidelines. In this respect, the current clinical guideline, which monitors only the level of serum creatinine and the volume of urine output, may lead to misdiagnoses and/or delayed treatments in some cases because the level of serum creatinine generally reflects the degree of renal damage and should be considered a delayed indicator of AKI. Furthermore, the numbers shown in Table 2 reveal that the distributions of the levels of serum creatinine for patients with AKI and patients without AKI must overlap to a large degree because the standard deviation of the level of serum creatinine for patients with AKI, which is 0.64, is larger than the difference between the means of these two groups of patients, which is 0.5. This implies that additional assessments must be incorporated if we would like to evaluate the risk of AKI of a patient more accurately. Finally, the decrease in urine output among AKI patients is a non-specific symptom and may only be evident once the AKI has progressed.
Together, these observations imply that for an ICU patient without a prior history of AKI or AKI-related comorbidities, healthcare professionals can obtain a more comprehensive understanding of the patient's renal function and risk of AKI by incorporating assessments of serum chloride and magnesium levels into the enhanced clinical guideline. Accordingly, healthcare professionals will be able to evaluate and manage treatments more precisely and ultimately prevent disease progression and deterioration.
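The overlap argument above (a standard deviation of 0.64 against a difference in means of 0.5) can be made more concrete with a rough calculation. The sketch below rests on strong simplifying assumptions that are not made in the text: both groups are treated as normally distributed with the same standard deviation, whereas only the standard deviation of the AKI group is quoted and serum creatinine distributions are typically skewed. Under these assumptions, the overlapping coefficient of the two distributions is 2Φ(−d/2), where d is the difference in means divided by the common standard deviation and Φ is the standard normal cumulative distribution function. The function name is hypothetical.

```python
from statistics import NormalDist


def overlap_equal_sd(mean_difference: float, sd: float) -> float:
    """Overlapping coefficient of two normal distributions with a common SD.

    A value of 1.0 means identical distributions; values near 0 mean
    the two groups are almost completely separated.
    """
    d = abs(mean_difference) / sd
    return 2 * NormalDist().cdf(-d / 2)


# Values quoted in the text: mean difference 0.5, SD 0.64 (AKI group).
creatinine_overlap = overlap_equal_sd(0.5, 0.64)
```

Under these assumptions the overlap comes out at roughly 0.7, which is consistent with the statement that a single serum creatinine reading separates the two groups only weakly.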
It must be noted that our results can only be immediately applied to ICU patients without a prior history of AKI or AKI-related comorbidities. For ICU patients with AKI-related comorbidities, further studies are needed. In this respect, we can partition the patients into several groups depending on the types of comorbidities that they suffer from so that patients in the same group have similar pathophysiological mechanisms. Then, we can apply the procedure presented in this article to each group of patients in order to develop a specific prediction model for each group and identify the critical factors accordingly.
One of the major limitations of our study arises from the different causes of AKI. As the causes of AKI are essential for physicians to develop effective treatment plans, in-depth subgroup analyses based on different categories of renal injury should be conducted to gain valuable insights into the different pathophysiological mechanisms involved and guide appropriate treatment strategies tailored to each subgroup. In this study, based on the information available in the MIMIC-IV dataset, we classified the AKI patients into two categories: post-renal and non-post-renal. The statistics in Table 3 reveal that there were no statistically significant differences in the levels of the eight serum measurements between the post-renal and non-post-renal AKI patients. Therefore, our prediction models should be generally applicable to both post-renal and non-post-renal AKI patients. Nevertheless, in-depth subgroup analyses should be conducted in the future.
In addition to the limitation addressed above, this is a retrospective study based on data extracted from the MIMIC-IV database. Therefore, the results derived from this study should not be extensively applied in the decision process without taking into consideration the ethnic composition of the patients and the medical interventions that these patients may have received. Furthermore, our study was based on clinical records collected in ICUs. This implies that the patients involved had serious health conditions. The data in Table 2 also show that these patients were relatively old. Therefore, the results observed in our analyses should not be generalized to patients with different health conditions and in different age groups. Finally, our results only illustrate the associations between the investigated risk factors and the incidences of AKI. In other words, causal inferences have yet to be studied.

Conclusions
This study has led to an in-depth understanding of the compound effects of serum creatinine, chloride, and magnesium with respect to the development of AKI in ICUs. As we focused on patients who had no prior history of AKI and were free of AKI-related comorbidities, our study provides valuable insights for developing early intervention and effective management strategies. Furthermore, this understanding provides crucial clues not only for future enhancement of clinical practices but also for future investigation of the pathophysiological mechanisms that are involved.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/diagnostics13152551/s1. Table S1: The ICD-9 and ICD-10 codes employed to identify post-renal AKI cases. Table S2: Summary of the software packages employed and parameter ranges. Table S3: The detailed performance data observed in the 5-fold cross-validation process.