Creating a Machine Learning Tool to Predict Acute Kidney Injury in African American Hospitalized Patients

Machine learning (ML) has been used to build high-performance prediction models in the past without considering race. African Americans (AA) are vulnerable to acute kidney injury (AKI) at a higher eGFR level than Caucasians. AKI increases mortality, length of hospital stays, and incidence of chronic kidney disease (CKD) and end-stage renal disease (ESRD). We aimed to establish an ML-based prediction model for the early identification of AKI in hospitalized AA patients by utilizing patient-specific factors in an ML algorithm to create a predictor tool. This is a single-center, retrospective chart review. We included participants 18 years or older and admitted to an urban academic medical center. Two hundred participants were included in the study. Our ML training set provided a result of 77% accuracy for the prediction of AKI given the attributes collected. For the test set, AKI was accurately predicted in 71% of participants. The clinical significance of this model can lead to great advancements in the care of AA patients and provide practitioners avenues to optimize their therapy of choice in AAs when given AKI risk ahead of time.


Introduction
Acute kidney injury (AKI) is one of the most common medical problems found among hospitalized patients [1]. Wang et al. reported that African Americans were significantly more likely to develop AKI during hospitalization compared to their Caucasian (CS) counterparts and that AKI was independently associated with increased in-hospital mortality leading to a more than four-fold increased likelihood of death when a patient had an AKI while hospitalized [1].
Increased incidence of comorbidities as well as genetic differences contribute to AAs' increased risk for developing AKI during hospitalization [2]. Diabetes and hypertension are the most common precipitating factors for developing chronic kidney disease. Incidence of diabetes in AAs compared to CSs is 18% vs. 9.6%, respectively [2]. Incidence of hypertension in AAs compared to CSs is 43.3% vs. 29.1%, respectively [2]. Along with the increased incidence of comorbidities in the AA population, genetic polymorphisms are also responsible for some variations in kidney function. It was discovered 10 years ago by Giulio Genovese that persons of African ancestry carry an apolipoprotein L1 (APOL1) genetic variation, which results in increased occurrence of AKI by as much as 15% in AAs carrying both alleles compared to CSs [3,4]. These factors compound in the AA community and contribute to the development of AKI during hospitalization and increased likelihood of death.

Materials and Methods
This is a single-center, retrospective chart review of patients admitted to an urban academic medical center between 1 January 2021 and 1 November 2021. The electronic medical record (EMR) database was accessed to obtain participants' information. The Howard University Office of Regulatory Research Committee Institutional Review Board approved this protocol (FWA00000891).

AKI Definition
AKI was staged according to the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) serum creatinine (SCr) criteria. Urine output was not collected hourly for each participant and was therefore not used for AKI staging. Stage 1 AKI is defined as a rise in SCr to 1.5-1.9 times baseline or a ≥0.3 mg/dL increase over 24 h. Stage 2 AKI is defined as a rise in SCr to 2.0-2.9 times baseline. Stage 3 AKI is defined as a rise in SCr to 3.0 times baseline, a ≥4.0 mg/dL increase over 24 h, or initiation of renal replacement therapy [7]. Groups were compared to identify which participant-specific factors increased the participants' probability of developing AKI during hospitalization.
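The SCr-only staging rules above can be sketched as a small function. This is an illustrative sketch rather than the study's code: the function name and arguments are hypothetical, the thresholds are taken as quoted in this section, and the time window for the absolute increase is assumed to have been checked by the caller.

```python
def kdigo_scr_stage(baseline_scr, current_scr, on_rrt=False):
    """Return an AKI stage (0-3) from serum creatinine values in mg/dL.

    Thresholds follow the criteria as quoted in this section. Urine output
    is ignored, and the time window for the absolute increase is assumed
    to have been verified before calling this function.
    """
    if baseline_scr <= 0:
        raise ValueError("baseline SCr must be positive")
    ratio = current_scr / baseline_scr
    # Round to avoid floating-point noise around the 0.3 and 4.0 mg/dL cut-offs.
    increase = round(current_scr - baseline_scr, 2)
    if on_rrt or ratio >= 3.0 or increase >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5 or increase >= 0.3:
        return 1
    return 0  # no AKI by SCr criteria
```

For example, a hypothetical participant whose SCr rises from 0.9 to 1.2 mg/dL would be assigned stage 1 through the absolute-increase criterion, even though the ratio is below 1.5.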

Participant Selection
Inclusion criteria were age of 18 years or older, being African American, and a hospital length of stay greater than 48 h. Exclusion criteria were pregnancy, end-stage renal disease receiving any form of renal replacement therapy, laboratory-confirmed coronavirus disease 2019 (COVID-19), hemodynamic support of any kind, and a hospital stay of less than 48 h.
Collected data were coded as categorical variables or as binary variables (present or not present). Data were subsequently classified using regression, clustering, association rules mining, and visualization using the Waikato Environment for Knowledge Analysis (WEKA), a data mining software platform, to create a calculator. Participants included in the analysis were then divided into training and test sets. The model built on the training set was evaluated against the test set to assess external validity. The test and training sets do not include any duplicate participants.
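A split into training and test sets with no shared participants can be sketched as follows. This is a minimal illustration, not the procedure used in WEKA: the function name, the seed, and the integer participant identifiers are assumptions made for the example.

```python
import random


def split_participants(participant_ids, n_train, seed=0):
    """Randomly split unique participant IDs into disjoint train/test lists."""
    # De-duplicate first so that a participant with repeat visits
    # cannot land in both sets.
    unique_ids = sorted(set(participant_ids))
    if n_train > len(unique_ids):
        raise ValueError("n_train exceeds the number of unique participants")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    return unique_ids[:n_train], unique_ids[n_train:]


# Hypothetical IDs 0..199 standing in for the 200 included participants.
train_ids, test_ids = split_participants(range(200), n_train=100)
```

Splitting on participant identifiers rather than on visits is what guarantees that the two sets share no participant.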

Statistical Analysis and Machine Learning Creation
WEKA is ML software developed by the Waikato machine learning group in the Department of Computer Science at the University of Waikato, Hamilton, New Zealand, and is distributed under the GNU General Public License. This internationally available workbench has established itself as a widely accepted data mining and machine learning tool in research and academia [8].
Naïve Bayes is a probabilistic classifier available within WEKA. It was evaluated with 10-fold cross validation to estimate the skill of the model in making predictions on unseen data based on historical results. The naïve Bayes classifier applies Bayes' theorem: P(A|B) = [P(B|A)P(A)]/P(B), which gives the probability of A happening (in this case AKI), given that B has occurred (any one of our given attributes). In this scenario, B is the evidence or the participant characteristic, and A is the hypothesis, occurrence of AKI [9].
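As a worked illustration of the formula P(A|B) = [P(B|A)P(A)]/P(B), the sketch below computes a posterior probability of AKI given one attribute. The prior uses the cohort figure reported in this paper (63 of 200 participants with AKI); the two attribute probabilities are hypothetical values chosen only to show the arithmetic.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b


p_aki = 63 / 200         # prior P(A): AKI frequency in the cohort
p_attr_given_aki = 0.50  # hypothetical P(B|A): attribute present among AKI cases
p_attr = 0.40            # hypothetical P(B): attribute present in the whole cohort

p_aki_given_attr = posterior(p_attr_given_aki, p_aki, p_attr)
print(round(p_aki_given_attr, 3))
```

With these illustrative inputs, the presence of the attribute raises the estimated probability of AKI from the prior of 0.315 to roughly 0.39.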
Selection of prediction variables and model development was performed on the training set cohort only, and performance and stability were internally validated using the training set. Model performance was subsequently evaluated using the test set cohort against the rules of the training set cohort from the naïve Bayes classifier.
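To make the training and prediction steps concrete, the following is a minimal naïve Bayes classifier for binary (0/1) attributes. It is a sketch under stated assumptions and not WEKA's implementation: the function names are hypothetical, Laplace smoothing with `alpha = 1` is assumed, and the four-row data set is invented purely for illustration.

```python
import math


def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate class priors and per-attribute P(attribute = 1 | class).

    rows: list of equal-length tuples of 0/1 attributes.
    labels: list of 0/1 outcomes (1 = AKI).
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    n_features = len(rows[0])
    class_counts = {0: 0, 1: 0}
    ones = {0: [0] * n_features, 1: [0] * n_features}
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for j, value in enumerate(row):
            ones[y][j] += value
    total = len(labels)
    model = {}
    for y in (0, 1):
        prior = (class_counts[y] + alpha) / (total + 2 * alpha)
        p_one = [(ones[y][j] + alpha) / (class_counts[y] + 2 * alpha)
                 for j in range(n_features)]
        model[y] = (prior, p_one)
    return model


def predict_proba(model, row):
    """Return P(class = 1 | row), treating attributes as independent."""
    log_scores = {}
    for y, (prior, p_one) in model.items():
        score = math.log(prior)
        for value, p in zip(row, p_one):
            score += math.log(p if value else 1.0 - p)
        log_scores[y] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {y: math.exp(s - top) for y, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


# Invented toy data: attribute 0 tracks the outcome, attribute 1 does not.
toy_rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
toy_labels = [1, 1, 0, 0]
toy_model = train_naive_bayes(toy_rows, toy_labels)
```

Fitting on the training cohort only and then calling `predict_proba` on held-out rows mirrors the separation between model development and evaluation described above.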

Baseline Characteristics
A total of 411 EMRs were randomized to inclusion, as seen in Figure 1. A total of 200 participants were included in this analysis. A total of 80 participants were excluded from the analysis due to duplicate visits within the established protocol time frame. Thirty-two participants were excluded due to a history of end-stage renal disease (ESRD), and twenty-four participants were excluded due to laboratory-confirmed COVID-19. Sixty-three participants were excluded from the analysis due to missing labs. Missing labs included gaps in bloodwork during admission of 24 h or greater and omission of COVID-19 testing during admission.
Data collected from the 200 participants in this analysis included age, race, sex, comorbid conditions (HTN, HLD, and T2DM), SCr, GFR, height, weight, presence of infection, tobacco use, illicit drug use, nephrotoxic agent used during hospital stay (vancomycin, NSAID, amphotericin B, ACE-I/ARB, and AMG), and length of hospital stay. AKI occurrence per attribute is listed in Table 1, as well as occurrence per attribute in the total cohort. Of the 200 participants included in the analysis, sixty-three (31.5%) experienced AKI during hospitalization. Of participants that experienced AKI, 39.7% had a laboratory-confirmed infection, and 37.7% and 39.3% had HTN and HLD, respectively. Of the nephrotoxic agents used, the occurrence of AKI was greatest in participants receiving vancomycin, acyclovir, and AMG, at 45.3%, 75.0%, and 50.0%, respectively.
WEKA visualization output per characteristic is illustrated in Figure 2. To the left, 0 indicates no presence of that attribute in the population, and to the right, 1 indicates yes to the presence of that attribute in the population. This follows for sex, medication, comorbidity, AKI, illicit drug use, and cigarette use. Body mass index (BMI) and beginning kidney function were stratified, and age was continuous. AKI occurrence per characteristic is illustrated in Figure 3. Blue indicates that AKI occurred within that attribute, and red indicates that no AKI occurred within that attribute.
In the training set, cross validation of data was performed using naïve Bayes on randomly selected training data (n = 100). This training model performed well, with a 77% accuracy in correctly classified instances, as seen in Figure 4. This means that the model correctly classified AKI status for 77% of training-set participants when given these attributes. The mean absolute error percentage was 24.6%, which indicates a good fit of this model to predict AKI versus the actual occurrence of AKI. The AUC ROC of 0.860 also indicates that this training set yields excellent performance with a classifier model, lending credibility to this model to predict AKI correctly in the test set.
In the test set, validation was performed using the naïve Bayes model built from the training data. The participants in the test data (n = 100) did not include any participants from the training data. This test set performed well against the training model, with a 71% accuracy in correctly classified instances, as seen in Figure 5. This means that the model correctly classified AKI status for 71% of test-set participants when given these attributes. The mean absolute error percentage was 30.6%, which indicates a good fit of this model to predict AKI versus the actual occurrence of AKI. The area under the ROC curve is 0.781, indicating that this test set yields a good performance as a classifier model, differing slightly from the training set.
The reported performance measures are interpreted as follows. Recall is the ability of a model to find all the relevant cases within a data set; mathematically, it is the number of true positives divided by the number of true positives plus the number of false negatives. F-Measure (F1 score) considers both precision and recall; it reaches its best value at 1 and its worst value at 0. MCC is a correlation coefficient: C = 1 indicates perfect agreement, C = 0 is expected for a prediction no better than random, and C = −1 indicates total disagreement between prediction and observation. ROC area: an AUC of 0.5 suggests no discrimination, from 0.7 to 0.8 is considered acceptable, from 0.8 to 0.9 is considered excellent, and more than 0.9 is considered outstanding. PRC area shows the relationship between precision (positive predictive value) and recall (sensitivity) for every possible cut-off. Average precision ranges from the frequency of positive examples (0.5 for balanced data) to 1.0 (perfect model).
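The measures defined above can all be derived from a 2 × 2 confusion matrix, as in the sketch below. The function name and the example counts are hypothetical; they are not the confusion matrices from this study. Note also that WEKA's summary output typically reports class-weighted averages, which this single-class sketch does not reproduce.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1, and MCC from raw counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts for a 100-participant set (not taken from the paper).
example = classification_metrics(tp=20, fp=10, fn=12, tn=58)
```

Unlike accuracy, MCC uses all four cells of the matrix, which makes it informative when AKI and non-AKI cases are unevenly represented.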

Discussion
Factors that contribute to AAs' increased risk for developing AKI during hospitalization include increased incidence of comorbidities as well as genetic differences [3]. Concerning comorbidities, diabetes and hypertension are the most common precipitating factors for developing CKD. African Americans have compounded risk associated with developing AKI during hospitalization, leaving them vulnerable to poor outcomes. Early prediction of AKI in AAs is the first step in reducing the occurrence of AKI. For this reason, we aimed to establish the first AKI predictor tool made specifically for AA participants admitted to the hospital utilizing ML to be used for early identification of AKI in AA participants.
Our model accurately predicted true positives and true negatives of AKI more frequently than several available published models [20,21]. Our predictor tool provided an area under the ROC curve (AUC) of 0.860 within our training set, indicating excellent performance as a classifier model, as seen in Figure 4. Compared with previous reports of AKI prediction, our test set provided a superior prediction of AKI in the test set population. A study led by a group of Chinese investigators used ML to predict AKI in intensive care participants and found their tool to yield an AUC of 0.817 [20]. Our training-set AUC of 0.860 was higher than that of their prediction model. Yue et al.'s predictor model also provided good predictive accuracy in terms of discrimination and calibration, with recall and F1 scores of 0.852 and 0.895, respectively [20]. By comparison, our predictor tool had recall and F1 scores of 0.770 and 0.769, respectively. Yue et al. aimed to predict AKI among sepsis patients. Sepsis-induced AKI can be influenced by several factors, including the timing of antibiotic administration, antibiotic selection, drug and pathogen resistance pattern, and fluid administration. Yue et al. did not include the nuances of several etiologies of sepsis in their model. This could explain their lower AUC compared to our study.
The study led by Yue et al. does not offer the same clinical utility in the AA population as this study, as it did not include a diverse participant population. The comparator calculator recorded ethnicity; white participants accounted for 75.3% of participants and black participants accounted for just 8.2% [20]. Race was not found to have a statistically significant effect in causing AKI. Given the increased occurrence of AKI in AAs, it is imperative that AAs hold a clinically significant part in studies such as these. In the 2016 cross-sectional survey of the National Hospital Discharge Survey of 276,138 participants by Mathioudakis et al., it was found that black patients had 50% higher odds of having AKI while inpatient. It was also found that black patients were more likely to have comorbid conditions that increase the risk of AKI during hospitalization, including sepsis (2.7% vs. 2.2%, p < 0.001) and CKD (5.0% vs. 4.0%, p < 0.001) [22]. This further indicates the need for a predictor tool, such as the one validated in this study, to provide comprehensive care for all patients.
The patient characteristics that increase the risk of developing kidney disease were also principal characteristics in patients who developed AKI as inpatients. As seen in Table 1, aside from nephrotoxic agents, infection and diabetes accounted for the largest amount of AKI per characteristic, 39.7% and 41.6%, respectively. As patients battle infection (severe sepsis, for example), AKI precipitates as a result of ischemia. End organ kidney damage can often lead to metabolic derangement that results in increased length of hospital stay and increased mortality. Therefore, accounting for infection in predictive models is vital. If we can predict AKI in these patients and optimize therapy choice before AKI precipitates, we can decrease renal exposure and decrease hospital length of stay and costs.
Our cohort included only 100 participants for the classification of our model compared to the 3176 participants included in the study by Yue et al. [20]. The amount of data required for ML to correctly classify predictions is summed up by the rule of ten, which states that the number of training instances should be at least ten times the number of parameters, or degrees of freedom, in the model [23]. For this reason, cohort size is a strength of their analysis and a limitation of our study. Nineteen parameters were used in our study, and while a total of 200 participants were included in this analysis, the split into training and test sets left each set short of the hypothesized ideal sample size. The comparator predictor tool had 56 parameters and included more than 10 participants per parameter. Regardless, our model still showed 77% accuracy and an AUC of 0.86 in the training set, signifying that it is still powerful and accurate.
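The rule of ten described above reduces to simple arithmetic, sketched here with the figures reported in this study (19 parameters, 100 training participants). The function name is hypothetical, and the factor of 10 is the heuristic itself rather than a formal power calculation.

```python
def rule_of_ten_minimum(n_parameters, factor=10):
    """Minimum number of training instances suggested by the rule of ten."""
    return factor * n_parameters


required = rule_of_ten_minimum(19)            # 19 parameters in this study
training_size = 100                           # participants in the training set
shortfall = max(0, required - training_size)  # how far below the heuristic
```

Under this heuristic the training set would need 190 participants, so the 100-participant training set falls 90 short, which is the sample-size limitation discussed above.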
KDIGO guidelines for the management of AKI suggested using SCr and urine output for screening. Patients with urine output of less than 0.5 mL/kg/h for 6-12 h can be classified as having AKI regardless of their SCr values. However, in our study, urine output was not collected appropriately on all included participants and was therefore not used to stage the AKI. This may limit the diagnosis of AKI in patients who did not have acute rises in SCr but had decreased urine outputs [7].
The next step for this study is the development of a website or application interface that practitioners can use to input patient attributes. Development of this model and its algorithm into a calculator for ease of use would allow for further external validation from third parties. This is a critical step in taking this research nationwide, with the eventual goal of embedding the tool in EMRs. This would allow for seamless integration and predictive ability with a possible added benefit of therapy choice direction.

Conclusions
Our AKI predictor tool performed well, correctly classifying 77% of instances in the training set. The clinical significance of this model can lead to great advancements in the care of AA patients and provide practitioners avenues to optimize their therapy of choice in AAs when given AKI risk ahead of time. Development of this model and its algorithm into a calculator for ease of use would allow for further external validation from third parties.

Institutional Review Board Statement:
IRB approval number: IRB-2021-0223. The IRB has granted access to complete patient medical records with the following identifiers: names; geographic subdivisions smaller than state (e.g., street address, city, five-digit zip code, county); months or specific dates (e.g., birth date, admission date, month of discharge, date of death); references to age 90 or older or references to dates or years indicative of age 90 or older; and medical record or prescription numbers and account numbers, for the purpose of research/data analysis. The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Howard University (protocol code HU-PNAH-PNAS007 approved on 12 November 2021).

Informed Consent Statement:
Howard University Office of Regulatory Research Committee Institutional Review Board approved this protocol and granted the authors a waiver from the requirement for informed consent because this research does not pose more than minimal risk to the subject and the rights and welfare of the subjects will not be adversely affected.