Predicting Acute Kidney Injury: A Machine Learning Approach Using Electronic Health Records

Acute kidney injury (AKI) is a common complication in hospitalized patients and can result in longer hospital stays, higher health-related costs, and increased morbidity and mortality. A number of recent studies have shown that AKI is predictable and avoidable if early risk factors can be identified by analyzing Electronic Health Records (EHRs). In this study, we employ machine learning techniques to identify older patients who are at risk of readmission with AKI to the hospital or emergency department within 90 days after discharge. The records of approximately one million patients who visited a hospital or emergency department in Ontario between 2014 and 2016 are included in this study. The predictor variables include patient demographics, comorbid conditions, medications and diagnosis codes. We developed 31 prediction models based on different combinations of two sampling techniques, three ensemble methods and eight classifiers. These models were evaluated through 10-fold cross-validation and compared based on the AUROC metric. The performance of the models was consistent, with AUROC values ranging from 0.61 to 0.88 for predicting AKI. In general, the ensemble-based methods outperformed the cost-sensitive logistic regression. We also validated the features that are most relevant in predicting AKI with a healthcare expert to improve the performance and reliability of the models. This study predicts the risk of AKI for a patient after discharge, which gives healthcare providers enough time to intervene before the onset of AKI.


Introduction
Acute kidney injury (AKI) is common among patients admitted to hospitals, affecting approximately 10% of hospitalized patients and more than 25% of patients in the intensive care unit [1,2]. AKI is defined as an abrupt loss of kidney function over a short period of time [2]. AKI may lead to prolonged hospital stays, a lower chance of survival, and a higher risk of developing chronic kidney disease. Over the last 10-15 years, the incidence rate of AKI has increased in the United States [3,4], the United Kingdom [5] and Canada [6,7]. The growing incidence rate of AKI is associated with the changing spectrum of diseases. There is an increasing body of evidence showing that patients with extrarenal complications and multiple comorbidities are at a greater risk of developing AKI [8,9]. Aikar et al. [10] have shown that a high comorbidity rate, measured by the Deyo-Charlson comorbidity index, is associated with AKI. As a patient's number of comorbid conditions grows, there is a rise in their risk of developing AKI.

Materials and Methods
We discuss the data sources and methodology in this section, which includes the design settings, design flow, data integration, cohort entry criteria, input features, outcomes and proposed machine learning techniques.

Study Design and Setting
We conducted a population-based retrospective cohort study in older patients who visited a hospital or emergency department between 1 April 2014 and 31 March 2016, using health administrative databases stored at ICES. These datasets were connected using unique encoded identifiers and analyzed at ICES. The use of datasets in this study is authorized under section 45 of Ontario's Personal Health Information Protection Act, which does not need review by a Research Ethics Board.
Ontario has a population of about 13 million residents with universal access to physician services and hospital care, which includes 1.9 million people aged 65 years or older. We suppressed the results of this study in cells with five or fewer patients to comply with ICES privacy regulations and minimize the possibility of reidentification of patients. Figure 1 shows the basic workflow of the study described in this paper. In the first step, we created an integrated dataset from five different health administrative databases. The data sources are discussed in Section 2.3. Next, we describe the inclusion and exclusion criteria in Section 2.4. The features in the comorbidity, prescription, demographic and diagnosis codes data were encoded and transformed into suitable forms for analysis in the preprocessing stage, which is discussed in Section 2.7. The analysis techniques and results are presented in Sections 2.8 and 3, respectively.

Data Sources
We ascertained patient characteristics, drug prescriptions, outcome and medical history data from five administrative databases (as shown in Table A1, Appendix A). These datasets are linked using a unique identifier, which is derived from health card numbers. We collected vital statistics from the Ontario Registered Persons Database (RPDB) [36], which includes demographic data for all residents of Ontario who have a valid health card. We utilized the Ontario Drug Benefit (ODB) Program database [37] to obtain prescription medication use data. The ODB database holds all the outpatient prescription records dispensed to older patients; these records have an error rate of less than 1% [38]. We ascertained baseline comorbidity, emergency department visit and hospital admission data from the National Ambulatory Care Reporting System (NACRS) (i.e., for the emergency department) [39] and the Canadian Institute for Health Information Discharge Abstract Database (CIHI-DAD) (i.e., for hospital admissions) [40]. We applied ICD-10 (i.e., International Classification of Diseases, post-2002) [41] codes to identify baseline comorbidities within the look-back window. In addition, baseline comorbidity data were acquired from the Ontario Health Insurance Plan (OHIP) database [42], which holds claim records for physician services. All the coding definitions for the comorbidity databases are provided in Table A2.

Figure 1. Workflow diagram of the presented study, where different colors represent the three main parts (data integration and preprocessing, analysis, and validation). The figure shows how different combinations are formed using two sampling techniques (i.e., under-sampling and the synthetic minority over-sampling technique), three ensemble methods (i.e., boosting, bagging and XGBoost) and eight machine learning classifiers.

Cohort Entry Criteria
We identified a cohort of individuals aged 65 years or older who visited the emergency department or were admitted to hospital between 2014 and 2016 (Figure 2). The hospital admission or emergency department discharge date was taken as the cohort entry or index date. If a patient had multiple hospital admissions and emergency department visits, we chose the first incident. We excluded patients with an invalid or missing age, sex and/or health card number. In addition, we excluded patients who (1) previously underwent a kidney transplant or dialysis treatment, as AKI is usually no longer relevant once patients develop end-stage kidney disease; (2) left the emergency department or hospital without being seen by a physician or against medical advice; and (3) developed AKI during the emergency department visit or hospital admission, as these patients were already under observation. The diagnosis codes for the exclusion criteria are presented in Table A3.
We identified 2,305,783 hospitalization and 12,347,256 emergency department visit records in CIHI-DAD and NACRS, respectively. Next, a total of 5,635,909 unique individuals were identified using RPDB. There were 1,007,993 individuals included in the cohort after excluding patients with invalid age, sex, and/or health card number and selecting patients aged 65 years or older. Finally, a total of 905,442 individuals were included in the final cohort after applying the other exclusion criteria.

Input Features
The features from the different data sources were integrated using the encoded identifiers derived by ICES from patient health card numbers. For each patient, we generated new features and aggregated multiple values (rows) of a single feature into one by considering the latest values of that feature. There were 307,624, 768,293, 898,538 and 891,176 unique observations in the aggregated CIHI-DAD, NACRS, ODB and OHIP databases, respectively. We identified patients transferred from the emergency department to the hospital (who appeared in both CIHI-DAD and NACRS) and removed duplicates by considering the first incident. We identified a total of 1878 unique diagnosis codes (using CIHI-DAD, NACRS and OHIP) and 595 distinct medications (using ODB) for the 905,442 individuals who were included in the final cohort. We used the Chi-Square test for feature selection and then filtered the selected features with a healthcare expert. The final combined dataset included a total of 86 unique features. The cohort contained 11 comorbidity features, namely, chronic kidney disease, diabetes mellitus, cerebrovascular disease, coronary artery disease, hypertension, chronic liver disease, major cancers, peripheral vascular disease, heart failure and kidney stones. We applied a 5-year look-back window to detect these baseline comorbidities. There were four demographic features, namely, sex, age, region and income quintile. We included 55 medications that were prescribed to the patients within 120 days before the first hospital admission or emergency department visit.
These medications belonged to 13 distinct drug classes, namely, ACE-inhibitors (blood pressure and heart failure), beta-blockers (blood pressure), alpha-adrenergic blocking agents (blood pressure), angiotensin-receptor blockers (blood pressure), calcium channel blockers (blood pressure), macrolides (antibiotics), fluoroquinolones (antibiotics), potassium-sparing diuretics (weak diuretics), other diuretics, nonsteroidal anti-inflammatory agents (pain relievers), oral hypoglycemics (diabetes mellitus) and immunosuppressive agents (immune system activity).
The cohort also included 16 ICD-10 diagnosis codes that were identified during the index hospitalization or emergency department visit. The codes were related to delirium; mycoplasma pneumoniae; disorders of fluid, electrolyte and acid-base balance (e.g., hyperosmolality and hypernatraemia, hypo-osmolality and hyponatraemia, acidosis, alkalosis, mixed disorder of acid-base balance, hyperkalaemia, hypokalaemia, fluid overload, and other disorders of electrolyte and fluid balance); atrial fibrillation; anemia; femur fracture; valve disorders; atherosclerotic cardiovascular disease; diseases of the digestive system (e.g., paralytic ileus, intussusception, volvulus, gallstone ileus, other impaction of intestine, intestinal adhesions with obstruction, and other and unspecified intestinal obstruction ileus); certain infectious and parasitic diseases (e.g., sepsis due to Staphylococcus aureus, other specified Staphylococcus, Haemophilus influenzae, Escherichia coli, Pseudomonas, Serratia marcescens, other Gram-negative organisms, Gram-negative septicaemia and Enterococcus); dehydration and other volume depletion; abnormal function tests (e.g., abnormal results of function tests of the central nervous system, peripheral nervous system and special senses, pulmonary function tests, cardiovascular function tests, kidney function tests, liver function tests, thyroid function tests, other endocrine function tests, electrocardiogram suggestive of ST-segment elevation myocardial infarction, and other abnormal results of cardiovascular function tests); chronic pulmonary disease (e.g., chronic obstructive pulmonary disease with acute lower respiratory infection and acute exacerbation and other specified chronic obstructive pulmonary disease); dementia; glomerular disorders (e.g., glomerular disorders in infectious and parasitic diseases, neoplastic diseases, blood diseases and disorders involving the immune mechanism, diabetes mellitus, other endocrine, nutritional and metabolic diseases, and systemic connective tissue disorders); and hyperplasia of prostate.
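The Chi-Square feature-selection step mentioned above scores each candidate feature against the outcome; for a binary (Y/N) feature versus AKI status, the statistic reduces to a 2x2 contingency-table computation. A minimal Python sketch of that computation (the study itself was carried out in R, so this re-implementation is only illustrative):

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table
        [[a, b],
         [c, d]]
    e.g. rows = feature present/absent, columns = AKI yes/no.
    Uses the standard shortcut n*(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den
```

A feature with no association yields a statistic of 0, while larger values indicate a stronger feature/outcome association (compared against the chi-square distribution with one degree of freedom).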

Outcome: Identification of AKI
Machine learning models were built to predict AKI within 90 days after discharge from the hospital or emergency department. Positive cases were those in which patients revisited the hospital or emergency department with AKI within 90 days after being discharged, and negative cases were those wherein no hospitalization or emergency department visit with AKI took place. There were 899,449 negative and 5993 positive cases in the dataset. There were no recurrent AKI cases in the data (i.e., 25,084 patients were excluded) because we excluded the cases wherein AKI or dialysis occurred during the index hospital stay or emergency department visit.
The incidence of AKI was detected using the Canadian Institute for Health Information Discharge Abstract Database and National Ambulatory Care Reporting System based on the ICD-10 (International Classification of Diseases-Tenth Revision) diagnostic codes (i.e., ICD-10 code of AKI is "N17").

Data Preprocessing
The features in the cohort were transformed into a format and scale that was suitable for the machine learning techniques. For each feature described in Section 2.5, the last recorded value before the first hospital admission or emergency department visit was captured. Medication, diagnosis code and comorbidity features were set to either "Y" or "N." If a patient had a certain comorbid condition or was prescribed a medication, then its corresponding value was taken as "Y." Instead of reporting individual ages, we calculated age group features for the patients. If a patient's age was within the specified range of an age group, we set the value to "1" for that corresponding feature. The sex feature took either "M" or "F" if the information was available in the dataset. Patients with invalid age or sex were removed from the cohort. The region feature took either "R" or "U" to represent rural or urban, respectively. The income feature took an integer value ranging between 1 and 5 to represent the income quintile of a particular patient.
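The encoding rules above can be sketched as follows. This is an illustrative Python re-implementation (the study's preprocessing was done in R), and the record layout, field names and age bands here are hypothetical:

```python
def encode_patient(raw, medications, age_bands):
    """Encode one patient record into the Y/N, age-band, sex, region and
    income-quintile format described above. `raw`, `medications` and
    `age_bands` are hypothetical structures used only for illustration."""
    rec = {}
    # Medication (and, analogously, comorbidity/diagnosis) features:
    # "Y" if present for this patient, otherwise "N"
    for med in medications:
        rec[med] = "Y" if med in raw.get("prescriptions", []) else "N"
    # Age-group indicator features: "1" for the matching band, "0" otherwise
    for lo, hi in age_bands:
        rec[f"age_{lo}_{hi}"] = "1" if lo <= raw["age"] <= hi else "0"
    rec["sex"] = raw["sex"]                       # "M" or "F"
    rec["region"] = "R" if raw["rural"] else "U"  # rural / urban
    rec["income"] = raw["income_quintile"]        # integer 1..5
    return rec
```

Applying this per patient yields one flat row per individual, suitable for the classifiers described in the next section.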

Analysis Using Machine Learning Techniques
We employed both traditional and state-of-the-art analysis techniques to build trust with end-users and, at the same time, allow them to explore complex relationships in the dataset. We developed 31 AKI prediction models based on combinations of eight classifiers, namely, classification and regression tree (CART) [43], C5.0 [44], naïve Bayes (NB) [45], logistic regression [46] and support vector machine (SVM) with four different kernels (linear, polynomial, sigmoid and radial) [47]; two sampling techniques, namely, under-sampling and SMOTE; and three ensemble methods, namely, boosting, bagging and XGBoost. These techniques were chosen for several reasons: (1) They each represent a different type of machine learning method; for example, the decision tree is a rule-based method, regression is a statistical method, and naïve Bayes is a probability-based method. (2) Each of these methods has its own set of advantages and limitations. For instance, decision tree models are more human-interpretable but often fail to represent complex relationships among data elements. On the contrary, SVM can model complex non-linear relationships using different kernels, but is difficult to interpret. (3) Medical experts are more familiar with regression than with other machine learning algorithms, which convinced us to include regression in this analysis.

Ensemble Methods
Since the number of negative cases was significantly higher than the number of positive cases, we considered the dataset highly imbalanced. Traditional machine learning techniques, such as the decision tree and support vector machine, which are designed to optimize overall accuracy, tend to perform poorly in this class-imbalanced learning scenario because they minimize the overall error, to which the minority class barely contributes. These techniques showed high precision (i.e., a small number of false positives), reduced sensitivity (i.e., a higher number of false negatives) and low AUROC scores on our dataset, because they become biased toward the majority class and fail to model the minority class. An ensemble method offers a solution to this problem by combining several classification models to obtain better performance than the base classifiers [48]. To deal with the class imbalance issue in this study, we incorporated four different combinations of ensemble and sampling methods, namely, SMOTEBoost, SMOTEBagging, UnderBagging and RUSBoost, which are available in the "ebmc" package of R [49][50][51]. RUSBoost was implemented using the "rus" function in the "ebmc" package. The weak learners in RUSBoost are trained on randomly under-sampled datasets [52]. Those learners are then combined to generate the final ensemble model. We used the "sbo" function to implement SMOTEBoost. SMOTE (Synthetic Minority Oversampling Technique) is a sampling technique that synthesizes new instances for the minority class using the k-nearest-neighbors algorithm [53]. SMOTEBoost returns several weak learners that are trained on SMOTE-generated datasets, along with their error estimates [54]. The "sbag" function was used to implement SMOTEBagging, which combines SMOTE and random over-sampling to rebalance the dataset [44]. We used the "ub" function to implement the UnderBagging method.
Unlike the other ensemble methods discussed above, UnderBagging only incorporates random under-sampling, reducing the instances of the majority class in each bag to rebalance the class distribution. We configured this function so that the number of majority-class instances became equal to the number of minority-class instances (i.e., imbalance ratio = 1). We compared the models' performance for different ensemble sizes (i.e., 10, 15, 20, 25 and 30) and used 20 weak learners for the algorithms. We used NB, SVM, CART and C5.0 as weak learners for the ensemble methods, which are discussed in the following subsections. Since ensemble methods are designed to combine several base models to obtain better performance than the individual weak learners, and these algorithms (i.e., NB, SVM, CART and C5.0) are used only as weak learners in this study, we did not perform an explicit grid search to tune their hyperparameters.
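The SMOTE step that several of these ensembles rely on synthesizes minority-class examples by interpolating between a minority point and one of its k nearest neighbours. A minimal Python sketch of that idea (not the `ebmc` implementation used in the study; function and parameter names are illustrative):

```python
import random

def smote_sample(minority, k=5, n_new=1, seed=0):
    """Generate synthetic minority-class points: pick a minority point,
    pick one of its k nearest neighbours (squared Euclidean distance),
    and interpolate a new point somewhere along the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

Because each synthetic point is a convex combination of two existing minority points, the new examples stay inside the region the minority class already occupies rather than being exact duplicates.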

Support Vector Machine
The objective of the SVM is to find an optimal separating hyperplane in a multi-dimensional space (whose dimensionality depends on the number of features) that distinctly divides the instances of different classes. Although SVM models are often not human-interpretable, they have been proven to work well on prediction tasks involving a large number of features [18]. SVM has recently become popular in healthcare research because it is effective in analyzing high-dimensional EHRs. In addition, the regularization parameters of SVM kernels help users avoid over-fitting. Since the performance of the models varies widely depending on the selection of the kernel [55], and kernels are quite sensitive to over-fitting [56], one of the main challenges is to select an appropriate kernel. Thus, we tested the performance of four well-known kernel functions in this study, namely, linear, polynomial, sigmoid and radial.
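The four kernels differ only in the similarity function applied to pairs of feature vectors; a minimal Python sketch of the four functions (the default `gamma`, `degree` and `coef0` values here are illustrative, not the ones used in the study's R implementation):

```python
import math

def linear(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, degree=3, coef0=1.0):
    """Dot product raised to a power, capturing feature interactions."""
    return (linear(x, y) + coef0) ** degree

def sigmoid(x, y, gamma=0.5, coef0=0.0):
    """tanh of a scaled dot product."""
    return math.tanh(gamma * linear(x, y) + coef0)

def radial(x, y, gamma=0.5):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```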

Decision Tree
A decision tree is a representation of the possible outcomes of a decision depending on certain conditions [44]. It is similar to a flowchart, where every non-leaf node represents a test of a specific feature, and each leaf node represents a particular outcome. A decision tree reduces the ambiguity of complicated clinical decisions and requires less effort for data preparation compared to other techniques. It can be an effective technique for analyzing datasets with missing values because the tree-building process is not affected by the missing data [57]. We chose the decision tree mainly because it is easy to interpret and understand. Despite these advantages, decision tree models are often volatile, meaning that a minor alteration in the training data may cause a massive change in the structure of the tree. To overcome this issue, we included other types of base classifiers along with the decision tree and verified the structure of the generated tree with a healthcare expert. We incorporated two different algorithms to develop decision tree models in this study. The classification and regression tree (CART) was implemented using the "rpart" package [43], and the C5.0 classifier was implemented using the "C50" package in R [44].
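The tree-building step greedily picks, at each node, the split that most reduces node impurity. A toy Python illustration of the Gini-impurity criterion used by CART (C5.0 uses information gain instead; this is not the `rpart` code itself):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_c^2).
    0 for a pure node; 0.5 for a maximally mixed binary node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(parent, left, right):
    """Impurity reduction achieved by splitting `parent` into two children;
    the tree builder chooses the feature/threshold maximizing this gain."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted
```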

Naïve Bayes
NB is a simple probabilistic classifier based on Bayes' theorem [45], which is exceptionally fast to train compared to other, more complex techniques [55]. Classifying new data with this technique only requires mathematical operations on the feature probabilities. We chose NB mainly because it is less sensitive to missing data. However, since this technique is built on the assumption of feature independence, its performance may deteriorate when features in the training data are correlated. We used the "naivebayes" package to implement the NB algorithm in this study [58].
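On binary Y/N features such as those in this cohort, NB classification reduces to multiplying per-feature conditional probabilities. An illustrative Python sketch with add-one (Laplace) smoothing; the study used the R package, so this is only a sketch of the underlying idea, with hypothetical feature names:

```python
from collections import defaultdict

def train_nb(rows, labels):
    """Estimate P(class) and smoothed counts for P(feature=value | class)."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = defaultdict(lambda: 1)  # Laplace: every count starts at 1
    totals = {c: labels.count(c) for c in classes}
    for row, c in zip(rows, labels):
        for feat, val in row.items():
            cond[(c, feat, val)] += 1
    return prior, cond, totals

def predict_nb(model, row):
    """Pick the class maximizing P(class) * prod P(feature=value | class)."""
    prior, cond, totals = model
    scores = {}
    for c in prior:
        score = prior[c]
        for feat, val in row.items():
            score *= cond[(c, feat, val)] / (totals[c] + 2)  # 2 values: Y/N
        scores[c] = score
    return max(scores, key=scores.get)
```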

Logistic Regression
Logistic regression draws a separating line among the classes using the training dataset, and then applies that line to classify the unknown data points. It is used to analyze the relationships between one dependent feature and one or more independent features. Logistic regression models are informative as they reveal the association among features in terms of odds ratios. Over the last few decades, logistic regression techniques have become very popular in healthcare studies [59]. Although logistic regression models are not designed to support imbalanced classification directly, they can be modified to work with skewed distributions. In order to adjust the regression coefficients while training with the imbalance data, we implemented a cost-sensitive regression model. We adjusted the weight of the minority class based on the cost of its misclassification compared to the cost of misclassifying the majority class. We used internal 10-fold cross-validation during training to determine the appropriate weight for the minority class.
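The cost-sensitive adjustment amounts to up-weighting the minority class's contribution to the training gradient. A minimal gradient-ascent sketch in Python; the weight `w_pos` stands in for the cross-validated class weight described above, and the learning rate and epoch count are arbitrary illustrative choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logreg(X, y, w_pos=10.0, lr=0.5, epochs=500):
    """Logistic regression where each positive (minority) example's
    contribution to the log-likelihood gradient is scaled by `w_pos`."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            weight = w_pos if yi == 1 else 1.0
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = weight * (yi - p)  # weighted gradient of the log-likelihood
            w = [wj + lr * g * xj for wj, xj in zip(w, xi)]
            b += lr * g
    return w, b
```

Setting `w_pos` to 1 recovers ordinary logistic regression; raising it pushes the decision boundary toward classifying borderline cases as positive, trading specificity for sensitivity.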

XGBoost
XGBoost (i.e., eXtreme Gradient Boosting) is an advanced implementation of gradient-boosted decision trees that can be used for ranking, regression and classification problems [60]. One of the main advantages of XGBoost is that it supports parallel computation, which makes it faster than other gradient boosting techniques. Because of its speed and predictive performance, it has been widely used in healthcare research, such as in the analysis of EHRs [61] and cancer diagnosis [62]. We used the "xgboost" package to implement XGBoost in R. Since this implementation of XGBoost only works with numeric data, we converted the categorical features in our dataset into numerical vectors. The "xgboost" package includes both a tree learning algorithm and a linear model solver; we implemented both algorithms to compare their performances. This package also has a built-in mechanism to control the balance of positive and negative weights. To train the models with imbalanced data, we adjusted the "scale_pos_weight" parameter based on the ratio of the negative class to the positive class [63]. We performed a grid search on the parameters of XGBoost and tuned the regularization parameters using the best parameters from the grid search.
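The "scale_pos_weight" heuristic referenced above is simply the ratio of negative to positive cases; applied to the class counts reported for this cohort (899,449 negative and 5993 positive cases), it works out to roughly 150:

```python
def scale_pos_weight(n_negative, n_positive):
    """XGBoost's recommended starting point for imbalanced data:
    count(negative class) / count(positive class)."""
    return n_negative / n_positive

# Class counts reported for this cohort
ratio = scale_pos_weight(899_449, 5_993)  # roughly 150
```

This value is then passed as the `scale_pos_weight` training parameter so that errors on the rare positive (AKI) class are penalized proportionally more.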

Results
This section presents the results of this study. We divided the results into two subsections. First, we provide an overview of the dataset in Section 3.1. The results of predictive models are presented in Section 3.2.

Cohort Characteristics
A total of 905,442 participants were included in the derivation cohort, of which 5993 had AKI during a hospital admission or emergency department visit after being discharged from the index encounter. We excluded 25,084 patients who developed AKI during the index hospitalization or emergency department visit. Selected characteristics of the derivation cohort are presented in Table 1 (baseline characteristics of patients in the cohort who were admitted to the hospital or visited the emergency department between 2014 and 2016). All the patients in the cohort were aged 65 years or older, and the mean age was 70 years. Among the participants, about 56% were women. About 6% of patients were in long-term care, and 16% were from rural areas. The pre-existing comorbidities were diabetes (38%), hypertension (88%), major cancer (16%), coronary artery disease (25%), cerebrovascular disease (3%), heart failure (14%), chronic kidney disease (9%), kidney stones (1%) and peripheral vascular disease (2%). Some of the commonly prescribed medications were rosuvastatin calcium (22%), atorvastatin calcium (24%), amlodipine besylate (19%), metformin HCl (16%) and hydrochlorothiazide (20%).

Classification Results
We evaluated all of the machine learning models using 10-fold cross-validation [66]. The cohort was divided into 10 equal groups, wherein 9 groups were used for training and the 10th group was used for testing. We repeated this process 10 times, using different parts for training and testing, and assessed the performance of the models for each fold. We then combined the results of these folds to calculate the evaluation scores. We measured the validity of the tests in terms of sensitivity and specificity. Sensitivity is the capacity of a test to correctly classify an individual as "at-risk"; it represents the probability of a test being positive when AKI is present. Conversely, specificity refers to the ability to correctly classify an individual as "risk-free". Predicting AKI was a binary classification problem (i.e., AKI or non-AKI), and all of the machine learning techniques were capable of providing a confidence score along with the output. The trade-off between sensitivity and 1-specificity was explored by altering the threshold on the confidence scores, generating the receiver operating characteristic (ROC) curve. We used the ROC space to compare the performances of alternative tests in terms of 1-specificity and sensitivity. Thus, we computed and reported the sensitivity, specificity and area under the receiver operating characteristic curve (AUROC). The AUROC ranged from 0.61 to 0.88 for predicting AKI among the 31 machine learning models. The average AUROC values of the ensemble methods were higher than that of the cost-sensitive logistic regression model. Among the sampling-based ensemble methods, the performances of the UnderBagging and RUSBoost methods were better than those of the SMOTE-based methods. We achieved the best result, an AUROC of 0.88, with (1) a combination of RUSBoost and SVM using a sigmoid kernel and (2) XGBoost using a tree learning algorithm.
The AUROC of the linear boosting algorithm (XGBoost) was 0.84, which was higher than that of the cost-sensitive logistic regression but lower than that of the tree learning algorithm (XGBoost). Since this is a disease prediction problem, high sensitivity was more useful than high specificity. The highest sensitivity was 0.90, which was achieved using the SVM-sigmoid and SVM-radial kernels with RUSBoost and SMOTEBagging, respectively. The complete list of performance measures is presented in Table 2.
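The AUROC values reported above can be interpreted as the probability that a randomly chosen positive case receives a higher confidence score than a randomly chosen negative one. A minimal Python sketch of this rank-based (Mann-Whitney) formulation, offered only as an illustration of the metric:

```python
def auroc(scores, labels):
    """AUROC via the rank (Mann-Whitney) formulation: the fraction of
    (positive, negative) pairs where the positive case scores higher,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every positive outranks every negative, 0.5 is chance-level ranking, and the 0.61-0.88 range reported here sits between the two.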

Discussion
In this study, we demonstrated how machine learning techniques could help with the prediction of AKI using administrative health databases stored at ICES. Several machine learning-based models have been developed in recent studies to predict AKI among ICU and post-operative patients [29][30][31][32][33][34][35]. However, most of these models focus only on a specific medical condition and consider the risk factors associated with that condition. For instance, Go et al. (2010) examined how AKI affects the risk of chronic kidney disease, cardiovascular events and other patient-related outcomes in hospital settings [67]. The earlier AKI can be predicted, the better the chances are of preventing AKI and its associated costs. The features used in most of the existing studies work better in predicting AKI if their values are recorded close to the time of AKI onset. However, it may not be beneficial to detect AKI close to its onset because clinicians will not have enough time to intervene. Thus, there is a trade-off between accuracy and usefulness, which can be optimized by using the information available in EHRs. Although some studies have developed risk stratification models for AKI using EHRs [68,69], they can only predict hospital-acquired AKI and do not consider patients who are at risk of developing AKI after being discharged. To the best of our knowledge, there are no previous studies in the literature that predict the risk of AKI after discharge from the hospital using both historical and healthcare utilization data. Thus, this study is not only novel, but also clinically relevant, because it provides clinicians with the ability to intervene and treat patients before AKI causes irreversible damage.
We analyzed all AKI events that took place within 90 days after discharge from the hospital or emergency department, and developed prediction models to identify high-risk patients. We chose a 90-day follow-up timeframe because (1) of all AKI cases within six months after discharge, about 85% occurred within this timeframe, and (2) it was a reasonable timeframe considering the trade-off between the models' usefulness (from a clinical point of view) and predictive power (from a machine learning point of view). Table 3 shows how many acquired AKI cases were identified within different time intervals. The machine learning models presented in this study can be adapted to make predictions at other timeframes if needed. We incorporated eight different machine learning classifiers, three ensemble methods and two sampling techniques to develop 31 prediction models. Although each combination of machine learning techniques and ensemble-based methods performed reasonably well, the SVM with a sigmoid kernel and tree-based XGBoost generally produced better results than the other techniques. The performances of all of the ensemble-based methods were consistent, and produced similar results for different base classifiers. The results shown in Table 2 indicate that the models agreed with each other.
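The evaluation protocol described above (stratified 10-fold cross-validation on an imbalanced outcome, scored by AUROC) can be sketched as follows. This is an illustrative reconstruction on synthetic data, with scikit-learn's `GradientBoostingClassifier` standing in for XGBoost; it is not the study's actual pipeline:

```python
# Illustrative sketch: 10-fold stratified cross-validation with AUROC
# scoring on an imbalanced binary outcome (synthetic stand-in data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced outcome mimicking the rarity of 90-day AKI readmission.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(GradientBoostingClassifier(random_state=0),
                         X, y, cv=cv, scoring="roc_auc")
print(f"mean AUROC over 10 folds: {scores.mean():.2f}")
```

Stratified folds preserve the minority-class proportion in each fold, which matters here because resampling techniques such as SMOTE or random undersampling must be applied inside each training fold only, to avoid leaking synthetic or duplicated minority samples into the test fold.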
To understand the models better, we explored the features that are important in each prediction model. We analyzed this information with a nephrologist to confirm the correctness of the models. We observed the odds ratios and p-values of the features in the regression model, the feature importances in the decision tree and XGBoost models, and the coefficients in the SVM-linear models in order to understand the associations between different features and AKI. The features included in this study can be divided into four categories: demographics, comorbidities, medications and diagnosis codes.
In general, features from comorbidities and hospital diagnosis codes were more strongly associated with AKI. Although the importance of the features varied based on the machine learning techniques, most of the features that stood out were common among these models. For instance, diabetes mellitus, hypertension, coronary artery disease, heart failure, major cancers, chronic liver disease, peripheral vascular disease and chronic kidney disease were the comorbidity features that were important in most of the prediction models. These comorbid conditions are already known to be associated with AKI in the literature [70][71][72][73][74]. The medication features that contributed to a higher risk of AKI include furosemide, allopurinol, hydrochlorothiazide, atorvastatin, metolazone, sunitinib malate, spironolactone, dexamethasone, chlorthalidone, atenolol and oseltamivir phosphate. These medications are known to be nephrotoxic [3,[75][76][77][78][79]. Delirium, anaemia, fluid disorders, atrial fibrillation, atherosclerotic cardiovascular disease, mycoplasma pneumoniae, hyperplasia of prostate, glomerular disorders and valve disorders were the diagnosis-code features associated with an increased risk of AKI in the prediction models. Several studies in the literature associate these medical conditions with AKI [80][81][82][83][84]. Among the demographic features, age, sex, location (i.e., urban or rural residence) and long-term care were found to be associated with AKI in most of the prediction models. As with the comorbidity, medication and diagnosis-code features, these demographic features are already known to be associated with AKI [85][86][87] in the literature, which further supports the validity of the prediction models. Through a comprehensive analysis of ICES's healthcare administrative datasets, this study shows that AKI is predictable using EHRs.
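The per-feature associations described above can be read off the fitted models in a standard way: exponentiating logistic regression coefficients yields odds ratios, and tree ensembles expose impurity-based importances. The sketch below is illustrative only; the feature names are hypothetical, the data is synthetic, and scikit-learn's `RandomForestClassifier` stands in for the study's tree-based models:

```python
# Illustrative sketch: extracting odds ratios (logistic regression) and
# impurity-based feature importances (tree ensemble) from fitted models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           random_state=0)
# Hypothetical predictor names, echoing features discussed in the text.
names = ["age", "diabetes", "heart_failure", "furosemide", "delirium"]

# Logistic regression: exp(coefficient) is the odds ratio per unit change.
lr = LogisticRegression(max_iter=1000).fit(X, y)
odds_ratios = dict(zip(names, np.exp(lr.coef_[0])))

# Tree ensemble: impurity-based importances rank features (sum to 1).
rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = dict(zip(names, rf.feature_importances_))
```

An odds ratio above 1 indicates a feature associated with increased AKI risk, which is how the nephrologist review described in the text could be grounded in the model outputs.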
Successful implementation of these prediction models in a healthcare setting can potentially reduce the risk of AKI among older patients.

Limitations and Future Work
This study should be interpreted in light of several limitations. First, our models were trained and tested on a cohort of older patients (65 years or older), which limits the generalizability of the models. Second, we excluded patients with missing or invalid demographic information. This may affect the performance of the models if the excluded data includes any interesting or rare cases. Third, the models are based on a cohort containing Ontario patients only, which limits this study to a specific geographic location. Fourth, the proposed prediction models are trained and tested on a specific patient cohort. It is essential to test the models' performance on real-time medical data before applying them in a clinical setting. Fifth, since we developed 31 prediction models, many of which have different mechanisms for identifying feature importance, the interrelationships produced by these models are complex. This paper only identifies the most significant predictors, and does not incorporate a ranking system for predictors. Finally, we identified episodes of AKI using ICD-10 codes, which may miss cases that went undetected in hospital settings. Moreover, since AKI was identified using diagnosis codes, this study does not consider the severity of AKI. Our future work concerns a deeper analysis of severe AKI requiring dialysis.

Conclusions
AKI is characterized by a sharp decline in renal function, and is associated with increased health-related costs and mortality. AKI is avoidable and may be preventable through earlier prediction using risk factors available in EHRs. This study is designed to identify older patients who are discharged from the hospital or emergency department and are at risk of developing AKI within 90 days after discharge. We employed eight traditional and state-of-the-art machine learning algorithms, along with two sampling techniques and three ensemble methods, to build AKI prediction models. The performances of these models were consistent, and a maximum AUROC of 0.88 was achieved through 10-fold cross-validation. We analyzed the models with a healthcare expert and identified the features most relevant to predicting AKI. Most of these features are already known to be AKI-associated, which supports the validity and feasibility of the prediction models. This study predicts the risk of AKI for a patient after discharge from the hospital or emergency department, which gives healthcare providers enough time to intervene, monitor high-risk patients more closely, and avoid prescribing them nephrotoxic medications.