Predicting Adult Hospital Admission from Emergency Department Using Machine Learning: An Inclusive Gradient Boosting Model

Background and aim: We analyzed an inclusive gradient boosting model to predict hospital admission from the emergency department (ED) at different time points. We compared its results to multiple models built exclusively at each time point. Methods: This retrospective multisite study utilized ED data from the Mount Sinai Health System, NY, during 2015–2019. Data included tabular clinical features and free-text triage notes represented using bag-of-words. A full gradient boosting model, trained on data available at different time points (30, 60, 90, 120, and 150 min), was compared to single models trained exclusively on data available at each time point. The full model was built by concatenating the rows of data available at each time point into one data matrix, where each row is considered a separate case. Results: The cohort included 1,043,345 ED visits. The full model showed comparable results to the single models at all time points (AUCs 0.84–0.88 for different time points for both the full and single models). Conclusion: A full model trained on data concatenated from different time points showed similar results to single models trained at each time point. An ML-based prediction model can be used for identifying hospital admission.


Introduction
Emergency departments (ED) are overcrowded in the United States and internationally, hindering patient care and system efficiency. ED overcrowding has been associated with increased mortality and morbidity, longer wait times, and length of stay. Overcrowding has also increased hospital expenses and generated poorer patient perceptions of care [1][2][3][4].
The ED admission process aims to expedite patient disposition, as EDs are becoming overcrowded. Typically, ED admission is started by a triage nurse who performs the triage process. The nurse records the patient's demographic data and measures vital signs. The nurse also records the patient's visit reason in a free-form text note [4]. Other clinical data accumulate while the patient is in the ED. These mainly include laboratory results that can provide essential clues to the patient's clinical condition. For example, leukocytosis (an elevated white blood cell [WBC] count) may signal an infection, and increased troponin may signal myocardial infarction.
In hospitals with electronic health records (EHRs), where patient data are recorded at the point of care, EHR data can be utilized to generate short-term predictions of hospital admissions and thus bed demand. Such predictions have several potential benefits. First, they would help control teams, responsible for allocating beds, to make best use of available capacity and reduce cancellations of elective admissions [5]. Second, patients can anticipate hospitalization, which could increase patient satisfaction. Third, the prediction may have prognostic value, as patients who need hospitalization are often the sickest and will benefit most from time-sensitive ED treatment [6].
Increases in digital electronic health records (EHR) data volume [7] are driving machine learning use in healthcare processes [8,9]. Previous works utilized ED data [10,11] for building models for predicting hospital admission.
It is important to note that triage is not the same as the hospital admission prediction (also known as patient disposition). ED data include multiple features that stream at different progressive time points. Thus, different predictions can be made depending on the time from the patient's arrival at the ED. Yet, previous works presenting decision support tools for predicting disposition from the ED did not describe a full gradient boosting model that handles multiple time points. Such a model may be easier to train and implement than multiple models trained exclusively at each time point. However, it should be tested whether such a model remains stable, as it aggregates multiple time points.
This study aimed to develop an inclusive tabular-free-text gradient boosting model for predicting hospital admission and to compare its results to those of multiple models, each built at a single time point.

Data Sources
The model was built using patient data from Mount Sinai Health System (MSHS), an urban health system in New York City. Emergency Department data were obtained from five hospitals within the MSHS in New York. An institutional MSHS ethical board committee approval was granted for this retrospective study. The committee waived informed consent.

Study Design
Using the MSHS data warehouse, we identified patients who presented to our ED between January 2015 and December 2019. We extracted tabular clinical and demographic data from this cohort, including all free-text triage notes. Figure 1A presents the data collection method.
For all ED patients within the extracted cohort, the models were designed using tabular EHR data and triage notes. The single models were each developed on the data of one of five time points (30, 60, 90, 120, and 150 min), and the full model was developed using the data of all time points concatenated. The results of all models on the testing sets were compared using confusion matrix metrics and the area under the receiver operating characteristic (ROC) curve (AUC).

Study Population
We retrospectively included patients who presented to the ED between 2015 and 2019. All patients over 18 years of age who were admitted to the EDs of five MSHS hospitals were included. We excluded all patients younger than 18. We also excluded erroneously created or duplicated patient records.

Study Data
We extracted clinical and demographic data, including sex and age, hospital facility, admission time, vital signs, and laboratory results, as described in Figure 1C. From the same data source, we also extracted all triage notes. Upon a patient's arrival in the Mount Sinai ED, the first clinical documentation recorded in a patient's chart is a triage note consisting of an abbreviated patient history written by a triage nurse. All included data were time stamped.

Outcome Definition
The primary outcome was hospitalization, which the models were built to predict. The average disposition time from ED presentation is usually ≥3 h. The model predicts hospitalization starting from 30 min after the patient's arrival at the ED, which may shorten the time to hospitalization.

Model Development
For this study, we selected the Extreme Gradient Boosting (XGBoost) implementation of gradient boosting decision trees [12]. The XGBoost algorithm provides robust prediction results through an iterative process of prediction summation in decision trees that fit the residual error of the prior ensemble. Both tabular and free-text data were used to build the model to predict patients' admission from the ED. Table 1 presents the list of tabular features used in the models. The prediction is updated every 30 min from the patient's ED arrival time. Model development and how the dataset moves from Epic to the XGBoost model for prediction are described with an architecture diagram in Figure 2.
The bag-of-words (BOW) model was implemented on the free-text triage notes. BOW is a representation that turns arbitrary text into fixed-length vectors by counting word frequencies; this process is often referred to as vectorization. A statistical classifier is then trained to classify each note based on these word counts. To implement the BOW model, we preprocessed the data by converting text to lower case and removing all non-word characters and punctuation. The BOW vector and tabular EHR data were combined using a sparse vector representation.
Multiple models were developed and evaluated in two experiments. In Experiment 1, multiple single models were developed, each trained on the data of one time point (30, 60, 90, 120, or 150 min). In Experiment 2, we developed a full model trained on the concatenated data of all time points and tested on each of the time points (30, 60, 90, 120, and 150 min). This was done by concatenating the rows of data available at each time point into one data matrix for the full model, where each row is considered a separate case. Thus, at each time point, the full model gives predictions based on the data available at that time.
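The preprocessing and vectorization steps described above can be illustrated with a minimal sketch. It is not the study's code: the function names (`tokenize`, `fit_vocabulary`, `to_sparse_row`), the example notes, and the dictionary-based sparse row are assumptions made for illustration, and the text does not say which library was used. In practice a tool such as scikit-learn's `CountVectorizer` combined with `scipy.sparse.hstack` would likely play this role.

```python
import re
from collections import Counter


def tokenize(note):
    """Lower-case the note and keep only runs of word characters."""
    return re.findall(r"\w+", note.lower())


def fit_vocabulary(notes):
    """Map each distinct token in the training notes to a column index."""
    vocab = {}
    for note in notes:
        for token in tokenize(note):
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sparse_row(tabular, note, vocab):
    """Return one case as {column_index: value}.

    Tabular features occupy columns 0..len(tabular)-1 and BOW counts follow.
    A tabular value of None (not yet measured) is left out of the row.
    """
    row = {i: v for i, v in enumerate(tabular) if v is not None}
    offset = len(tabular)
    for token, count in Counter(tokenize(note)).items():
        if token in vocab:  # tokens unseen when the vocabulary was fit are ignored
            row[offset + vocab[token]] = count
    return row


# Hypothetical triage notes, used only to build a small vocabulary.
notes = ["Chest pain, SOB x2 days.", "Fall at home; hip pain"]
vocab = fit_vocabulary(notes)

# Hypothetical case: three tabular values (one missing) plus a short note.
row = to_sparse_row([67, None, 98.6], "chest pain pain", vocab)
print(row)
```

Storing only the entries that are present keeps the combined vector small even when the vocabulary has many thousands of columns, which is the motivation for a sparse representation.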
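The row concatenation behind the full model can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the event tuples, feature names, and helper functions are hypothetical, and the sketch does not address details the text leaves open, such as whether elapsed time is itself a feature or how visits already dispositioned before a time point are handled.

```python
TIME_POINTS = (30, 60, 90, 120, 150)  # minutes from ED arrival


def snapshot(events, cutoff_min):
    """Latest value of each feature recorded at or before cutoff_min."""
    state = {}
    for minute, feature, value in sorted(events, key=lambda e: e[0]):
        if minute <= cutoff_min:
            state[feature] = value
    return state


def build_single_matrix(visits, t):
    """Rows and labels for one time point (training data of a single model)."""
    rows = [snapshot(v["events"], t) for v in visits]
    labels = [v["admitted"] for v in visits]
    return rows, labels


def build_full_matrix(visits):
    """Concatenate the per-time-point rows; each row is a separate case."""
    rows, labels = [], []
    for t in TIME_POINTS:
        r, y = build_single_matrix(visits, t)
        rows.extend(r)
        labels.extend(y)
    return rows, labels


# Two hypothetical visits with time-stamped (minute, feature, value) events.
visits = [
    {"events": [(0, "age", 71), (5, "heart_rate", 104), (75, "troponin", 0.09)],
     "admitted": 1},
    {"events": [(0, "age", 34), (10, "heart_rate", 82)], "admitted": 0},
]

rows, labels = build_full_matrix(visits)
print(len(rows))  # one row per visit per time point
print(rows[4])    # first visit at 90 min, after the troponin result arrived
```

In this sketch the full matrix has five times as many rows as any single-model matrix, and the same visit contributes rows that differ only in the features that had arrived by each time point.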

Models Training and Testing
Data were split chronologically. Data from the years 2015-2018 were used for training and validation, and the 2019 data were used for testing. The default XGBoost hyperparameters were used for all the models: eta = 0.3, max_depth = 3. We set n_estimators = 200. The XGBoost model handled imputations of missing values. Scale balancing of the XGBoost was set to the default scale_pos_weight = 1.
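A minimal sketch of the chronological split and the reported settings is given below. The visit records and the `chronological_split` helper are hypothetical. The dictionary keys follow the parameter names of XGBoost's scikit-learn wrapper, where `learning_rate` corresponds to eta; whether the authors used that wrapper is an assumption.

```python
from datetime import date

# Hyperparameters as reported in the text. These keys could be passed to
# xgboost.XGBClassifier(**XGB_PARAMS); use of that wrapper is an assumption.
XGB_PARAMS = {
    "learning_rate": 0.3,   # eta
    "max_depth": 3,
    "n_estimators": 200,
    "scale_pos_weight": 1,
}


def chronological_split(visits, test_year=2019):
    """Visits from test_year form the test set; earlier visits form the training set."""
    train = [v for v in visits if v["arrival"].year < test_year]
    test = [v for v in visits if v["arrival"].year == test_year]
    return train, test


# Hypothetical visit records with arrival dates only.
visits = [
    {"id": 1, "arrival": date(2015, 3, 2)},
    {"id": 2, "arrival": date(2018, 12, 31)},
    {"id": 3, "arrival": date(2019, 1, 1)},
]

train, test = chronological_split(visits)
print([v["id"] for v in train], [v["id"] for v in test])
```

Splitting by calendar year rather than at random means the test set contains only visits that occurred after every training visit, which more closely resembles prospective use.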

Models Interpretation
We evaluated the performance of the full vs. single models for each time point (30, 60, 90, 120, and 150 min). Single feature analysis was also performed for the structured variables. SHapley Additive exPlanations (SHAP) summary plots were constructed to assess the full XGBoost model feature importance. Finally, using the full dataset, we compared a structured data only model to a free-text-only model.

Statistical Analysis
All development and statistical analyses were done using Python (Version 3.6.5). Continuous variables are reported as the median, with the spread reported as the interquartile range (IQR). Categorical variables are reported as percentages. Categorical variables were compared using the χ2 test, and continuous variables were compared using Student's t-test. Statistical significance was established at a two-sided p-value of p < 0.05.
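As an illustration of two of the descriptive statistics named above, the following sketch computes a median with IQR and a Pearson χ2 test for a 2×2 table. The function names and input values are hypothetical. The text does not state which quartile convention was used or whether a continuity correction was applied, so both choices here are assumptions. Note that `statistics.quantiles` requires Python 3.8 or later, which is newer than the version the study reports.

```python
import math
import statistics


def median_iqr(values):
    """Median and (Q1, Q3) using the 'inclusive' quartile convention."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical values, for illustration only.
med, (q1, q3) = median_iqr([1, 2, 3, 4, 5])
stat, p = chi2_2x2(10, 20, 30, 40)
print(med, q1, q3)
print(round(stat, 4), round(p, 3))
```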
We constructed receiver operating characteristic (ROC) curves for all models and evaluated the AUC. We determined sensitivity (recall), specificity, and precision (positive predictive value, PPV) for a default cut-off probability of 0.5. Confusion matrix values (true positive, false positive, true negative, false negative) are also reported for this cut-off probability.
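The metrics listed above can be made concrete with a small sketch. The labels and probabilities are invented for illustration, the function names are hypothetical, and treating a probability exactly equal to the cut-off as a positive prediction is an assumption. The AUC is computed from its rank interpretation: the probability that a randomly chosen admitted case receives a higher score than a randomly chosen non-admitted case, with ties counted as one half.

```python
def confusion_counts(y_true, y_prob, cutoff=0.5):
    """Return (TP, FP, TN, FN) when probabilities >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, and precision (PPV)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (quadratic time)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels (1 = admitted) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.8]

counts = confusion_counts(y_true, y_prob)
print(counts)                    # (3, 1, 3, 1)
print(summary_metrics(*counts))  # all three metrics equal 0.75 here
print(auc(y_true, y_prob))       # 0.875
```

The pairwise AUC is adequate for a toy example; for a cohort of this size a sort-based implementation would be needed.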

Cohort Characteristics
A total of 1,043,345 ED visits over five years (2015-2019) were included. Of those, 19.3% resulted in hospitalization (n = 201,520). The median time from ED entrance to ED disposition was 194 min (IQR 113-314 min). Table 2 presents the cohort's characteristics. Patients admitted to the hospital had an average age of 64. They were more likely to have higher systolic blood pressure, higher respiratory rate, and higher heart rate (Table 1).

Experimental Results
To evaluate an inclusive tabular-free-text gradient boosting model that predicts hospitalization, we conducted two experiments on multiple progressive time points.

Experiment 1
Five single models (T30, T60, T90, T120, and T150) were developed on the ED dataset at multiple time points (30, 60, 90, 120, and 150 min from presentation to the ED), as described in Figure 3. The AUC for each single model in Experiment 1 is presented in Figure 3, and the ROC curves are presented in Figure 4. The metrics of the single models are presented in Table 3.

Experiment 2
The full model was developed using the ED patient dataset with the data of all time points concatenated as one dataset, as presented in Figure 5. The full model was tested individually on each of the progressive time point datasets (30, 60, 90, 120, and 150 min from ED presentation), as described in Figure 6. The metrics for the full model at the different time points are presented in Table 4. For all time points, AUC confidence intervals (CI) overlapped between the single models and the full model. Thus, no statistically significant difference was shown (Table 5). We also trained the full model (tabular + text) without the ESI feature. For this model, the AUC was 0.86. This is comparable to the full model (tabular + text) with the ESI feature, which showed a similar AUC of 0.87.
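The comparison of AUC confidence intervals can be illustrated with a percentile bootstrap. This is only one possible approach: the text does not state how the intervals were computed, so the bootstrap itself, the function names, the number of resamples, and the example data are all assumptions.

```python
import random


def auc(y_true, scores):
    """Probability that a positive case outranks a negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented test labels and scores from two hypothetical models.
y_true = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
single_scores = [0.8, 0.3, 0.7, 0.4, 0.6, 0.2, 0.5, 0.9, 0.1, 0.35]
full_scores = [0.75, 0.35, 0.65, 0.45, 0.7, 0.25, 0.4, 0.85, 0.15, 0.3]

ci_single = bootstrap_auc_ci(y_true, single_scores)
ci_full = bootstrap_auc_ci(y_true, full_scores)
print(ci_single, ci_full, intervals_overlap(ci_single, ci_full))
```

Overlapping intervals are a conservative informal check. A paired test on the same test cases would be a more direct comparison of two models, although the text does not report one.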

Single Feature Analysis
In the single feature matrix analysis, the tabular variables with the highest AUC were age (AUC 0.726), followed by ESI (AUC 0.722), sodium (AUC 0.621), and calcium (AUC 0.618). The single feature analysis is presented in Table 6. Figure 7 shows the SHAP plot of the full model analyzed at all time points. Our analysis of ED notes suggests that certain terms are highly associated with hospitalization. Based on the SHAP plot, the top three terms were "admission", "surgery", and "crohn". Similarly, the structured feature "ESI", which is the nurse's assessment of severity (scale 1-5), was also a strong predictor of hospital admission.

Discussion
In this study, we developed a machine learning model based on free-text triage notes and EHR tabular data to predict hospital admission from the ED. For this task, a full gradient boosting model, trained on the entire structured and free-text data at different time points, showed stability compared to multiple single models trained on various time-frame points. Using a full model may be easier to train and implement.
Several previous works presented models for the prediction of hospital admission. For example, Hong et al. compared gradient boosting (XGBoost) to deep neural networks for predicting hospital admission with demographics and triage features. Both models showed similar results [10]. Their models showed AUCs of 0.87, which is comparable to the current AUC of 0.85-0.88. However, Hong et al. used previous patient data (medication, past medical history), while we used more current EHR data (labs). Their approach has benefits, although it requires that the patient was previously seen in the same health system so that these data are recorded, and that an updated hashed data lake is maintained for instant access to each patient's data.
Raita et al. predicted both hospitalization and critical care outcomes, and their features included demographics, vital signs, chief complaints, and comorbidities. They showed that machine learning models (lasso logistic regression, random forest, gradient boosting, neural networks) outperformed logistic regression [13]. Their neural network model showed similar results to our model (AUC 0.86) while again using previous medical history data (comorbidities). Unlike the previous examples that evaluated a single model at triage time, Lee et al. used logistic regression to predict admission at three different time points. They built single models using three data sets (demographics, triage vitals, and laboratory results) [14]. Lee's model predicted different disposition endpoints, including ICU, but general practice and observation units showed similar AUCs to our model (0.89, 0.86). Barak et al. also evaluated single logistic regression models at three progressive time points, using demographics, triage, and laboratory data [15]. The model of Barak et al. showed an AUC of 0.97, which is higher than all other cited models and our model. We cannot explain this difference based on available data.
While the previous studies evaluated only structured data, several studies used free text. For example, Lucini et al. utilized provider notes available several hours after triage to train several machine learning models (for example, random forest and logistic regression) [16]. Sterling et al. used the single data source of triage notes using neural network regression models with bag-of-words (BOW) [4]. Sterling's model, which utilized triage notes as a single feature, showed an AUC of 0.73.
Our previous study trained a single triage data gradient boosting model (demographics, vital signs, and triage notes) to predict admission to the neurosurgical intensive care unit [17].
In the current study, we evaluated the use of one full gradient boosting (XGBoost + BOW) model at multiple progressive time points (every 30 min after the patient presented to the ED) using both structured and free-text data. We compared the full model to single models trained at each time point. The full model demonstrated satisfactory results and performed well within 30 min of the patient's arrival at the ED. By using continually aggregated data from the ED, the models showed a 15% increase in sensitivity from 30 min to 150 min while maintaining the same PPV. Such a solution is simple to train and easy to deploy. A machine learning model like the one presented could be used by hospital care experts and clinical stakeholders, such as ED clinicians and nursing managers, to identify early in the ED process patients who might need hospitalization. This could also allow providers to involve care specialists in a timely manner, accelerate patient movement into the hospital, and potentially reduce ED boarding time.
Implementing the model presented here could be a clinical decision support tool that identifies patients requiring hospital admission and delivers a notification to the specialized care team. In such implementation, selecting an optimal alert threshold necessitates a careful evaluation of model performance and likely depends on multiple factors, including healthcare institution needs, clinical stakeholder preferences, and hospital resource availability.
It should be noted that ESI showed a high AUC in a single feature matrix. To compare the functional value of our model to triage by humans (ESI), we evaluated the AUC of the full model without the ESI feature. This showed an AUC of 0.86. Thus, although ESI by itself has high prediction capability, by utilizing multiple features, the model can "overcome" the loss of ESI as a feature.
To summarize, in this study, we used ED features (demographics, vital signs, ESI, triage note, and laboratory data) to develop a model for predicting hospital admission at different time points from ED presentation. Our model is based on the gradient boosting algorithm, with a BOW approach for free-text notes. We compared a single full model, trained on all time points, to multiple single models, each trained at one time point. We have shown that for each time point the full model achieves similar results to the single models, which potentially makes it easier to train and implement.
Our study has several limitations. First, this is a retrospective study. Second, while we experimented with the widely used gradient boosting algorithm, other approaches, such as deep learning transformer models, may show better results. Third, it is important to note that each hospital has its own admission policy. Thus, our model serves as a proof of concept, and cannot be implemented as is in a new site, but needs to be adjusted according to the site's setting and policy. Fourth, our model is not intended for the triage setting, as it uses out-of-triage features, mainly laboratory data.

Institutional Review Board Statement:
This performance improvement initiative was approved by the Mount Sinai Institutional Review Board (STUDY-18-00573) prior to beginning data analysis.