An Artificial Neural Network Model for Predicting Successful Extubation in Intensive Care Units

Background: Successful weaning from mechanical ventilation is important for patients in intensive care units (ICUs). The aim was to construct neural networks to predict successful extubation in ventilated patients in ICUs. Methods: Data from 1 December 2009 through 31 December 2011 of 3602 patients with planned extubation in Chi-Mei Medical Center’s ICUs were used to train and test an artificial neural network (ANN). The input was 37 clinical risk factors, and the output was a failed extubation prediction. Results: One hundred eighty-five patients (5.1%) had a failed extubation. Multivariate analyses revealed that failure was positively associated with therapeutic intervention scoring system (TISS) scores (odds ratio [OR]: 1.814; 95% confidence interval [CI]: 1.283–2.563), chronic hemodialysis (OR: 12.264; 95% CI: 8.556–17.580), rapid shallow breathing index (RSI) (OR: 2.003; 95% CI: 1.378–2.910), and pre-extubation heart rate (OR: 1.705; 95% CI: 1.173–2.480), but negatively associated with pre-extubation PaO2/FiO2 (OR: 0.529; 95% CI: 0.370–0.750) and maximum expiratory pressure (MEP) (OR: 0.610; 95% CI: 0.413–0.899). A multilayer perceptron ANN model with 19 neurons in a hidden layer was developed. The overall performance of this model was F1: 0.867, precision: 0.939, and recall: 0.822. The area under the receiver operating characteristic curve (AUC) was 0.85, which is better than any one of the following predictors: TISS: 0.58 (95% CI: 0.54–0.62; p < 0.001); MEP: 0.58 (95% CI: 0.53–0.62; p < 0.001); and RSI: 0.54 (95% CI: 0.49–0.58; p = 0.097). Conclusions: The ANN performed well when predicting failed extubation, and it will help predict successful planned extubation.


Introduction
A significant percentage of intensive care unit (ICU) patients require endotracheal intubation [1]. Prolonged ventilatory support increases the risk of complications, such as ventilator-associated pneumonia, and could be associated with higher in-hospital mortality and greater post-discharge mortality, healthcare utilization, and healthcare costs [2]. Thus, extubation of ventilated patients as early as possible after respiratory stabilization is desirable [3]. To reduce the risk of prolonged ventilatory support, it is crucial to determine the appropriate time for weaning a patient from mechanical ventilation [4], because premature extubation might lead to extubation failure. Standard clinical practice is to extubate based on a comprehensive assessment that considers a patient's clinical condition, arterial blood gas results, ventilator settings, and weaning profiles [5]. However, extubation failure often occurs (~19% reintubation required) even after the comprehensive assessment [6]. This suggests that the ability of clinicians to predict successful extubation is limited; a more powerful tool is required to help determine the optimal time to extubate [7].
Outcome prediction models using artificial neural networks (ANNs) and multivariable logistic regression analyses have recently been developed in many areas of healthcare research [8,9]. Artificial neural networks are computer-based algorithms that mimic the behavior and structure of biological neurons. They have also been successfully used to predict mortality in trauma patients [10]. Recently, ANNs have been introduced to predict extubation outcomes, but findings vary by study [11,12]. Poor outcome predictions might be mainly attributable to differences in clinical input data. This study aimed to construct an ANN model to assist clinicians making extubation decisions.

Patients and Setting
This study retrospectively analyzed 3602 adult patients with planned extubation in eight ICUs of Chi-Mei Medical Center from December 2009 through December 2011. All of them were enrolled in a prospective observational study [13]. Patients were separated into two groups: a successful extubation group and a failed extubation group. Patients who remained extubated after 72 h were classified as having a successful extubation, even if they required reintubation later during the same hospitalization [14,15]. In contrast, patients who needed reintubation within 72 h after a planned extubation were classified as having a failed extubation. Patients who died within 72 h of extubation were also classified as having a failed extubation. Noninvasive ventilation (NIV) may be considered to rescue extubation failure [16,17]. Patients who withstood NIV without reintubation for more than 72 h after extubation were classified as having a successful extubation. There were 161 patients treated with NIV, and 29 of them needed reintubation within 72 h. Demographic and clinical information, laboratory results, comorbidities, and the severity scores of all patients were collected. Chi-Mei Medical Center's Institutional Review Board approved the study protocol (IRB no. 10706-009).

Constructing Training Data Set
All features were extracted from the original dataset. The data of all patients were normalized to have an overall mean of 0 and a standard deviation of 1. After data processing, there were 37 input features, chosen for their wide availability in ICUs, and two outputs, which represented the predictions of successful and failed extubation.
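The normalization step can be illustrated with a minimal sketch. It assumes that each feature column is standardized separately and that the population standard deviation is used; the text specifies neither, and the function name and sample values below are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale one feature column to mean 0 and standard deviation 1."""
    mu = fmean(column)
    sigma = pstdev(column)  # population SD (an assumption, see lead-in)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical pre-extubation heart rates (beats/min), for illustration only.
heart_rate = [72.0, 88.0, 95.0, 110.0, 64.0]
z_scores = standardize(heart_rate)
```

In practice the mean and standard deviation would be estimated on the training set and reused for the test set, so that no test information leaks into training.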

Data Description
The entire data set comprised 3602 data points. In both the training and test data sets, the positive class was dominant: 3416 of 3602 (94.8%) patients had a successful extubation. The ratio of unsuccessfully to successfully extubated patients was 1:18.47. The data were split into training and test sets at approximately a 9:1 ratio, which was chosen in accordance with other ANN research [18]. Of the 3602 data points, 3242 were randomly allocated to the training set and 360 were randomly allocated to the test set.
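The random allocation described above can be sketched as follows. This is an illustration only: the helper name and the fixed seed are assumptions, and the text does not state whether the split was stratified by outcome (this sketch is not).

```python
import random


def split_indices(n_total, n_test, seed=0):
    """Shuffle the row indices and hold out n_test of them for testing."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seed is arbitrary, for reproducibility
    return indices[n_test:], indices[:n_test]


# 3602 patients in total, 360 of them held out (approximately 9:1).
train_idx, test_idx = split_indices(3602, 360)
```

With a failure rate of about 5%, an unstratified test set of 360 patients is expected to contain only around 18 failed extubations, so the test-set composition can vary noticeably between random splits.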

Algorithm and Training
A multilayer perceptron (MLP) neural network was trained on the data. K-fold cross-validation with a k value of 10 was used over 10 epochs to select the best-performing hyperparameters, optimizers, and loss function. The three-layered model consists of one input layer with 37 dimensions, a hidden layer of 19 dimensions, and an output layer of 2 dimensions. The network was trained using stochastic gradient descent with a mini-batch size of 1. The network was optimized using Adam with default parameters as described by Kingma et al. [9]. The neural network was trained for 60 epochs. The Scaled Exponential Linear Unit (SELU) activation function was used at each layer, and Softmax was used at the output layer [15]. A 20% dropout rate (dropout is a simple way to prevent neural networks from overfitting) was applied to the input layer and a 50% dropout rate was applied to the output layer [19]. The categorical cross-entropy error function for binary classifiers was used as the loss function. Each data point was weighted based on its outcome ratio; this was done to ensure that the output of the neural network was not heavily skewed toward the dominant class.
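To make the architecture concrete, the following is a minimal pure-Python sketch of a forward pass through a 37-19-2 network with SELU and softmax, followed by a class-weighted cross-entropy loss. It is not the TensorFlow code used in the study. The weight initialization, the inverse-frequency weighting scheme, and all function names are assumptions, and dropout and the Adam update are omitted because only inference on a single input is shown.

```python
import math
import random

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights with standard deviation 1/sqrt(n_in) (an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """37 inputs -> 19 SELU units -> 2 softmax probabilities."""
    h = [selu(v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))


def weighted_cross_entropy(probs, label, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -class_weights[label] * math.log(probs[label])


rng = random.Random(0)
hidden = init_layer(37, 19, rng)
output = init_layer(19, 2, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(37)]  # one hypothetical normalized patient
probs = forward(x, hidden, output)

# Inverse-frequency class weights (assumed scheme); counts are from the text.
counts = {0: 3416, 1: 185}  # 0 = successful, 1 = failed extubation
n_samples = sum(counts.values())
class_weights = {c: n_samples / (len(counts) * n) for c, n in counts.items()}
loss = weighted_cross_entropy(probs, 1, class_weights)
```

Under this assumed scheme, a failed extubation contributes roughly 18 times as much to the loss as a successful one, which is the kind of correction the weighting step is meant to provide.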

Statistical Analyses
Mean values, standard deviations, and group sizes were used to summarize the results for continuous variables. The differences between the successful and failed extubation groups at hospital discharge were examined using univariate analysis with a Student's t test and a χ² test. Significance was set at p < 0.05. Predetermined variables, or those significantly associated with successful extubation in univariate analysis (p < 0.05), were tested for interaction using multivariate logistic regression analysis. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. SPSS 24.0 for Windows (SPSS, Inc., Chicago, IL, USA) was used for all statistical analyses.
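The relationship between a logistic regression coefficient and a reported OR with its 95% CI can be illustrated with a short sketch. It assumes Wald-type intervals, that is, OR = exp(β) and CI = exp(β ± 1.96·SE); the function names are hypothetical, and the input values are the TISS figures reported in this paper.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def coefficient_from_or(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient and its standard error from an OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def or_with_ci(beta, se):
    """Convert a coefficient and standard error back to an OR and Wald 95% CI."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


# TISS: OR 1.814 (95% CI: 1.283-2.563), as reported in the abstract.
beta, se = coefficient_from_or(1.814, 1.283, 2.563)
odds_ratio, ci_low, ci_high = or_with_ci(beta, se)
```

The round trip reproduces the reported interval to within rounding, which is consistent with the interval being symmetric on the log-odds scale.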
Because the data distribution was unbalanced, accuracy was not a reliable measurement of predictor performance [23]. Instead, the weighted average recall (sensitivity), precision (positive predictive value [PPV]), and F1 score (harmonic mean of sensitivity and precision) were used to measure ANN performance. The ideal value of the recall, precision, and F1 score is 1 [24]. All three scores were calculated for the test set and for all data.
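The weighted average scores can be computed as in the following sketch, in which each class's precision, recall, and F1 score is weighted by that class's share of the true labels. The function names and the toy labels are hypothetical and are not study data.

```python
def per_class_scores(y_true, y_pred, cls):
    """Precision, recall, and F1 score, treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_scores(y_true, y_pred):
    """Average the per-class scores, weighted by each class's support."""
    n = len(y_true)
    totals = [0.0, 0.0, 0.0]
    for cls in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == cls)
        for i, score in enumerate(per_class_scores(y_true, y_pred, cls)):
            totals[i] += score * support / n
    return tuple(totals)


# Toy labels: 0 = successful extubation, 1 = failed extubation.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
precision, recall, f1 = weighted_scores(y_true, y_pred)
```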
The ANN performance was also measured using the area under the receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) of the neural network was compared against the AUC of variables that had significantly different outcomes. The AUC was also compared against the ideal value of 1 [25].
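The AUC has a simple interpretation that can be sketched directly: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses this pairwise definition rather than the routine in any statistical package; the function name and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = failed extubation; scores are predicted failure probabilities.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
toy_auc = auc(scores, labels)
```

A score that ranks every failure above every success gives an AUC of 1, and a score that cannot separate the groups gives 0.5, which is why values such as 0.54 for RSI indicate performance close to chance.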
To ensure that the ANN itself, rather than the individual variables, improved the prediction, the ROC curve of the ANN was compared with that of a composite score created from relevant variables. To create a composite score that was representative of the individual variables, principal component analysis (PCA) was first performed on the significant variables. A composite score was then created from the results of the multivariate analysis. The weighting of each variable in the composite score was based on its correlation with the first principal component.
Table 1 shows the demographic and clinical characteristics of the sample of ICU patients with planned extubation. Of the 3602 patients included in the study, 50.9% were male and 49.1% were female. Patients with extubation failure were older than those in the successful extubation group (p < 0.001). In addition, patients with extubation failure had higher Acute Physiology and Chronic Health Evaluation (APACHE) II scores (18.9 ± 7.0 vs. 16.2 ± 7.4) and therapeutic intervention scoring system (TISS) scores (29.3 ± 7.5 vs. 27.1 ± 7.8) than the successful extubation group (both p < 0.001). Regarding weaning parameters, there were significant differences in the TISS score, maximum expiratory pressure (MEP), and rapid shallow-breathing index (RSI) between the patients in the failed and successful extubation groups (all p < 0.05). Overall, patients with failed extubation had a longer duration of mechanical ventilation (MV) use (140.8 ± 145.8 h vs. 106 ± 126.9 h) than patients with successful extubation (p = 0.002). Multivariate analyses showed that failed extubations were positively associated with TISS scores, chronic hemodialysis, RSI, and pre-extubation heart rate, but negatively associated with pre-extubation PaO2/FiO2 and MEP (Table 2).

Results of Artificial Neural Networks (ANN)
The overall performance of the ANN model is shown in Table 3. The weighted k-fold accuracy of the ANN (k = 10) was 0.94. Figure 1 shows the ROC curves of the ANN, TISS, MEP, and RSI on all patient data. The AUC in the test set of the ANN model was 0.85 (95% CI: 0.82–0.87, p < 0.001), which was better than any one of the following predictors: 0.58 (95% CI: 0.54–0.62, p < 0.001) for TISS, 0.58 (95% CI: 0.53–0.62, p < 0.001) for MEP, and 0.54 (95% CI: 0.49–0.58, p = 0.097) for RSI. Whether there was a significant difference between the ANN and the other variables was determined using the DeLong test on the AUC [26]. There was a significant difference between the AUC for the ANN and the AUC for TISS (z = 10.71, p < 0.0001), MEP (z = 10.95, p < 0.0001), and RSI (z = 12.52, p < 0.0001).
The weights of the variables in the composite score used as a point of comparison are summarized in Table 4. Figure 2 shows the ROC curves of the ANN and the composite score. The AUC of the composite score was 0.64 (95% CI: 0.60–0.68, p < 0.001). The AUC of the ANN was significantly better than the AUC of the composite score (z = 8.79, p < 0.0001).

Table 4. Weights of the variables in the composite score (only a fragment of the table is recoverable): hematocrit (%), 0.643; BUN, −0.033.

Discussion
This study found that a neural network model is a good predictor of successful extubation. While other weaning parameters, such as tidal volume, frequency, minute ventilation, MEP, and RSI, are used to help assess the weaning process, they did not yield a high degree of accuracy in predicting extubation outcomes [27][28][29][30]. The predictive performance of the ANN was also better than that of RSI and MEP. This is consistent with a previous study [11], which reported that the proposed ANN yielded better discrimination for predicting successful extubation than did the RSI and PImax. In addition, the ANN yielded a better performance than a composite score based on significant variables created using PCA. Moreover, the ANN in this study was created and trained based on the data of 3602 ready-to-wean patients, far more than in Kuo et al. [11]. Finally, the ANN algorithm provided useful information about the optimal time to extubate.
Several other studies have tried to find appropriate predictors of successful weaning and presented different findings. In other studies, the factors that predicted failed weaning were older age, pulmonary cause of intubation, and lower mean arterial pressure [13,31]. Predictors of successful extubation have also been reported: being female and low blood urea creatinine [32]; MIP and arterial carbon dioxide tension (PaCO2) [33]; and respiratory rate, RSI, MIP, and APACHE II scores [27]. All these factors were included in the current ANN algorithm; thus, it should provide an accurate prediction based on comprehensive information.
Previous ANNs were usually developed using proprietary software such as Statistica (TIBCO Software Inc., San Francisco, CA, USA) and SPSS. The free and open-source TensorFlow framework was used to create our ANN. TensorFlow has a fast update cycle and frequently incorporates newer neural network configurations. In contrast, SPSS has a slow update cycle and fewer configurations. For instance, SPSS 25 does not have the Adam optimizer, which was used in the present ANN.

This study has some limitations. First, the rate of extubation failure was particularly low in this study (5%) compared with rates in the literature (about 15%). In the present study, the final decision to extubate was made by the intensivists treating the intubated patients. Thus, it is possible that they did not follow the weaning and extubation protocol. Second, delayed extubation might have occurred in this study. Third, the dataset used for this project was from December 2009 through December 2011; thus, the rapid advancements in sedation practices, delirium awareness, early mobility, anesthesia and pain management, and ventilator capacity over the last decade might be a significant confounder to the utility of this work. Fourth, the selection of variables for the model was based only on the widespread availability of these data. This "availability" may depend on the type and habits of each ICU, so the findings may not be generalizable to other ICUs. Finally, patients who needed reintubation or died within 72 h after a planned extubation were classified as having a failed extubation. As NIV can postpone the need for reintubation, a period of 7 days after extubation is required for a more accurate definition of extubation failure when NIV is in broad use [34].

Conclusions
An extubation strategy for all ventilated ICU patients should be thoroughly planned. The present study identifies the parameters used to predict a successful planned extubation using an ANN. Failed extubations were positively associated with TISS, RSI, pre-extubation heart rate, and chronic hemodialysis, but negatively associated with MEP and pre-extubation PaO2/FiO2. Furthermore, the present ANN model effectively predicted successful planned extubations in ICU patients.