Prediction of Mortality after Burn Surgery in Critically Ill Burn Patients Using Machine Learning Models

Severe burns may lead to a series of pathophysiological processes that result in death. Machine learning models that demonstrate prognostic performance can be used to build analytical models to predict postoperative mortality. This study aimed to identify machine learning models with the best diagnostic performance for predicting mortality in critically ill burn patients after burn surgery, and then compare them. Clinically important features for predicting mortality in patients after burn surgery were selected using a random forest (RF) regressor. The area under the receiver operating characteristic curve (AUC) and classifier accuracy were evaluated to compare the predictive accuracy of different machine learning algorithms, including RF, adaptive boosting, decision tree, linear support vector machine, and logistic regression. A total of 731 patients met the inclusion and exclusion criteria. The 90-day mortality of the critically ill burn patients after burn surgery was 27.1% (198/731). RF showed the highest AUC (0.922, 95% confidence interval = 0.902–0.942) among the models, with sensitivity and specificity of 66.2% and 93.8%, respectively. The most significant predictors for mortality after burn surgery as per machine learning models were total body surface area burned, red cell distribution width, and age. The RF algorithm showed the best performance for predicting mortality.


Introduction
Burns are one of the most devastating types of traumatic injuries, causing morbidity and mortality worldwide. Burn wounds induce an excessive inflammatory response; although this response triggers the immune system to protect against infection, it can itself be harmful and potentially fatal [1]. The inflammatory mediators produced and released after a burn injury affect the microcirculation, resulting in significant hypovolemic shock and substantial tissue injury [2]. The challenges of resuscitation and treatment, with their potential adverse outcomes, have driven advances in the prediction of risk factors. Early detection and recognition of risk factors associated with mortality are essential in the management of a burn injury.
Machine learning is a type of artificial intelligence (AI) that offers superior predictive ability compared with conventional models and has gained recent prominence [3]. Machine learning models have gained attention for their diagnostic performance, as they can automatically build analytical models to predict postoperative mortality [4]. Recently, the explainability of machine learning has been emphasized, since understanding the decisions behind a model's predictions helps in deciding whether to adopt the model [5]. Studies have reported the importance of explainable machine learning, which can be applied to predict risk factors for mortality [6]. Mortality prediction is considered crucial in the early management of burn injuries, as it can affect patient outcomes. Studies on machine learning models for mortality prediction in burn injuries have been in progress for decades [7,8]. The application of machine learning to burn injuries enables clinicians to reveal patterns and observe correlations that are not disclosed by traditional linear statistical analysis [9]. Efforts have been made to demonstrate the potential of machine learning approaches in predicting mortality in burn patients [10,11]. Beyond mortality prediction, machine learning is also being studied for the prediction of sepsis and acute kidney injury in burn patients, which are issues of concern [12]. However, the performance of different machine learning techniques in predicting mortality after burn surgery has not been clearly elucidated.
The aim of the study was to identify machine learning models with the best diagnostic performance for predicting mortality in patients after burn surgery and to compare each model's suitability for this purpose. Our analysis used the following machine learning algorithms: random forest (RF), adaptive boosting (AB), decision tree (DT), support vector machine (SVM), and logistic regression (LGR). This study may help validate the use of machine learning models for applications in clinical practice.

Study Population
Critically ill burn patients admitted to the intensive care unit (ICU) before burn surgery from January 2010 to February 2018 were recruited. Patients with burns on more than 20% of their total body surface area (TBSA) were defined as critically ill burn patients [13]. Data collected immediately before the first burn surgery under general anesthesia were used. The inclusion criteria were patients who underwent burn surgery within 14 days of a burn event; patients under 18 years of age, those who underwent local anesthesia, and those with known chronic kidney disease were excluded from the study. We reviewed the electronic medical records of the patients to obtain laboratory and clinical data. This retrospective study was approved by the Institutional Review Board of the Ethical Committee of Hangang Sacred Heart Hospital, Hallym University, Seoul, Republic of Korea (No. 2018-057). Informed consent was waived by the committee due to the retrospective design of the study.
The primary outcome was the identification of risk factors for 90-day mortality after burn surgery using machine learning. The secondary outcome was the selection of the machine learning model with the best prediction performance.

Data Collection
Demographic data, laboratory data, and other patient variables were reviewed and collected using the electronic medical records system. Preoperative characteristics of the patients included sex, age, body mass index, history of hypertension or diabetes, American Society of Anesthesiologists physical status (ASA PS), TBSA burned, and the presence of inhalation injury. "TBSA burned" referred to the percentage of the body surface area with second- or third-degree burns. The presence of inhalation injury was diagnosed by bronchoscopic findings, with any abnormal findings considered an indication of inhalation injury. All preoperative blood tests were performed early in the morning of the day of surgery or the day before surgery. These preoperative laboratory data included hemoglobin, platelet count, prothrombin time (PT), albumin, creatinine, red cell distribution width (RDW), neutrophil-lymphocyte ratio (NLR), platelet-lymphocyte ratio (PLR), monocyte-lymphocyte ratio (MLR), and systemic immune-inflammation index (SII). NLR, PLR, MLR, and SII were each calculated using complete blood count (CBC) information. SII was calculated using the following formula: (granulocyte × platelet)/lymphocyte.
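The derived indices above are simple arithmetic on CBC counts. The following sketch shows how NLR, PLR, MLR, and SII are computed; the function name, variable names, and example values are illustrative, not taken from the study's dataset (the neutrophil count stands in for the granulocyte count in the SII formula).

```python
def inflammation_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Derive the CBC-based inflammation indices described above.

    Inputs are absolute cell counts (e.g., 10^3 cells/uL); names and
    example values are illustrative, not from the study.
    """
    nlr = neutrophils / lymphocytes                 # neutrophil-lymphocyte ratio
    plr = platelets / lymphocytes                   # platelet-lymphocyte ratio
    mlr = monocytes / lymphocytes                   # monocyte-lymphocyte ratio
    sii = neutrophils * platelets / lymphocytes     # (granulocyte x platelet)/lymphocyte
    return nlr, plr, mlr, sii

# Example with plausible counts (10^3 cells/uL):
nlr, plr, mlr, sii = inflammation_indices(6.0, 1.5, 0.6, 250.0)
```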

Primary Analysis of the Dataset
The baseline characteristics and laboratory findings were compared in the survivor and non-survivor groups 90 days after burn surgery. Risk factors for 90-day mortality after burn surgery were also identified using univariate and multivariate logistic regression analysis. The significant factors in univariate logistic regression were analyzed using the backward stepwise elimination procedure of multivariate logistic regression analysis. A two-tailed p-value < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS for Windows (version 24.0; IBM-SPSS Inc., Armonk, NY, USA).

Clinical Feature Selection and Classification Method Using Machine Learning
Although many quantitative features can be extracted from medical datasets, they may be highly correlated with each other, or simply noise. Thus, it is important to select a subset of features to enhance performance and minimize computational cost. Feature selection using an RF regressor and 20 repeats of 10-fold stratified cross-validation were performed to avoid overfitting in limited datasets (Figure 1) [14]. Important clinical features for predicting mortality in patients after burn surgery were selected using an RF regressor in Python (Python Software Foundation, version 3.7.4, Fredericksburg, VA, USA) with the Scikit-learn package (https://github.com/scikit-learn/scikit-learn (accessed on 25 September 2021)) [15]. An RF classifier model was then trained on the selected features to predict mortality [16]. The receiver operating characteristic (ROC) curve and classifier accuracy were used to compare the predictive accuracy of the RF, AB, DT, SVM, and LGR algorithms. Statistical differences in the AUC of each classifier were compared using DeLong's test in R (version 3.5.1; R Foundation for Statistical Computing, Vienna, Austria), with p-values < 0.05 considered statistically significant.
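The feature-selection and model-comparison workflow can be sketched with Scikit-learn as follows. The synthetic data, hyperparameters, number of retained features, and reduced number of cross-validation repeats (2 here, versus 20 in the study) are illustrative assumptions, not the study's actual settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              RandomForestRegressor)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset sized like the study cohort (731 patients, 11 features).
X, y = make_classification(n_samples=731, n_features=11, random_state=0)

# Step 1: rank features with an RF regressor and keep the strongest ones.
ranker = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(ranker.feature_importances_)[::-1][:5]

# Step 2: repeated 10-fold stratified cross-validation, scored by ROC AUC.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)
models = {
    "RF": RandomForestClassifier(random_state=0),
    "AB": AdaBoostClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": LinearSVC(),
    "LGR": LogisticRegression(max_iter=1000),
}
aucs = {name: cross_val_score(m, X[:, top], y, cv=cv, scoring="roc_auc").mean()
        for name, m in models.items()}
```

DeLong's test for comparing correlated ROC curves, as used in the study, is not part of Scikit-learn; in R it is available, for example, via `pROC::roc.test`.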

Algorithms of Each Machine Learning Model
The RF algorithm is an ensemble of many decision trees (non-linear models) fitted on various sub-samples of the dataset, whose predictions are averaged to improve the predictive accuracy and prevent overfitting [16]. The importance of each feature is computed by the RF implementation in the Scikit-learn package; this measure is also known as the Gini importance. The Gini impurity of a node is

$$H = \sum_{k=1}^{K} P_k (1 - P_k) = 1 - \sum_{k=1}^{K} P_k^2,$$

where K is the number of classes, and P_k is the probability of each class.
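As a numerical check of the Gini impurity definition (K classes with probabilities P_k), a minimal helper:

```python
import numpy as np

def gini_impurity(class_probs):
    """Gini impurity of a node: 1 - sum_k P_k^2, where P_k is the
    probability (proportion) of class k among the node's samples."""
    p = np.asarray(class_probs, dtype=float)
    return float(1.0 - np.sum(p ** 2))

# A pure node has zero impurity; an even two-class split is maximally impure.
pure = gini_impurity([1.0, 0.0])
even = gini_impurity([0.5, 0.5])
```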
The AB classifier is a meta-estimator that first fits a classifier on the original dataset, and then fits additional copies of the classifier on the same dataset with the weights of incorrectly classified instances adjusted, such that the subsequent classifiers focus more on difficult cases [17]. The final AB prediction combines the weak classifiers through a weighted majority vote:

$$F(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m h_m(x) \right),$$

where $h_m$ is the $m$-th weak classifier and $\alpha_m$ is its weight.

DT is a non-parametric supervised learning method for classification and regression. The goal of this method is to generate a model predicting a target value by learning simple decision rules inferred from the data features [18]. A tree can be seen as a piecewise constant approximation. For a classification outcome with values $0, 1, \ldots, K - 1$, the proportion of class $k$ observations in node $m$ is

$$p_{mk} = \frac{1}{n_m} \sum_{y \in Q_m} I(y = k),$$

where $Q_m$ denotes the $n_m$ training samples reaching node $m$. If $m$ is a terminal node, the predicted probability for this region is set to $p_{mk}$.

SVM constructs hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the maximum gap to the nearest training data points of any class, because typically, the larger the margin, the lower the generalization error of the classifier [19,20]. The primal problem can be formulated as

$$\min_{w, b} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \max\left(0, \left| y_i - \left( w^T \phi(x_i) + b \right) \right| - \varepsilon \right),$$

where we make use of the epsilon-insensitive loss, i.e., errors of less than $\varepsilon$ are ignored. This is the form that is directly optimized by linear support vector regression (SVR).
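The class proportion p_mk used by the DT model can be computed directly from the labels of the samples reaching a node; a minimal sketch with illustrative labels, not study data:

```python
import numpy as np

def node_class_proportions(y_in_node, n_classes):
    """p_mk: proportion of class-k observations among the samples in node m.
    At a terminal node, these proportions are the predicted class probabilities."""
    y = np.asarray(y_in_node)
    return np.array([(y == k).mean() for k in range(n_classes)])

# Six samples reaching a node, with labels drawn from K = 3 classes.
p = node_class_proportions([0, 0, 1, 2, 2, 2], n_classes=3)
```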
LGR is a linear model for classification rather than regression. It quantifies the relationship between a dependent categorical outcome and one or more independent predictor variables. This implementation can fit binary, One-vs.-Rest, or multinomial logistic regression with an optional $\ell_1$ or $\ell_2$ penalty [21,22]. As an optimization problem, binary-class $\ell_1$-penalized logistic regression minimizes the following cost function:

$$\min_{w, c} \|w\|_1 + C \sum_{i=1}^{n} \log\left( \exp\left( -y_i \left( X_i^T w + c \right) \right) + 1 \right).$$

Similarly, $\ell_2$-regularized logistic regression solves the following optimization problem:

$$\min_{w, c} \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left( \exp\left( -y_i \left( X_i^T w + c \right) \right) + 1 \right).$$
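The two penalized formulations can be illustrated with Scikit-learn; the synthetic data and the regularization strength C = 0.1 are arbitrary choices for this sketch, not the study's settings.

```python
# l1- vs. l2-penalized logistic regression: the l1 penalty yields sparse
# coefficients (some exactly zero), while the l2 penalty only shrinks them.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

# Count coefficients driven exactly to zero by each penalty.
n_zero_l1 = int((l1_model.coef_ == 0).sum())
n_zero_l2 = int((l2_model.coef_ == 0).sum())
```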

Discussion
In this study, we applied a machine learning approach to clinical features to compare models that predict patient mortality after burn surgery. RF achieved the highest AUC (0.922) among the evaluated models. Additionally, the pairwise comparisons of AUCs demonstrated that RF showed no statistical difference with AB. However, comparisons between RF and DT, SVM, and LGR showed a significant difference.
Using machine learning, the current study identified the most significant predictors of mortality after burn surgery as TBSA burned, RDW, and age. Among the 11 clinical features analyzed, TBSA burned constituted almost 30% of the feature importance. The feature importance of the other clinical features in descending order is RDW, age, creatinine, platelet count, PLR, prothrombin time, ASA PS, albumin, hemoglobin, and SII, with each forming less than 10% of the importance. TBSA burned is well known for its strong association with mortality in burn patients [23]. Additionally, RDW and age showed high feature importance among the clinical features. Clinical laboratory results such as creatinine, platelet count, PT, and PLR are significant risk factors in burn patients. This result is consistent with previous studies using classic logistic regression analysis [24,25].
The extent of injury is described using the percentage of the TBSA affected by a burn. The evaluation of TBSA burned is important for the initial burn management to estimate fluid requirements. TBSA burned is known to be a risk factor of mortality in burn injury, because higher TBSA leads to a poor prognosis [26]. Age is another well-known risk factor of mortality in burn patients. The underlying medical conditions, impaired response to infection, decreased ability to tolerate stress and physiological insult, and poor nutritional status associated with old age may cause adverse outcomes in elderly patients after burn injury [27,28].
Several preoperative laboratory variables have been analyzed for their ability to predict mortality in burn patients. CBC is a routinely applied laboratory blood test for most patients. The individual components analyzed by CBC are known to be related to inflammation or infection, which affect the prognosis of medical conditions [24]. Of these simple blood biomarkers, RDW is a numerical measurement of the range in the volume and size of the erythrocytes. An increase in RDW may reflect conditions that modify erythrocyte shapes as a result of premature release of immature cells into the bloodstream, as in the case of massive blood loss [29]. In addition, reports have shown that inflammation contributes to an increased RDW by inhibiting the production of erythropoietin or by decreasing erythrocyte survival [30,31]. Recently, the prognostic ability of RDW to predict morbidity and mortality in various clinical conditions has been demonstrated [32]. In burn patients, high RDW has been associated with adverse outcomes, including mortality, though not as an independent risk factor [33,34]. In this study, however, we found, using both multivariate regression and the machine learning evaluations, that preoperative RDW is an independent predictive factor for 90-day mortality in patients after burn surgery.
Machine learning is a subset of AI that develops algorithms and technologies that enable computers to learn. It is a statistical approach for extracting regularities from data, using various models and algorithms to make predictions and classifications. The application of machine learning has advanced recently in various areas of medicine [3]. Logistic regression is a traditional model commonly employed in medical applications to interpret clinical data in depth. Recent machine learning models include RF, AB, DT, SVM, and LGR, which are methods used to find a more optimal predictive model [35,36].
Among these machine learning models, our study demonstrated that RF showed the best performance in predicting mortality in patients after burn surgery, and RF was not significantly different from AB. Despite the high AUC values of RF and AB, the PPV and NPV were not high. Thus, the selection of the appropriate machine learning model for a given clinical situation depends on the user's priorities.
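The relationship between the reported operating characteristics and the predictive values follows from Bayes' rule; plugging in the reported sensitivity (66.2%), specificity (93.8%), and 90-day mortality (27.1%) illustrates how a high AUC can coexist with moderate PPV and NPV. This helper is a back-of-the-envelope illustration, not the study's own calculation:

```python
def predictive_values(sens, spec, prevalence):
    """Positive/negative predictive value from sensitivity, specificity,
    and outcome prevalence, via Bayes' rule."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

# Reported operating point of the RF model in this study.
ppv, npv = predictive_values(0.662, 0.938, 0.271)
```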
Machine learning approaches have recently been reported to have better predictive abilities than classic statistical analysis. Regarding machine learning techniques in burn research, burn injury and management can be recognized as patterns that can capture non-linearities shown in independent features such as TBSA burned, age, or inhalation injury, which is different from conventional statistical approaches [37]. Another study about predicting mortality of burn patients was conducted using artificial neural networks, which included 15 clinical features, including inhalation injury, TBSA burned, and admission period [38]. To our knowledge, the current study is the first attempt at evaluating the clinical features of the patients to assess 90-day mortality after burn surgery using machine learning, with AUC as the performance metric.
The conducted analysis had some limitations. First, since this is a single-center study, institutional characteristics may have contributed to the survival of the burn patients. Perioperative clinical management might also have changed over the eight-year period during which the patient data were collected. Thus, the results cannot be applied to burn patients in general. However, since the data used in this study were collected in the largest burn center in Asia, which performs standardized burn surgery, the effects on the present outcome may be minimal. Second, there was data loss or inaccurate data due to the retrospective design, which resulted in a relatively small dataset. Third, the models use the baseline preoperative characteristics without postoperative data. Although a dynamic model with sequential data may be superior, the model in our study predicts mortality within a specific period, which may be significant. Fourth, this study did not include machine learning explainability techniques, which might have provided a better understanding of how the models arrive at their predictions. Finally, additional data not included in our clinical features might have improved prediction. Further prospective studies incorporating these additional data alongside common clinical features are needed for clinical acceptance.

Conclusions
This study demonstrated that the most significant predictors for 90-day mortality after burn surgery are percentage of burned TBSA, RDW, and age, using machine learning techniques. The RF algorithm showed the best performance for predicting mortality among the machine learning models evaluated. Pairwise comparisons demonstrated that RF showed no statistical difference with AB. However, comparisons between RF and DT, SVM, or LGR showed a significant difference. Further investigation in the future on a larger cohort with composite factors may help support the validity of the machine learning models.
Author Contributions: Conceptualization, methodology, writing-original draft preparation, data curation, J.H.P.; methodology, validation, software, Y.C.; investigation, formal analysis, data curation, writing-review and editing, D.S.; visualization, conceptualization, validation, methodology, supervision, S.-S.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Ethical review and approval were waived for this study due to the nature of retrospective design of the study.
Informed Consent Statement: Patient consent was waived due to the nature of retrospective design of the study.