Estimation of Neonatal Intestinal Perforation Associated with Necrotizing Enterocolitis by Machine Learning Reveals New Key Factors

Intestinal perforation (IP) associated with necrotizing enterocolitis (NEC) is one of the leading causes of mortality in premature neonates, with major nutritional and neurodevelopmental sequelae. Since predicting which neonates will develop perforation is still challenging, clinicians might benefit considerably from an early diagnosis tool and the identification of critical factors. The aim of this study was to forecast IP related to NEC and to investigate the predictive quality of variables using a machine learning technique. A back-propagation neural network was used to train and test the models with a dataset constructed from medical records of the NICU, with maternal and neonatal clinical, feeding and laboratory parameters at birth and during hospitalization as input variables. The outcome of the models was diagnosis: (1) IP associated with NEC, (2) NEC or (3) control (neither IP nor NEC). The models accurately estimated IP with good performance; the regression coefficients between the experimental and predicted data were R² > 0.97. Critical variables for IP prediction were identified: neonatal platelets and neutrophils, orotracheal intubation, birth weight, sex, arterial blood gas parameters (pCO2 and HCO3), gestational age, use of fortifier, patent ductus arteriosus, maternal age and maternal morbidity. These models may allow quality improvement in medical practice.


Study Design and Ethical Approvals
This was an observational retrospective study. Data were obtained from the records of patients hospitalized in the Neonatal Intensive Care Unit (NICU) of a tertiary care hospital from January 2015 to August 2017. Informed consent was not required (protocol approved by the Institutional Research and Ethics Boards of the Instituto Nacional de Perinatología Isidro Espinosa de los Reyes, certificate number: 2017-2-65).
This work was designed to forecast IP related to NEC and to investigate the predictive quality of variables. For this purpose, we developed two estimation models for intestinal perforation associated with NEC: (a) an ANN model at birth and (b) an ANN model at birth and during hospitalization. Three groups of neonates were compared: (1) a control group with neither NEC nor intestinal perforation but with similar gestational age (N = 27), (2) an NEC group (according to Bell's staging criteria) (N = 23) and (3) an intestinal perforation associated with NEC group (Bell's Stage IIIB) (N = 26). We excluded 15 cases with incomplete clinical information, with spontaneous intestinal perforation or perforation not associated with NEC, or with digestive tract malformations. Control neonates did not have NEC of any stage, including stage I. We carefully collected a balanced dataset with a similar number of samples per group.
Diagnosis of NEC and of intestinal perforation associated with NEC was defined according to Bell's staging criteria [28] as modified by Walsh [29]. NEC patients included those presenting the bedside KUB radiographic findings described in stages IIA, IIB, and IIIA: ileus with dilated bowel loops and focal pneumatosis, or widespread pneumatosis, or portal venous gas with or without ascites, without free air. Patients with intestinal perforation related to NEC included those presenting the radiographic finding described in stage IIIB: pneumoperitoneum.

Dataset
Taking into account variables already reported in the literature, as well as others proposed during meetings with the clinical staff, we chose 113 variables that included maternal and neonatal data recorded at birth and during hospitalization. Within these, we collected maternal and neonatal clinical/demographic data (maternal age, maternal morbidity, gestational age, birth weight), diagnosis, oxygen therapy at birth, enteral feeding, laboratory, and clinical findings. Routine laboratory tests are performed within the first 24 h for all premature neonates <35 weeks for closer monitoring, and included arterial pH, blood gases and hematologic data. Maternal obesity was defined as a Body Mass Index (BMI, calculated as weight (kg)/height² (m²)) greater than or equal to 30 (World Health Organization). Chorioamnionitis was defined as an acute inflammation or infection of any combination of fetal membranes, amniotic fluid, decidua and chorion of the placenta. Numerical data were expressed by a number, whereas absence or presence (no or yes) was expressed as 0 or 1, respectively. Differences in numerical variables between groups were analyzed by ANOVA, and categorical variables were analyzed by Pearson's Chi-square test (SPSS version 22, IBM, Armonk, NY, USA). Twenty-three variables were chosen for the IP associated with NEC model at birth, while 35 parameters were selected for the IP associated with NEC model at birth and during hospitalization (data were taken 24 h before intestinal perforation diagnosis). The anthropometric/clinical maternal and neonatal characteristics are depicted in Tables 1 and 2, respectively.

ANN (Learning, Testing, and Validation)
The architecture of the ANN models comprised three layers of connected neurons (nodes): an input layer (parameters predicting the outcome), a hidden layer (activation transfer functions) and an output layer giving the prediction of diagnosis, either (1) neither NEC nor IP, (2) NEC or (3) IP associated with NEC. The database (N = 76 neonates) was randomly divided into training (80%) and testing/validation (20%) sets. The input variables were normalized to the range of 0.1 to 0.9, as previously described [26], in order to prevent within-patient differences in variation and amplitude among variables; the output variable was not normalized. The Back-propagation neural network (BPNN) was used to train and test the ANN models using the Levenberg-Marquardt algorithm [30], as previously explained [26,27]. Briefly, one to fewer than five neurons were applied in the hidden layer until the Root Mean Square Error (RMSE) between the experimental data (target) and the predicted values (network) was <10⁻¹², the model was validated by the slope and intercept statistical test (see Section 2.3.3), and overfitting was avoided (performance evaluation of the model through training, testing, and validation).
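The scaling and partitioning steps described above can be sketched as follows. This is a minimal illustration in plain Python and not the authors' code: the linear min-max formula used to map each variable onto 0.1 to 0.9 is an assumption (the exact procedure is the one cited in [26]), and the function names, the seed and the toy values are hypothetical.

```python
import random

def scale_to_range(values, low=0.1, high=0.9):
    """Linearly map a list of numbers onto [low, high] (assumed min-max form)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant column: place every value at the midpoint of the range.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]

def split_train_test(n_samples, train_fraction=0.8, seed=0):
    """Randomly partition sample indices into training and testing/validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

# Toy birth weights in grams (hypothetical values, not study data).
birth_weights = [850.0, 1085.0, 1141.0, 1384.0, 1900.0]
scaled = scale_to_range(birth_weights)

# 76 neonates, as in the study; the seed is arbitrary.
train_idx, test_idx = split_train_test(76)
```

With 76 records, an 80/20 split gives 61 training and 15 testing/validation records under this rounding rule. In practice the minimum and maximum of each variable would be taken from the ranges over which the model is considered valid.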
A representative ANN architecture for intestinal perforation associated with NEC model is depicted in Figure 1, with 23 input variables at birth and 1 output variable: diagnosis (No NEC, NEC or IP associated with NEC).

Figure 1.
A representative network architecture of the intestinal perforation (IP) model. The learning procedure used by the ANN for the estimation of IP associated with NEC from 23 maternal and neonatal variables at birth (maternal age, maternal morbidity, chorioamnionitis, prenatal antibiotic, number of offspring, premature rupture of membranes, gestational age, oxygen therapy (indirect oxygen, T-piece, continuous positive airway pressure, positive pressure ventilation cycles, orotracheal intubation), birth weight, sex, arterial pH, blood gas (CO2, HCO3, base deficit), diastolic arterial blood pressure, number of total leukocytes, neutrophils, platelets and catheter location), trained by the Levenberg-Marquardt optimization algorithm. The same architecture was used for IP estimation with birth and hospitalization variables.

Statistical Test for ANN Model Validation
We applied a statistical test (slope and intercept test [31]) in which the upper and lower confidence limits of the slope and intercept from linear regression models of the experimental database versus the simulated ones (learning and validation database) must approach 1.0 and 0, respectively, with a 99.8% confidence level according to Student's t-test.
The regression coefficient (R²) was then obtained from the linear regression model for each ANN model.
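The slope and intercept test can be illustrated with a short sketch. It is written under stated assumptions rather than taken from [31]: the pass criterion used here (the confidence interval of the slope contains 1 and that of the intercept contains 0) is one common reading of the test, the critical value defaults to a normal quantile as a large-sample stand-in for Student's t (a tabulated t value can be passed as `t_crit`), and all names and data are hypothetical.

```python
import math
from statistics import NormalDist

def slope_intercept_test(y_exp, y_sim, confidence=0.998, t_crit=None):
    """Regress simulated on experimental values; check slope ~ 1 and intercept ~ 0."""
    n = len(y_exp)
    mean_x = sum(y_exp) / n
    mean_y = sum(y_sim) / n
    sxx = sum((x - mean_x) ** 2 for x in y_exp)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_exp, y_sim))
    syy = sum((y - mean_y) ** 2 for y in y_sim)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)

    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(y_exp, y_sim))
    s2 = sse / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))

    if t_crit is None:
        # ASSUMPTION: normal quantile used in place of the Student t quantile.
        t_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

    slope_ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
    intercept_ci = (intercept - t_crit * se_intercept,
                    intercept + t_crit * se_intercept)
    passed = (slope_ci[0] <= 1.0 <= slope_ci[1]
              and intercept_ci[0] <= 0.0 <= intercept_ci[1])
    return {"slope": slope, "intercept": intercept, "r2": r_squared,
            "slope_ci": slope_ci, "intercept_ci": intercept_ci, "passed": passed}

# Toy target values and hypothetical network outputs (not study data).
y_exp = [1, 1, 2, 2, 3, 3, 1, 2, 3, 3]
y_sim = [1.05, 0.96, 2.02, 1.97, 2.95, 3.04, 1.01, 2.03, 2.98, 3.01]
result = slope_intercept_test(y_exp, y_sim)
```

For small samples the normal quantile is narrower than the t quantile, so supplying the exact t value makes the test slightly more permissive.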

Sensitivity Analysis
In order to identify key factors that play an important role in predicting intestinal perforation associated with NEC, we performed a sensitivity analysis on the trained and validated neural network, as previously described ([26,27] and Garson algorithm in Appendix B), to determine which input variables (maternal and neonatal parameters) are more important (or sensitive) for attaining precise output values (diagnosis).
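As an illustration of the weight-partitioning idea behind the Garson algorithm, the following sketch computes relative importances for a network with a single output neuron. It is a minimal version under stated assumptions: the weight layout (rows indexed by input, columns by hidden neuron) and the function name are chosen here for readability, and the toy weights are hypothetical rather than taken from Tables A1 or A2.

```python
def garson_importance(w_input_hidden, w_hidden_output):
    """Relative importance (%) of each input via Garson's weight partitioning.

    w_input_hidden[j][m]: weight from input j to hidden neuron m.
    w_hidden_output[m]:   weight from hidden neuron m to the single output.
    """
    n_inputs = len(w_input_hidden)
    n_hidden = len(w_hidden_output)

    # Sum of absolute input weights feeding each hidden neuron.
    hidden_totals = [
        sum(abs(w_input_hidden[k][m]) for k in range(n_inputs))
        for m in range(n_hidden)
    ]

    # Share of each input in every hidden neuron, weighted by that
    # neuron's absolute connection to the output.
    contributions = [
        sum(
            abs(w_input_hidden[j][m]) / hidden_totals[m] * abs(w_hidden_output[m])
            for m in range(n_hidden)
        )
        for j in range(n_inputs)
    ]

    total = sum(contributions)
    return [100.0 * c / total for c in contributions]

# Hypothetical network with 3 inputs and 2 hidden neurons.
w_ih = [[0.5, -1.0], [1.5, 0.5], [-2.0, 0.5]]
w_ho = [1.0, -2.0]
importance = garson_importance(w_ih, w_ho)
```

Because only absolute weights are used, the result ranks how strongly each input influences the output but says nothing about the direction of the effect.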

Results
The aim of this study was to obtain ANN models estimating intestinal perforation associated with NEC (IP), in order to differentiate it from an NEC or no NEC (control) diagnosis, and to investigate key factors for the prediction. For this purpose, data extracted from maternal and neonatal records were used to train two BPNN models: (1) an IP model with birth variables and (2) an IP ANN model with birth and hospitalization parameters. This design allows the importance of risk factors at birth to be compared with that of hospitalization parameters, based on a sensitivity analysis of both models.
All neonates with IP associated with NEC diagnosis in NICU (N = 27) were chosen while NEC (N = 23) or control groups (N = 27) were carefully selected in order to have similar gestational ages in all groups. Neonatal birth weight was significantly higher in the control group (1384 ± 95.5 g) compared to NEC or IP groups (1085 ± 65.8 g and 1141 ± 65.21 g, respectively; see Table 2, p < 0.05).
For both models, distinct parameters (number of neurons) and transfer functions were tested; the best performance was obtained with the hyperbolic tangent sigmoid function (TANSIG) in the hidden layer and the log-sigmoid function (LOGSIG) in the output layer. For both models, 30,000 runs (with 1000 epochs) were applied in the hidden layer (from one neuron to two or three neurons). For the ANN model predicting IP at birth, the final architecture was 23 input variables, 3 neurons in the hidden layer, and 1 neuron in the output layer (IP associated with NEC, NEC or no NEC diagnosis): 23-3-1; the final topology for the ANN IP model at birth and during hospitalization was 35-2-1. The representative neural architecture for the estimation of IP diagnosis is depicted in Figure 2, while the equations, weights, and biases of both models are reported in Appendices B and C (Equations (A1)-(A6) and Tables A1 and A2).
Both intestinal perforation associated with NEC models had good accuracy, with regression coefficients of R² = 0.9764 and R² = 0.98029 for the IP model with birth variables (Figure 3A) and the IP model with birth and hospitalization parameters (Figure 3B), respectively, evaluated by the linear regression between the experimental and simulated data, as well as by the statistical tests from these plots, with a 99.8% confidence level for all determinations (Tables A3 and A4 in Appendix C).
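To make the TANSIG-LOGSIG topology concrete, a forward pass through a network with one hidden layer and a single output neuron can be sketched as below. The transfer functions follow their standard definitions; the function and variable names, the 4-3-1 toy size and all weight values are hypothetical, and the actual weights and biases would be those reported in Tables A1 and A2.

```python
import math

def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2n)) - 1, equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def logsig(n):
    """Log-sigmoid: 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + math.exp(-n))

def forward(inputs, w_hidden, b_hidden, w_output, b_output):
    """One forward pass through an inputs-hidden-1 TANSIG-LOGSIG network.

    w_hidden[s][k]: weight from input k to hidden neuron s.
    b_hidden[s]:    bias of hidden neuron s.
    w_output[s]:    weight from hidden neuron s to the output neuron.
    b_output:       bias of the output neuron.
    """
    hidden = [
        tansig(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return logsig(sum(w * h for w, h in zip(w_output, hidden)) + b_output)

# Hypothetical 4-3-1 toy network (weights are illustrative only).
x = [0.1, 0.5, 0.9, 0.3]              # inputs already scaled to [0.1, 0.9]
w1 = [[0.2, -0.4, 0.1, 0.3],
      [-0.5, 0.6, 0.2, -0.1],
      [0.3, 0.1, -0.7, 0.4]]
b1 = [0.05, -0.02, 0.1]
w2 = [0.8, -0.6, 0.4]
b2 = 0.1
y = forward(x, w1, b1, w2, b2)
```

A 23-3-1 model would use the same function with 23 inputs and a 3 × 23 hidden weight matrix; the log-sigmoid output always lies strictly between 0 and 1.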
In order to identify which maternal and neonatal factors were critical for the prediction of intestinal perforation associated with NEC in both models, we performed a sensitivity analysis, which depicts the importance of individual factors (input variables) in the modeling of the IP diagnosis (Figures 4 and 5). At birth, the neonatal platelet count was the most significant parameter for IP prediction, followed by the use of orotracheal intubation (OTI) as oxygen therapy, birthweight, sex, maternal age, neonatal diastolic blood pressure (DBP), pCO2 and gestational age (Figure 4).
In the IP ANN model with birth and hospitalization parameters, the number of neonatal neutrophils, PDA, sex, pCO2, use of fortifier, maternal age, OTI, HCO3, indirect oxygen, and gastric residuals were the key factors with the highest relative contribution to the estimation of intestinal perforation. These variables were followed by birthweight, PPV, maternal age, first day of oral feeding, hypotension and early sepsis (Figure 5). Overall, maternal factors accounted for 18.1% of the importance in estimating IP, while neonatal birth variables were responsible for 44.4% (oxygen therapy 14%, arterial blood gas and laboratory findings 30.4%) and hospitalization factors for 37.5%.

Figure 5.
Relative contribution of each predictor variable to the estimation for the IP ANN model at birth and hospitalization. The relative influence histogram shows the mathematical importance of each predictor variable in the model evaluated by a sensitivity analysis. It is measured as a percentage of quantitative significance on the Y-axis for each predictor parameter at birth and during hospitalization.

Discussion
The objective of this work was to forecast intestinal perforation associated with NEC and to distinguish it from NEC alone. Both models (at birth, or at birth and during hospitalization) attained this with good performance: the regression coefficients between the experimental and predicted data were R² > 0.97. Learning to estimate perforation depends on all variables from individual cases working together in a multidimensional process to obtain a forecasting pattern. Because the non-linear relations between variables are determined during the learning process, these ANN models are highly valuable for clinicians, since the prediction approaches personalized medicine.
Identification of critical factors, together with the assessment of how the output changes when input variable values are varied one by one, adds knowledge to the field and will permit a better understanding of risk factors for the prediction of future intestinal perforation associated with NEC. Previously unreported key variables for the prediction of IP in both models were orotracheal intubation, arterial blood gas parameters (pCO2 and HCO3), use of milk fortifier and maternal age. Therefore, attention should be paid to these parameters.

The relative contribution of each predictor variable also allowed us to verify that the models are doing what they are intended to do (estimation of the intestinal perforation diagnosis), since they identified factors already known from the literature that may allow an early diagnosis and follow-up of premature neonates. Among the previously described risk factors for intestinal perforation associated with NEC, lower birth weight [32-34], decreased gestational age [32,33], apnea episodes [32], presence of sepsis [32] and lower platelet count [6,32,35] were also obtained by our models. Risk factors contained in the final GutCheck model included gestational age, packed red blood cell transfusion, unit NEC rate, late-onset sepsis, multiple infections, hypotension treated with inotropic medications, Black or Hispanic race, birth in a different NICU and metabolic acidosis [13]. Likewise, risk parameters in a disease progression statistical regression model comprise gender, gestational age, and birth weight [13,36].
In the matched prospective multicenter cohort study by Berkhout et al., multivariable logistic regression modeling demonstrated only 2 independent variables to be associated with an increased risk of NEC: administration of predominantly formula feeding and the cumulative number of parenteral feeding days. Remarkably, administration of any antibiotics initiated within 24 h after birth was associated with a reduced risk of NEC [37].
We found that male gender was a highly predictive parameter for intestinal perforation associated with NEC compared to NEC alone. Only a few studies have examined this: in some, male gender was significantly associated with an increased risk of NEC [36,38], while in others it was not [39]. Duci et al. reported that gender was not statistically significant when comparing patients with NEC treated medically vs. NEC requiring surgery [39,40]. More work is therefore needed to conclude that males are more likely to progress to intestinal perforation. Including a greater number of patients from multicenter studies could clarify this.

Maternal Burden
With regard to maternal factors, older age (>38 years), followed by preeclampsia or hypertension, predicted perforation associated with NEC in the ANN models. In contrast, Lee et al. reported a lack of association between preeclampsia and NEC, demonstrating that the neutrophil-to-lymphocyte ratio (NLR) at the time of admission and multiparity were associated with the occurrence of NEC [41]. Zhang et al. did not find a difference in perinatal factors including hypertensive disorders, diabetes mellitus, intrahepatic cholestasis, heart disease, hypothyroidism, premature rupture of membranes, placental abruption, antenatal steroid use or mode of delivery between NEC and control groups [41,42]. The literature evaluating the association between maternal preeclampsia and neonatal NEC is, however, conflicting. Bashiri et al. reported that maternal hypertensive disorders may be independent predictors of NEC in infants weighing less than 1500 g at birth [43]. Other studies have demonstrated that the risk of NEC is increased by intrauterine growth restriction and maternal smoking [44].
Recent studies performed in twins have suggested that a genetic variation in an intergenic region of chromosome 8, labeled as the "NECRISK" region may be associated with increased risk for surgical NEC. Although no specific genes have been identified, pathway analyses have indicated possible pathways related to growth factor, calcium, and G-protein signaling, and others associated with inflammation that may contribute to NEC complications [45].

First Day of Life
From laboratory findings, a higher neutrophil count and lower platelet numbers predicted perforation by the models. In this regard, thrombocytopenia has been associated with NEC and perforation in several studies [6].
In our study, arterial blood gases (pCO2 and HCO3) were among the most important variables to predict IP associated with NEC. It is known that a high base deficit from umbilical cord arterial samples at birth can contribute to NEC in growth-restricted infants [6,46]. In our models, metabolic acidosis combined with higher total numbers of leukocytes, neutrophils, and platelets forecasted intestinal perforation with NEC. In agreement with our data, a study by Duci and colleagues reported a statistically significant difference in pH between patients with medically treated NEC vs. NEC requiring surgery (7.35 vs. 7.2, p < 0.0001) and identified a lower risk for surgery in patients with a later onset of NEC and higher pH values [40]. Altogether, these findings support the need for critical care and follow-up in the first hours of life, as findings in this period are predictive of developing intestinal perforation associated with NEC.

During Hospitalization
Regarding variables taken from hospitalization data, the presence of PDA, the use of fortifier, early-onset sepsis, hypotension, and gastric residuals were the most important factors for estimating intestinal perforation. Except for the use of fortifier, all of these variables have been associated with NEC, but not specifically with intestinal perforation [34]. The use of fortifier is a new variable for IP prediction and will be part of the hypothesis to be tested in a future study examining patients. Tepas proposed seven criteria that may be considered predictive of surgical intervention: bandemia, positive blood culture, acidosis, hypotension, thrombocytopenia, hyponatremia, or neutropenia [47]; some of these parameters were also relevant in the models developed in this study.
It is also important to describe the less predictive or non-predictive factors, such as catheter location, formula feeding, and CPAP. At birth, catheter location seemed a valuable parameter; however, when birth and hospitalization data were taken into account together, its contribution diminished compared to other key factors. With respect to formula feeding, the NICU has implemented the use of donor milk when the patient's mother is not available, perhaps explaining why the use of bovine formula was not an important variable in this study.

Limitations and Strengths of the Study
We acknowledge that a limitation of this work is the relatively small dataset size (n = 76); however, the use of anthropometric, clinical and laboratory findings, as well as a balanced dataset, gave accurate results. Another important limitation is that this is a single-center study; nonetheless, our institution is a tertiary care hospital that concentrates complicated pregnancies from all over the country. It should also be recalled that both ANN models predict the diagnosis of intestinal perforation associated with NEC only within the limits of the variable ranges (Tables 3 and 4). In addition, adding more variables to the model (birth and hospitalization data) did not improve the power of prediction.
The strengths of this work include complete maternal and patient information, from variables at birth and those associated with the course of the disease, taking into account lifestyle, morbidity, anthropometric, clinical, enteral feeding, mechanical ventilation, blood gas, medication and laboratory findings of the study population from three balanced groups. We report both key factors and less important factors for the prediction of IP. Learning by ANN to estimate a pattern of intestinal perforation associated with NEC from individual cases relied on several variables, making such tools highly valuable in the clinical setting, since they allow a more precise prediction of the outcome.

Conclusions
Both BPNN models were able to accurately estimate intestinal perforation associated with NEC. Furthermore, the key maternal and neonatal variables found by the models included well-known factors reported in the literature, as well as new parameters that may allow the early diagnosis and follow-up of premature neonates at risk of surgical NEC. Our results highlight the value of integrating maternal and neonatal variables at birth and during hospitalization in BPNN models to better estimate surgical NEC. A suggestion for new modeling is to incorporate data from birth, day 3 and day 7, as well as from multi-center studies. We hope our models may be useful to preselect patients at risk of perforation associated with NEC before randomization for a strict follow-up that could result in different surgical interventions.
In this work, to change the weights and biases, we applied the Levenberg-Marquardt algorithm, following our previously reported methods [26,27]. This uses the following adaptation:

$$\Delta w = -\left(J^{T}J + \mu I\right)^{-1} J^{T} e$$

where J is the Jacobian matrix (first derivatives), e is a vector of network errors, µ is the combination coefficient with a value of 0.001, and I is the identity matrix.
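A Levenberg-Marquardt step can be illustrated on a deliberately small problem: fitting a single tanh neuron with two parameters, so that the linear system can be solved in closed form. This is a sketch under stated assumptions and not the training code used in the study. In particular, the adaptive rule that multiplies µ by 0.1 after a successful step and by 10 after a rejected one is the common convention and is assumed here (the text only gives µ = 0.001), and the function name and data are hypothetical.

```python
import math

def lm_fit(xs, ys, a, b, mu=0.001, iterations=50):
    """Fit y = tanh(a*x + b) by Levenberg-Marquardt with a 2x2 closed-form solve."""
    def sse(a_, b_):
        return sum((math.tanh(a_ * x + b_) - y) ** 2 for x, y in zip(xs, ys))

    current = sse(a, b)
    for _ in range(iterations):
        # Accumulate J^T J (2x2) and J^T e (2x1); e = prediction - target.
        jtj_aa = jtj_ab = jtj_bb = jte_a = jte_b = 0.0
        for x, y in zip(xs, ys):
            f = math.tanh(a * x + b)
            e = f - y
            d = 1.0 - f * f          # derivative of tanh at the pre-activation
            ja, jb = d * x, d        # Jacobian row for this data point
            jtj_aa += ja * ja
            jtj_ab += ja * jb
            jtj_bb += jb * jb
            jte_a += ja * e
            jte_b += jb * e

        # Solve (J^T J + mu*I) * delta = J^T e by Cramer's rule.
        m_aa, m_bb = jtj_aa + mu, jtj_bb + mu
        det = m_aa * m_bb - jtj_ab * jtj_ab
        delta_a = (jte_a * m_bb - jtj_ab * jte_b) / det
        delta_b = (m_aa * jte_b - jtj_ab * jte_a) / det

        trial = sse(a - delta_a, b - delta_b)
        if trial < current:
            # Accept the step and move toward Gauss-Newton behaviour.
            a, b, current = a - delta_a, b - delta_b, trial
            mu *= 0.1
        else:
            # Reject the step and move toward gradient descent.
            mu *= 10.0
    return a, b, current

# Noise-free toy data generated from a = 1.5, b = -0.5 (hypothetical).
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
ys = [math.tanh(1.5 * x - 0.5) for x in xs]
initial_sse = sum((math.tanh(0.1 * x) - y) ** 2 for x, y in zip(xs, ys))
a_fit, b_fit, final_sse = lm_fit(xs, ys, a=0.1, b=0.0)
```

Because a step is only accepted when it lowers the sum of squared errors, the error never increases from one iteration to the next; a large µ makes the step resemble gradient descent, and a small µ makes it resemble Gauss-Newton.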
The Root Mean Square Error (RMSE) was applied as the error function, which describes the performance of the network according to Equation (A5):

$$\mathrm{RMSE} = \sqrt{\frac{1}{Q}\sum_{q=1}^{Q}\left(y_{q,\mathrm{exp}} - y_{q,\mathrm{ANNsim}}\right)^{2}} \qquad \text{(A5)}$$

where Q is the number of data points (n = 76), y_{q,exp} is the experimental data, and y_{q,ANNsim} is the network prediction.
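The error function is simple to compute directly. The following minimal sketch uses the standard definition of the RMSE; the function name and the toy values are hypothetical.

```python
import math

def rmse(y_exp, y_sim):
    """Root mean square error between experimental and simulated values."""
    if len(y_exp) != len(y_sim):
        raise ValueError("y_exp and y_sim must have the same length")
    q = len(y_exp)
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(y_exp, y_sim)) / q)

# Two exact predictions and one that is off by 2 (hypothetical values).
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here the squared errors are 0, 0 and 4, so the result is the square root of 4/3.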

Results for intestinal perforation ANN models
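The obtained ANN models for the diagnosis of intestinal perforation (IP) associated with NEC followed Equation (A6) with TANSIG-LOGSIG transfer functions:

$$\mathrm{IP} = \frac{1}{1 + \exp\left(-n_{\mathrm{output}}\right)}$$

where n_output is:

$$n_{\mathrm{output}} = \sum_{s}\left[ Wo_{(l,s)} \left( \frac{2}{1 + \exp\left(-2\left(\sum_{k} Wi_{(s,k)}\, In_{k} + b1_{(s,1)}\right)\right)} - 1 \right) \right] + b2_{(l,1)} \qquad \text{(A6)}$$

Equation (A6) estimates intestinal perforation associated with NEC, with weights and biases from Table A1 (IP ANN model at birth, 23-3-1) and Table A2 (IP ANN model at birth and during hospitalization, 35-2-1).

Sensitivity analysis
To obtain the relative importance of variables in predicting intestinal perforation associated with NEC, we performed a sensitivity analysis based on the partitioning of connection weights proposed by Garson in Equation (A7):

$$I_{j} = \frac{\displaystyle\sum_{m=1}^{Nh}\left(\frac{\left|W^{ih}_{jm}\right|}{\sum_{k=1}^{Ni}\left|W^{ih}_{km}\right|}\,\left|W^{ho}_{mn}\right|\right)}{\displaystyle\sum_{k=1}^{Ni}\left\{\sum_{m=1}^{Nh}\left(\frac{\left|W^{ih}_{km}\right|}{\sum_{k=1}^{Ni}\left|W^{ih}_{km}\right|}\,\left|W^{ho}_{mn}\right|\right)\right\}} \qquad \text{(A7)}$$

where I_j is the relative importance of the jth input variable on the output variable, Ni is the number of input neurons, Nh is the number of hidden neurons, W is the connection weight, the superscripts i, h and o refer to the input, hidden and output layers, and the subscripts k, m and n refer to input, hidden and output neurons, respectively.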
Appendix C
Table A1. Weights and biases for the IP ANN model at birth (3 neurons in the hidden layer, k = 3 and l = 1).

Wi {s,k}
Wi