Clinical Decision Support System to Detect the Occurrence of Ventilator-Associated Pneumonia in Pediatric Intensive Care

Objectives: Ventilator-associated pneumonia (VAP) is a severe care-related disease. The Centers for Disease Control and Prevention (CDC) defined its diagnosis criteria; however, the pediatric criteria are mainly subjective and retrospective. Clinical decision support systems have recently been developed in healthcare to help physicians detect severe pathologies earlier and more accurately. We aimed to develop a predictive model providing early diagnosis of VAP at the bedside in a pediatric intensive care unit (PICU). Methods: We performed a retrospective single-center study at a tertiary-care pediatric teaching hospital. All patients treated with invasive mechanical ventilation between September 2013 and October 2019 were included. Data were collected from the PICU electronic medical record and a high-resolution research database. The clinical decision support was then developed using open-access R software (version 3.6.1). Measurements and main results: In total, 2077 children were mechanically ventilated. We identified 827 episodes with at least 48 h of invasive mechanical ventilation and 77 patients who suffered at least one VAP event. We split our database at the patient level into a training set of 461 patients free of VAP and 45 patients with VAP, and a testing set of 199 patients free of VAP and 20 patients with VAP. The Imbalanced Random Forest model was considered the best fit, with an area under the ROC curve of 0.82 (95% CI: 0.71–0.93) on the testing set. An optimal threshold of 0.41 gave a sensitivity of 79.7% and a specificity of 72.7%, with a positive predictive value (PPV) of 9%, a negative predictive value (NPV) of 99%, and an accuracy of 79.5% (95% CI: 0.77–0.82). Conclusions: Using machine learning, we developed a clinical predictive algorithm based on clinical data stored prospectively in a database.
The next step will be to implement the algorithm in PICUs to provide early, automatic detection of ventilator-associated pneumonia.


Introduction
Ventilator-associated pneumonia (VAP) is a common and severe complication in intensive care units. As a care-related complication, VAP leads to a worsened prognosis for affected patients, and its early diagnosis remains an ongoing challenge in intensive care. In an attempt to enhance VAP detection, the Centers for Disease Control and Prevention (CDC) issued diagnosis criteria allowing the identification of VAP after 48 h of clinical alteration (defined by worsening gas exchange, fever >38 °C or hypothermia, leukocytosis >15,000/mm³ or leukopenia <4000/mm³, new onset of purulent sputum, apnea or tachypnea, wheezing/rales/rhonchi, cough, and bradycardia <100/min or tachycardia >170/min) [1]. However, delays in VAP diagnosis and, to some extent, in initiating anti-infectious therapy are observed and associated with worse outcomes [2][3][4]. Furthermore, the subjective criteria included in the CDC pediatric definition of VAP (changes in the appearance and amount of sputum, worsening of an existing cough) result in variability of VAP diagnosis and incidence [5][6][7].
To help physicians prospectively diagnose VAP, the CDC developed the concept of ventilator-associated events (VAE) in adults, but children have long been excluded from this definition [1]. As is usual for adult recommendations, children were excluded mainly because of physiological differences between the populations (normal respiratory parameters for an adult are very different from those of a child). Cirulis et al. [8] proposed a modified pediatric VAE definition. Chomton et al. [9] evaluated this modified pediatric VAE definition for detecting VAP, but its sensitivity (66%) for identifying this ICU-related complication remained disappointing.
In recent years, the number of publications dealing with the development of computerized clinical decision support systems (CDSS) to improve disease diagnosis has increased, and such systems have been shown to be useful for several diseases in ICUs [10][11][12][13][14]. The emergence of high-resolution databases [15], which allow precise and continuous analysis of clinical and biological parameters, supports these developments. Leisman et al. [16] recently reported several recommendations for the development and reporting of predictive models. They identified two categories of predictive models: (1) clinical prediction models for bedside use, and (2) other prediction models intended for deployment across populations for research, benchmarking, and administrative purposes. The usefulness of CDSS had already been highlighted by Mack et al. [17], but no reports on VAP are currently available. Accordingly, our project's main objective was to develop a predictive model providing early diagnosis of VAP at the bedside in a pediatric intensive care unit (PICU).

Materials and Methods
This single-center retrospective study was performed using data collected in the PICU electronic medical record (Intelligence Critical Care and Anesthesia (ICCA®); Philips Medical, version F0.1) of a tertiary-care pediatric teaching hospital (Sainte-Justine Hospital, Montréal, QC, Canada). To improve data quality, ICCA® was configured with drop-down menus and critical-value alerts. Furthermore, all data entered in ICCA® underwent medically endorsed validation.
The hospital database was queried using SQL Server Management Studio 18 (Microsoft, Redmond, WA, USA) to select patients who were aged from 1 day to 18 years at PICU admission and were mechanically ventilated for more than 48 h between September 2013 and October 2019. We analyzed the first 30 days of invasive mechanical ventilation.
During the first step of the study, all medical files were reviewed by two senior pediatric intensive care experts (JR and PJ) to classify patients into two groups: VAP patients and VAP-free patients. VAP was defined according to the 2021 CDC criteria [1]: (1) context criterion: invasive mechanical ventilation for more than 48 h; (2) radiological criterion: new or progressive and persistent infiltrate/consolidation/cavitation; (3) clinical criteria: worsening gas exchange, fever >38 °C or hypothermia, leukocytosis >15,000/mm³ or leukopenia <4000/mm³, new onset of purulent sputum, apnea or tachypnea, wheezing/rales/rhonchi, cough, and bradycardia <100/min or tachycardia >170/min.
Data formatting. The data were formatted using R (version 3.6.1) as a preparation step to train the prediction models based on different algorithms.
All times were expressed as a relative duration since ICU admission.

Data cleaning and missing data. Incoherent data were identified and corrected according to the scheme described in Supplementary Data S1 "Data Cleaning". Variables consisting of data streams of continuous values were imputed following the last-observation-carried-forward method. For missing data at the beginning of a stream, the first valid observation was carried backward.
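The stream-filling step above (last observation carried forward, with the first valid observation carried backward for leading gaps) can be sketched as follows. The authors worked in R; this is a hypothetical Python/pandas illustration of the same two-step fill on an invented heart-rate stream:

```python
import numpy as np
import pandas as pd

# Hypothetical heart-rate stream sampled since ICU admission, with gaps.
hr = pd.Series([np.nan, np.nan, 138.0, np.nan, 142.0, np.nan], name="heart_rate")

# Last observation carried forward fills interior and trailing gaps...
filled = hr.ffill()
# ...then the first valid observation is carried backward for leading gaps.
filled = filled.bfill()

print(filled.tolist())  # [138.0, 138.0, 138.0, 138.0, 142.0, 142.0]
```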
Segmenting variables in time blocks. The variable data streams were first segmented into 6 h time blocks; for each variable, the median (mode for discrete variables) was calculated over each 6 h block to avoid aberrant or missing data. The 6 h blocks were then aggregated into 48 h time blocks, chosen to be as close as possible to the actual VAP timing definition. For each variable, two columns were generated: one with the first non-missing value among the 6 h blocks and one with the last non-missing value among the 6 h blocks, if any, within each 48 h block (if there was no observation, the data were considered missing). For the development of the algorithms, the first non-missing value and the actual difference or relative change between the two columns were considered for each variable (more details are available in Supplementary Data S2 "Segmenting variables in time blocks").
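As an illustration of this two-level aggregation (6 h medians, then first/last non-missing values and their difference within a 48 h block), here is a hypothetical Python/pandas sketch on synthetic hourly data; the values and variable names are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical hourly stream covering one 48 h block (hours since admission).
ts = pd.Series(rng.normal(100, 5, 48),
               index=pd.timedelta_range(0, periods=48, freq="h"))

# Step 1: median over each 6 h block to damp aberrant or missing samples.
six_h = ts.resample("6h").median()          # eight 6 h medians

# Step 2: within the 48 h block, keep the first and last non-missing 6 h
# values and derive the change between them as features.
valid = six_h.dropna()
features = {"first": valid.iloc[0],
            "delta": valid.iloc[-1] - valid.iloc[0]}
```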
Stratified train–test split at the patient level. VAP patients and non-VAP patients were split into a training set (70% of each class) and a testing set (the remaining 30% of each class). Since some patients had more than one stay in the PICU, all stays of a patient assigned to the training set were kept in the training set (and likewise for patients in the testing set). All details of the train–test split are available in Supplementary Data S3 "Train–test split".
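A minimal sketch of this class-stratified, patient-level split (Python; the patient ids and counts are hypothetical, not the study's):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical patient-level labels: True = at least one VAP event.
patients = np.arange(20)
is_vap = np.zeros(20, dtype=bool)
is_vap[:4] = True

train, test = [], []
for cls in (True, False):
    # Shuffle patients within each class, then take 70% for training.
    ids = rng.permutation(patients[is_vap == cls])
    cut = int(round(0.7 * len(ids)))
    train.extend(ids[:cut])
    test.extend(ids[cut:])

# Every stay of a patient then follows that patient's assignment,
# so no patient contributes stays to both sets.
assert set(train).isdisjoint(test)
```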
Imputation. Preliminary inspection of the dataset showed that around 50% of data was missing for the variables "pulmonary dynamic compliance" and "minute ventilation". Missing-value imputation in the training dataset was performed with 'randomForest' (v4.6-14) using the function 'rfImpute' [22]. The imputed values were the weighted averages of the non-missing observations, where the weights were the proximities from randomForest. For the testing set, the missing values in each variable were replaced by the mean of the imputed values for that variable in the training set (more details are available in Supplementary Data S4 "Imputation").
Predictive models. We applied six different learning algorithms to generate predictive models: Random Forest, with the function 'rfsrc' and error rate as the measure of performance; Imbalanced Random Forest, with the function 'imbalanced' and G-mean as the measure of performance; and Stepwise Regression and Random Forest with 5-fold cross-validation (5-CV), using the 'train' function with the 'glmStepAIC' and 'rf' methods, where accuracy was used to select the optimal model with the largest value [23]. Finally, we implemented Elastic Net Regression (5-CV) and Weighted Elastic Net Regression (5-CV) with the 'glmnet' method, where the ROC AUC was used to select the optimal model with the largest value. The hyperparameters for the Random Forest, Imbalanced Random Forest, and Stepwise Regression (5-CV) algorithms were 'ntree' (the number of trees used at the tuning step) and 'mtry' (the number of variables randomly selected as candidates for the division of a node) [24]. The parameters in Elastic Net Regression were alpha, which controls the relative balance between the lasso and ridge regularization, and lambda, which controls the amount of the penalty. All these models used readily available implementations in R [25,26]. Cross-validation was performed inside the training set only (more details are available in Supplementary Data S5 "Predictive models").
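The study's Imbalanced Random Forest comes from the R 'imbalanced' function; as a rough, hypothetical analogue, a scikit-learn forest with per-bootstrap class weighting shows how class weighting counters the rarity of VAP blocks (all data here is synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic, heavily imbalanced 48 h-block data (~2% positive, as in the paper).
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.02).astype(int)

# 'ntree' and 'mtry' correspond here to n_estimators and max_features;
# class weighting reweights each bootstrap so the rare class is not swamped.
clf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                             class_weight="balanced_subsample",
                             random_state=0).fit(X, y)
risk = clf.predict_proba(X)[:, 1]   # per-block VAP risk score
```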
Performance measure and model choice. Models resulting from the different algorithms were evaluated, at the level of 48 h time blocks, on the training and testing sets by calculating their AUC and by determining classification thresholds reaching predetermined levels of sensitivity (80%, 85%, 90%, 95%). The final model was chosen based on its capacity to (1) maximize specificity at these sensitivity levels and (2) generalize the sensitivity and specificity to the test set. The area under the ROC curve (AUC) was the primary measure of performance used to choose the best model.
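Picking a threshold that reaches a predetermined sensitivity can be done by walking the ROC curve; this is a hypothetical Python sketch on synthetic scores, not the study's code:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
# Synthetic block-level labels and risk scores.
y = rng.integers(0, 2, 500)
scores = np.clip(rng.normal(0.3 + 0.3 * y, 0.2), 0.0, 1.0)

fpr, tpr, thr = roc_curve(y, scores)

def threshold_for_sensitivity(target):
    """First operating point (lowest FPR) whose sensitivity reaches the target."""
    i = int(np.argmax(tpr >= target))   # tpr is non-decreasing along the curve
    return thr[i], tpr[i], 1.0 - fpr[i]

for s in (0.80, 0.85, 0.90, 0.95):
    t, sens, spec = threshold_for_sensitivity(s)
```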
Per-patient validation. The final model was evaluated on its capacity to correctly assess the infection status of patients over time. The prediction results obtained after setting different classification thresholds were taken, and the numbers of patients with accurate predictions (i.e., predicted class = observed VAP status) and inaccurate predictions (i.e., predicted class ≠ observed VAP status) were computed over time. Patients for whom the predictions contained at least one error were identified. We then examined the accuracy of predictions by stratifying patients into two groups, identified the patients with at least one prediction error in each subgroup, and calculated the global error rate for each subgroup.

Statistics
Development of the clinical decision support was performed using open-access R software, version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). Statistical analysis of patients' characteristics was performed using Prism® software (version 7.05; GraphPad Inc., San Diego, CA, USA). The Kolmogorov–Smirnov test was used to assess the normal distribution of continuous variables. For the population description, categorical variables are expressed as frequencies with corresponding proportions, and quantitative variables are presented as means and standard deviations. Performance evaluation was conducted using ROC curves, AUCs and their confidence intervals, and derived measures of sensitivity and specificity.
The Sainte-Justine ethics committee approved the study as a retrospective study and waived the need for written consent (no. 2020-2454).

General Description of the Population
A total of 5153 children were hospitalized in the Sainte-Justine PICU during the study period, of whom 40% (2077) were mechanically ventilated; 1235 episodes with more than 48 h of invasive mechanical ventilation were identified (Figure 1). Seventy-seven patients had at least one VAP event, and seventy-eight VAP events (6%) were diagnosed by the two experts. The patients' general characteristics are described in Table 1. Patients with less than 4 days of mechanical ventilation were removed (see Supplementary Data S2 "Segmenting variables"), leaving 811 episodes of invasive mechanical ventilation. The split into a training set (70% of each class) and a testing set (the remaining 30% of each class) resulted in a training set of 461 patients free of VAP and 45 patients with VAP, and a testing set of 199 patients free of VAP and 20 patients with VAP. Since some patients had more than one stay in the PICU, there could be several events for the same patient: the training set thus had 513 stays with no VAP event and 45 stays with a VAP event, and the testing set had 231 stays with no VAP event and 22 stays with a VAP event. Segmenting the variables into non-overlapping 48 h time blocks generated, from these datasets, 1852 time blocks free of VAP and 45 time blocks with VAP in the training set, and 788 time blocks free of VAP and 22 time blocks with VAP in the testing set.
We observed similar characteristics in the train and test groups (Table 2).

Missing Data
We observed two missing values for "SF ratio" and "oxygen saturation index (OSI)" in the test set (0.1% of total observations). For the variables "pulmonary dynamic compliance" and "minute ventilation", the proportions of missing values in the train and test sets were 0.49 and 0.54, respectively.

Results of Training Algorithm
The Imbalanced Random Forest model was considered the best fit, with an area under the ROC curve of 0.86 on the training set.
Thresholds and the specificities corresponding to the predetermined levels of sensitivity are presented in Table 3. The variable importance obtained from the Imbalanced Random Forest model is presented in Figure 2.


Performance on Test Dataset
The area under the ROC curve obtained by fitting the Imbalanced Random Forest model on the test set was 0.82 (95% CI: 0.71–0.93) (Figure 3).

The specificity and sensitivity obtained after setting different classification thresholds are presented in Table 4. An optimal threshold of 0.41 gave a sensitivity of 79.7% and a specificity of 72.7%, with a positive predictive value (PPV) of 9%, a negative predictive value (NPV) of 99%, and an accuracy of 79.5% (95% CI: 0.77–0.82).
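The low PPV alongside a high NPV follows from the rarity of VAP blocks. A quick Bayes'-rule check using the reported sensitivity and specificity with the test-set block prevalence (22 VAP blocks out of 810) lands in the same low range; the exact published 9% depends on the actual confusion counts, which are not reproduced here:

```python
# Reported operating point, combined with the test-set 48 h-block prevalence.
sens, spec = 0.797, 0.727
prev = 22 / (788 + 22)      # VAP prevalence among test-set 48 h blocks

ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
# ppv comes out at only a few percent, while npv is close to 1.
```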

Per Patient Validation
Performance of the final model was evaluated over different time periods. Time periods were defined starting from the first time block and going up to a given time block in the future. The confusion matrices for all time periods were constructed, and false positive rates (FPR), true positive rates (TPR), and the area under the curve (AUC) were calculated. The results are presented in Figure 4. The procedure is explained in detail in Supplementary Data S6 "Per patient validation".
The global error rate is presented in Table 5. We observed a lower error rate for patients with at most three time blocks of observations compared with those with at least four time blocks of observations.


Discussion
Using an electronic medical record, an algorithm supporting clinicians in the early diagnosis of ventilator-associated pneumonia in the PICU achieved a sensitivity of 80% and a specificity of 73% at a threshold of 0.41. To date, this is the highest sensitivity achieved by a CDSS for the early detection of VAP.
Ventilator-associated pneumonia is a severe healthcare-associated disease [2,27,28]. To improve the timeliness and accuracy of this challenging diagnosis, Cirulis et al. [8] evaluated the accuracy of the adult ventilator-associated events (VAE) definition for the early diagnosis of pediatric VAP and developed modified pediatric VAE criteria (an increase in FiO2 by 20% or in PEEP by 2 cm H2O sustained for more than one day). VAE and modified pediatric VAE had disappointing sensitivities of 23% and 56% for Cirulis et al. [8] and 56% and 66% for Chomton et al. [9], respectively. Our algorithm, based on machine learning methods, improved on this sensitivity and could be implemented to screen patient data in real time to provide early detection of VAP in children. The predictions for the test set using the Imbalanced Random Forest model are stored in a file available on GitHub [29].
Implementation of a clinical decision support system is a promising technology aimed at helping the physician to make medical decisions [10,30], to analyze chest X-rays [31], or to increase diagnostic sensitivity [32]. The development methodology starts with a retrospective classification of analyzed patients to define whether they develop

Figure 2. Variable importance used in the clinical decision system.


Figure 3. ROC curves. The black curve represents the performance of the algorithm on the training set; the red curve represents its performance on the test set.


Figure 4. False positive rate and true positive rate over different time periods for different thresholds. Th.Default: default threshold of the model; Th80%: threshold corresponding to 80% sensitivity; Th85%: threshold corresponding to 85% sensitivity.


Table 2. Train and test groups' characteristics.

Table 3. Imbalanced Random Forest model: threshold and specificity from predetermined sensitivity levels for the training set.


Table 4. Imbalanced Random Forest model: sensitivity and specificity for the test set corresponding to different thresholds.


Table 5. Error rates (%) for predicted classes. G1: patients with at most 3 time blocks of observations; G2: patients with at least 4 time blocks of observations. E.Pred: error rate for prediction; E.Pred.th80%: error rate for prediction at the threshold corresponding to 80% sensitivity; E.Pred.th85%: error rate for prediction at the threshold corresponding to 85% sensitivity.