Identification of Distinct Clinical Phenotypes of Heterogeneous Mechanically Ventilated ICU Patients Using Cluster Analysis

This retrospective study aimed to derive the clinical phenotypes of ventilated ICU patients to predict the outcomes on the first day of ventilation. Clinical phenotypes were derived from the eICU Collaborative Research Database (eICU) cohort via cluster analysis and were validated in the Medical Information Mart for Intensive Care (MIMIC-IV) cohort. Four clinical phenotypes were identified and compared in the eICU cohort (n = 15,256). Phenotype A (n = 3112) was associated with respiratory disease, had the lowest 28-day mortality (16%), and had a high extubation success rate (~80%). Phenotype B (n = 3335) was correlated with cardiovascular disease, had the second-highest 28-day mortality (28%), and had the lowest extubation success rate (69%). Phenotype C (n = 3868) was correlated with renal dysfunction, had the highest 28-day mortality (28%), and had the second-lowest extubation success rate (74%). Phenotype D (n = 4941) was associated with neurological and traumatic diseases, had the second-lowest 28-day mortality (22%), and had the highest extubation success rate (>80%). These findings were validated in the validation cohort (n = 10,813). Additionally, these phenotypes responded differently to ventilation strategies in terms of duration of treatment, but had no difference in mortality. The four clinical phenotypes unveiled the heterogeneity of ICU patients and helped to predict the 28-day mortality and the extubation success rate.


Introduction
A mechanical ventilator is one of the most commonly used pieces of medical equipment in intensive care units (ICUs), delivering adequate oxygen to patients' lungs and removing carbon dioxide. Approximately 40% of ICU patients require mechanical ventilation at any given hour [1], and thus mechanical ventilators can be in short supply when the number of critically ill patients increases, such as during the COVID-19 pandemic [2]. Although a ventilator can save lives, prolonged use of mechanical ventilation has risks including severe comorbidities, ventilator dependency, wastage of ICU resources, and additional costs [3]. Additionally, extubation failure is significantly associated with higher mortality, a longer length of stay in the ICU, and a higher likelihood of developing ventilator-associated complications [4]. Therefore, determining the optimal time to wean patients off mechanical ventilation has been a major challenge for intensivists. One of the most common predictors of successful weaning off mechanical ventilation is the rapid shallow breathing index (RSBI) during spontaneous breathing trials (SBTs) [5,6]. Many studies have tried to find better predictors or machine learning models to predict extubation failure that could be applied to every ventilated patient [7,8]. However, general prediction models have failed to achieve high accuracy for clinical practice due to the heterogeneous and dynamically changing nature of diseases in critical care. Therefore, the identification of homogeneous phenotypes is the first step towards general and accurate prediction.
Different phenotypes might respond differently to treatment strategies. Positive end-expiratory pressure (PEEP) is the pressure to prevent alveolar collapse [9]. It has been reported that the use of higher PEEP might be beneficial to patients with heart failure and pulmonary edema, but it is unlikely to improve clinical outcomes in unselected patients with acute respiratory distress syndrome [10]. Therefore, it is clinically relevant to investigate the therapeutic response of patients with different phenotypes to higher or lower PEEP strategies.
In recent years, artificial intelligence (AI) and machine learning (ML) have been widely applied in health-related research, including in the prediction of clinical outcomes in critically ill mechanically ventilated patients [11,12]. Clustering analysis is an unsupervised machine learning task that automatically discovers natural groupings in data, which could be helpful in identifying the underlying patterns in patients compared to simple stratification. Four clinical phenotypes for sepsis were derived which correlated with host-response patterns and clinical outcomes by using K-means clustering analysis [13]. Four subpopulations with distinct patterns of organ dysfunction in septic patients were identified by using hierarchical clustering [14]. Similarly, three clinical phenotypes of acute respiratory distress syndrome (ARDS) have been identified with distinct clinical characteristics and outcomes [15]. However, these studies only focused on identifying phenotypes in one type of heterogeneous condition. In another clinical study of mechanical ventilation in critically ill patients, five clinical phenotypes were correlated with different disease severities and clinical outcomes were derived [16]. However, the authors only analyzed two single-center datasets and the variables relevant to mechanical ventilation.
In this work, we aim to derive the different clinical phenotypes of heterogeneous mechanically ventilated ICU patients to help intensivists to predict the outcome of extubation on the first day of mechanical ventilation for each patient. We hypothesized that phenotypes could be derived from routinely collected clinical data by using clustering analysis and that these phenotypes would be associated with different clinical characteristics and outcomes. We hypothesized that the identification of these clinical phenotypes could help physicians provide early warning and implement personalized treatment strategies, resulting in lower mortality and better prognosis.

Study Design
The study was a retrospective analysis based on two large freely available ICU databases, the eICU Collaborative Research Database (eICU) [17] and the Medical Information Mart for Intensive Care (MIMIC-IV) [18], both accessible through PhysioNet [19]. In the first step, we trained and developed a clustering model to identify clinical phenotypes by using the eICU dataset as a derivation cohort. Second, we tested and validated the clustering analysis results by using the MIMIC-IV dataset as the validation cohort. Finally, we summarized and compared the characteristics and clinical outcomes across different clinical phenotypes.

Population
One of the databases we used was the eICU database, which is a multicenter ICU database with over 200,000 admissions for over 130,000 different patients admitted to one of 335 ICUs at 208 hospitals in the United States between 2014 and 2015 [17]. The other database we used was the MIMIC-IV database, which includes over 380,000 patients admitted to the ICU at the Beth Israel Deaconess Medical Center between 2008 and 2019 [18]. Both databases include patient demographics, vital signs, laboratory measurements, diagnoses, medications and treatments, and provider observations and notes.
We identified all adult ICU patients (aged ≥ 18 years old) who underwent invasive mechanical ventilation for more than 24 h, as suggested by [20], thus excluding the majority of routine postoperative ventilation events. We also filtered out patients with chronic ventilator dependency (ventilated for more than 60 days). For patients with multiple ventilation events per ICU stay, only the first ventilation event was retained. Patients with more than 30% of missing variables were excluded. A patient's weaning event from the ventilator was considered successful if there was no reintubation or death within 48 h after weaning; otherwise, it was considered a failure event. We applied the same filtering criteria to both databases: for eICU, we filtered out 15,256 ventilation events with a failure rate of 23.1%; for MIMIC-IV, we identified 10,813 ventilation events, with a weaning failure rate of about 26.3%. The baseline characteristics of both cohorts are shown in Table A1. The multicenter eICU cohort was used as the cluster derivation cohort and the single-center MIMIC-IV cohort was used as the cluster validation cohort.

Selection of Clinical Variables
We selected 40 candidate variables, including patient demographics, vital signs, blood gas analysis, laboratory measurements, scores, and mechanical ventilator parameters. Clinical variables were selected from each dataset based on their availability, and variables with more than 30% of missing values were removed. Variables with correlations greater than 0.7 were eliminated. The 24 clinical variables selected for phenotyping were age, body mass index (BMI), heart rate, respiratory rate, systolic blood pressure, oxygen saturation (SpO2), temperature, blood urea nitrogen (BUN), creatinine, platelet count, red blood cell (RBC) count, white blood cell (WBC) count, glucose, potassium, sodium, calcium, pH, bicarbonate, partial pressure of oxygen (PaO2), partial pressure of carbon dioxide (PaCO2), Glasgow Coma Scale (GCS) score, tidal volume, positive end-expiratory pressure (PEEP), and fraction of inspired oxygen (FiO2). For each variable, we removed outlier values based on a predefined outlier range and calculated the mean value during the first 24 h of mechanical ventilation.

Statistical Analysis
To prepare the data for phenotyping, we first assessed the distribution, correlation, and missing values of the selected clinical variables. Data cleaning was performed and missing values were imputed by using multiple imputation with chained equations (MICE), which would introduce less bias than the mean imputation. After examining the distribution of the variables, logarithmic transformation was applied to non-normal distributions, followed by standardization. Finally, principal component analysis (PCA) was applied to reduce the dimensionality of the dataset, and the number of principal components which explained at least 80% of the variance in the data was selected. In this study, 14 PCA components were selected as inputs to the clustering model. K-means clustering is one of the most popular clustering models for grouping similar data points and discovering underlying patterns. Starting with a target number k, k randomly selected data points are assigned as cluster centroids; each data point is then allocated to the nearest cluster, and then new cluster centroids are calculated by averaging the data within each cluster. The clustering process is repeated until the centroids are stabilized. In this study, K-means clustering was used to assign patients to different clusters. Several clustering performance evaluation methods such as elbow plot, silhouette coefficient, cluster sizes, and meaningfulness were used to determine the optimal number of clusters k. The 4-cluster (k = 4) model was adopted in this study and the clusters were analyzed and summarized by their characteristics to form different phenotypes.
To compare and visualize the optimal phenotype results, a baseline table was used to compare the characteristics between each phenotype with medians, interquartile ranges (IQRs), means, and standard deviations (SDs), and counts and percentages were presented appropriately. The chi-square test was used for categorical variables and the Kruskal-Wallis test was used for continuous variables with non-normal distributions. Phenotypes were visualized in two dimensions using PCA components, and chord diagrams were used to show the relationship between each phenotype and abnormal clinical variables. Cumulative mortality in the first 28 days was also compared between phenotypes. Analyses were performed using Python 3.8.5 (Python Software Foundation).

Patients in the Study
A total of 15,256 mechanically ventilated patients from the eICU and 10,813 from the MIMIC-IV database were included in this study. The eICU patients were included in the model derivation cohort and the MIMIC-IV patients were included in the model validation cohort. The selected clinical variables were compared between the two cohorts using the Kruskal-Wallis test, and the results are shown in Appendix A Table A2.

Derivation of Clinical Phenotypes
According to the clustering performance evaluation methods, the optimal number of clusters was determined to be four. The derived phenotypes were visualized by two-and three-dimensional plots using PCA components as coordinates. The clinical characteristics and sizes of the four derived phenotypes of A, B, C, and D are presented in Table 1 and Figure 1. All of the continuous clinical variables were not normally distributed, so the Kruskal-Wallis test was used to compare between phenotypes, and the p-values were all significant (<0.001). Phenotype A had the smallest population and phenotype D had the largest population. Phenotype A patients were older, had a lower PaO 2 and a higher PaCO 2 , and were more likely to have respiratory disease; Phenotype B patients had faster heart rate and respiratory rates, higher temperature and white blood cell counts, and were more likely to have cardiovascular disease; Phenotype C patients had lower bicarbonate, pH, and red blood cell counts and higher blood urea nitrogen and creatinine, suggesting that they were more likely to have renal dysfunction; Phenotype D patients were younger, had lower Glasgow Coma Scale scores, fewer abnormal laboratory values, and were more likely to have neurological and traumatic disease. Reproducibility of the phenotypes was assessed by applying the same phenotype derivation process to the MIMIC-IV validation cohort; the clinical characteristics are presented in Table A3. The clinical characteristics of different phenotypes were also compared between the derivation cohort and the validation cohort, as shown in Figure A1. The results from the validation cohort confirmed that the optimal number of phenotypes, phenotype sizes, and clinical characteristics were similar to those observed in the derivation cohort.

Relationship between Phenotypes and Clinical Outcomes
The four derived phenotypes had different clinical outcomes, as shown in Figure 2. In both the derivation and validation cohorts, patients with phenotype A had the lowest 28-day mortality (approximately 15%) and a relatively high extubation success rate (approximately 80%); patients with phenotype B had the second-highest 28-day mortality (nearly 30%) and the lowest extubation success rate (approximately 65%); Phenotype C patients had the highest 28-day mortality (over 30%) and the second-lowest extubation success rate (about 70%); Phenotype D patients had the second-lowest 28-day mortality (just over 20%) and the highest extubation success rate (over 80%). Phenotype A had a significantly lower 28-day mortality compared to phenotype C, and phenotype D had a significantly higher extubation success rate than phenotype B. Patients assigned to phenotype B had the longest ICU length of stay (median: 7-9 days) and duration of mechanical ventilation (median: 3-4 days), whereas patients assigned to phenotype D had the shortest ICU length of stay (median: 5-6 days) and duration of mechanical ventilation (median: 1.5-2.5 days).  In (A), the ribbons connect from one phenotype to one variable group if the phenotype mean is greater or less than the overall mean for the entire cohort. For example, phenotype (C) is more likely to have patients with abnormal renal function. In (B-E), each phenotype is highlighted separately for ease of interpretation.

Relationship between Phenotypes and Clinical Outcomes
The four derived phenotypes had different clinical outcomes, as shown in Figure 2. In both the derivation and validation cohorts, patients with phenotype A had the lowest 28-day mortality (approximately 15%) and a relatively high extubation success rate (approximately 80%); patients with phenotype B had the second-highest 28-day mortality (nearly 30%) and the lowest extubation success rate (approximately 65%); Phenotype C patients had the highest 28-day mortality (over 30%) and the second-lowest extubation success rate (about 70%); Phenotype D patients had the second-lowest 28-day mortality (just over 20%) and the highest extubation success rate (over 80%). Phenotype A had a significantly lower 28-day mortality compared to phenotype C, and phenotype D had a significantly higher extubation success rate than phenotype B. Patients assigned to phe-  (A), the ribbons connect from one phenotype to one variable group if the phenotype mean is greater or less than the overall mean for the entire cohort. For example, phenotype (C) is more likely to have patients with abnormal renal function. In (B-E), each phenotype is highlighted separately for ease of interpretation.

Heterogeneity of Treatment Effects
Among these four phenotypes, only phenotype B had a slightly higher proportion of patients requiring higher PEEP, and these patients had significantly lower extubation success rates (p < 0.05), longer ventilation duration (p < 0.05), and longer ICU stay (p < 0.05), but had no significant difference in hospital mortality (p = 0.248) ( Table 2). Patients in phenotypes A, C, and D tended to require lower PEEP (p < 0.05), but had no difference in hospital mortality and extubation success rates. Patients with higher PEEP had longer ventilation duration in all four phenotypes and had longer ICU stays in phenotypes A, B, and C. mechanical ventilation (median: 3-4 days), whereas patients assigned to phenotype D had the shortest ICU length of stay (median: 5-6 days) and duration of mechanical ventilation (median: 1.5-2.5 days).

Heterogeneity of Treatment Effects
Among these four phenotypes, only phenotype B had a slightly higher proportion of patients requiring higher PEEP, and these patients had significantly lower extubation success rates (p < 0.05), longer ventilation duration (p < 0.05), and longer ICU stay (p < 0.05), but had no significant difference in hospital mortality (p = 0.248) ( Table 2). Patients in phenotypes A, C, and D tended to require lower PEEP (p < 0.05), but had no difference in hospital mortality and extubation success rates. Patients with higher PEEP had longer ventilation duration in all four phenotypes and had longer ICU stays in phenotypes A, B, and C.   Finally, we evaluated the relationship between the derived clinical phenotypes and the APACHE IV (Acute Physiology and Chronic Health Evaluation) score, as shown in Figure 3. The resulting plot confirmed that the conditional distribution of the different phenotypes largely overlapped, indicating that the clinical phenotypes are not a simple reflection of disease severity. In addition, we fitted a logistic regression model to classify the clinical phenotypes, using the APACHE score and RSBI as predictors. The low receiver operating characteristic (ROC) area under the curve (AUC = 0.625) of the model, as shown in Figure A2, indicated that the derived clinical phenotypes were not simply explained by the APACHE score or RSBI. Figure 3. The resulting plot confirmed that the conditional distribution of the different phenotypes largely overlapped, indicating that the clinical phenotypes are not a simple reflection of disease severity. In addition, we fitted a logistic regression model to classify the clinical phenotypes, using the APACHE score and RSBI as predictors. The low receiver operating characteristic (ROC) area under the curve (AUC = 0.625) of the model, as shown in Figure A2, indicated that the derived clinical phenotypes were not simply explained by the APACHE score or RSBI.

Discussion
This retrospective study used a large amount of data from two large publicly available databases, eICU and MIMIC-IV, which included large populations of heterogeneous mechanical ventilated ICU patients. Four clinical phenotypes were identified using only routinely available clinical data. The derived phenotypes were multidimensional, differed in demographics, vital signs, laboratory measures, and patterns of organ dysfunction, and could not be classified by simple stratification of diagnostic groups or severity of illness. In addition, these four phenotypes differed in mortality and other clinical outcomes. Although most of the clinical variables had different distributions in the derivation and validation cohorts (Table A2), the frequency and clinical characteristics of the phenotypes were reproducible in the MIMIC-IV validation cohort, demonstrating the universality of these four derived phenotypes. Only routinely available clinical variables were used in the clustering analysis, allowing the phenotypes to be easily reproduced in other datasets.
For these four phenotypes, we used chord diagrams to show correlations between phenotypes and clinical variable groups. Of the four phenotypes identified, phenotype A was correlated with abnormal values in pulmonary dysfunction; phenotype B was most strongly correlated with abnormal values associated with inflammatory, pulmonary, and

Discussion
This retrospective study used a large amount of data from two large publicly available databases, eICU and MIMIC-IV, which included large populations of heterogeneous mechanical ventilated ICU patients. Four clinical phenotypes were identified using only routinely available clinical data. The derived phenotypes were multidimensional, differed in demographics, vital signs, laboratory measures, and patterns of organ dysfunction, and could not be classified by simple stratification of diagnostic groups or severity of illness. In addition, these four phenotypes differed in mortality and other clinical outcomes. Although most of the clinical variables had different distributions in the derivation and validation cohorts (Table A2), the frequency and clinical characteristics of the phenotypes were reproducible in the MIMIC-IV validation cohort, demonstrating the universality of these four derived phenotypes. Only routinely available clinical variables were used in the clustering analysis, allowing the phenotypes to be easily reproduced in other datasets.
For these four phenotypes, we used chord diagrams to show correlations between phenotypes and clinical variable groups. Of the four phenotypes identified, phenotype A was correlated with abnormal values in pulmonary dysfunction; phenotype B was most strongly correlated with abnormal values associated with inflammatory, pulmonary, and cardiovascular dysfunction; phenotype C was most strongly correlated with renal dysfunction; and phenotype D had the least abnormal values. Cumulative mortality and extubation success rate plots showed that phenotype C had the highest mortality and phenotype B had the lowest extubation success rate.
This study also showed that different clinical phenotypes responded differently to mechanical ventilation strategies. Most patients in phenotypes A, C, and D required lower PEEP, whereas phenotype B had a relatively higher proportion of patients that required higher PEEP. This may be due to pulmonary infiltration due to increased pulmonary capillary pressure during cardiac insufficiency. Additionally, patients in phenotype B who required higher PEEP had lower extubation success rates, longer ventilation times, and longer ICU stays. This suggests that patients with cardiovascular disease and respiratory dysfunction may require prolonged ventilation and intensive care. However, these four phenotypes were not statistically different in terms of mortality. In the era of personalized medicine, implementing different treatment strategies based on patient phenotypes could potentially improve treatment outcomes or help refine the ventilator weaning protocols.
These four phenotypes were identified on the first day of mechanical ventilation, aiming to help intensivists to identify different phenotypes of ICU patients early in the mechanical ventilation period, and potentially lead to different care processes for each phenotype. In eras and in regions where ventilators become a scarce resource, early prediction could not only help to avoid overtreatment but could also maximize the use of medical resources. Therefore, we only analyzed data on the first day of intubation, which may be controversial and one of our study's limitations. To address this, our next study will develop predictive models using cross-sectional data on the day of extubation. Our second limitation was that the eICU database only included records from 2014 to 2015 and the MIMIC-IV database included records from 2008 to 2019. However, in the meantime, the weaning guideline changed in 2016, suggesting that the initial SBT should be performed with inspiratory pressure augmentation (5-8 cm H 2 O) rather than without (T-piece or CPAP). Therefore, we should validate the results using more recent cohorts which are not yet available. However, we reckoned that the revision of the guideline had little impact on this study, because the weaning process may vary according to patients, institutions, care processes, and the skills of different clinicians [21]. An international survey of intensivists' weaning practices found that there were regional differences in several aspects such as screening frequency, SBT techniques, and ventilator mode [22]. The third limitation was that many decisions in the data preprocessing steps, such as the variable selection, time window selection, data cleaning, and data imputation methods, were made according to the other literature or our judgment. Therefore, there may be some bias in our results and changes in the decisions may lead to different clinical characteristics of the phenotypes.

Conclusions
In this retrospective study of mechanically ventilated ICU patients in the eICU and MIMIC-IV databases, four clinical phenotypes were identified which correlate with different clinical characteristics, and outcomes were identified. These phenotypes may help one to understand the heterogeneity of ICU patients, provide more personalized care and treatment strategies for patients, and ultimately lead to better survival and prognosis. Future research is needed to validate the phenotypes in different datasets and to determine the applicability of these phenotypes in clinical settings, including different weaning strategies and other care plans.

Institutional Review Board Statement:
This study was an analysis of the third-party anonymized databases with pre-existing IRB approval.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data are available upon request.

Conflicts of Interest:
The authors declare no conflict of interest.    Figure A1. Descriptive plots of phenotype variables. Abbreviations: WBC, white blood cell; SpO2, oxygen saturation; PaO2, partial pressure of oxygen; PaCO2, partial pressure of carbon dioxide; PEEP, positive end expiratory pressure; FiO2, the fraction of inspiration oxygen; SBP, systolic blood pressure; RBC, red blood cell; GCS, Glasgow Coma Scale Score. Figure A1. Descriptive plots of phenotype variables. Abbreviations: WBC, white blood cell; SpO 2 , oxygen saturation; PaO 2 , partial pressure of oxygen; PaCO 2 , partial pressure of carbon dioxide; PEEP, positive end expiratory pressure; FiO 2 , the fraction of inspiration oxygen; SBP, systolic blood pressure; RBC, red blood cell; GCS, Glasgow Coma Scale Score. Figure A1. Descriptive plots of phenotype variables. Abbreviations: WBC, white blood ce oxygen saturation; PaO2, partial pressure of oxygen; PaCO2, partial pressure of carbon PEEP, positive end expiratory pressure; FiO2, the fraction of inspiration oxygen; SBP, systo pressure; RBC, red blood cell; GCS, Glasgow Coma Scale Score. Figure A2. ROC AUC plot of the logistic regression model. ROC curve of phenotype class model with APACHE IV score and RSBI as predictors.