Machine Learning Consensus Clustering Approach for Patients with Lactic Acidosis in Intensive Care Units

Background: Lactic acidosis is a heterogeneous condition with multiple underlying causes and associated outcomes. The use of multi-dimensional patient data to subtype lactic acidosis can personalize patient care. Machine learning consensus clustering may identify lactic acidosis subgroups with unique clinical profiles and outcomes. Methods: We used the Medical Information Mart for Intensive Care III database to abstract electronic medical record data from patients admitted to intensive care units (ICU) in a tertiary care hospital in the United States. We included patients who developed lactic acidosis (defined as serum lactate ≥ 4 mmol/L) within 48 h of ICU admission. We performed consensus clustering analysis based on patient characteristics, comorbidities, vital signs, organ supports, and laboratory data to identify clinically distinct lactic acidosis subgroups. We calculated standardized mean differences to show key subgroup features. We compared outcomes among subgroups. Results: We identified 1919 patients with lactic acidosis. The algorithm identified three distinct lactic acidosis subgroups as the optimal solution based on patient variables. Cluster 1 (n = 554) was characterized by old age, elective admission to cardiac surgery ICU, vasopressor use, mechanical ventilation use, and higher pH and serum bicarbonate. Cluster 2 (n = 815) was characterized by young age, admission to trauma/surgical ICU with higher blood pressure, lower comorbidity burden, lower severity index, and less vasopressor use. Cluster 3 (n = 550) was characterized by admission to medical ICU, history of liver disease and coagulopathy, acute kidney injury, lower blood pressure, higher comorbidity burden, higher severity index, higher serum lactate, and lower pH and serum bicarbonate. Cluster 3 had the worst outcomes, while cluster 1 had the most favorable outcomes in terms of persistent lactic acidosis and mortality.
Conclusions: Consensus clustering analysis synthesized the pattern of clinical and laboratory data to reveal clinically distinct lactic acidosis subgroups with different outcomes.

With the advancement of electronic medical records (EMR) and artificial intelligence, machine learning (ML) algorithms have become more widely utilized in individualized medicine to assist clinical decision-making [22,23]. Consensus clustering is an unsupervised ML technique that is used to identify similarities and differences among various data variables and then assign them to meaningful clusters [24,25]. Recent studies have demonstrated that the ML consensus clustering approach can distinguish meaningful disease subtypes that forecast unique clinical outcomes [26,27]. Given the heterogeneity of patients with lactic acidosis on ICU admission [9][10][11][13][14][15][16][17][18][19][20][21], the ML consensus clustering approach may help clinicians better identify different phenotypes of critically ill patients with lactic acidosis in order to apply individualized strategies to improve outcomes.
In this study, we aimed to identify clinically meaningful clusters of critically ill patients with lactic acidosis on ICU admission using an unsupervised ML clustering approach. We then assessed these distinct clusters' individual outcomes.

Patient Population
We used the Medical Information Mart for Intensive Care III (MIMIC III) database to identify critically ill adult patients (aged 18 years or older) with lactic acidosis at ICU admission. The MIMIC III database is a publicly available critical care database of patients from a large, single-center tertiary care hospital from 2001 to 2012 [28]. Lactic acidosis was defined as a first serum lactate, measured within 48 h of ICU admission, of ≥4.0 mmol/L. Patients were excluded if they did not have a serum lactate measurement within 48 h of ICU admission or if they stayed in the ICU for ≤24 h. We included only the first ICU admission if patients had multiple ICU admissions. The Mayo Clinic Institutional Review Board approved this study (IRB number 21-009222) and waived the need for informed consent because the database is publicly available and de-identified.
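The inclusion and exclusion rules above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `IcuStay` record, its field names, and the choice to count measurements from hour 0 onward are hypothetical and are not taken from the MIMIC III schema or from the study code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcuStay:
    """Hypothetical, simplified record of one ICU stay."""
    patient_id: int
    admit_order: int             # 1 = the patient's first ICU admission
    age_years: float
    icu_hours: float             # ICU length of stay in hours
    lactate_hours: List[float]   # hours after ICU admission of each lactate draw
    lactate_values: List[float]  # matching serum lactate values in mmol/L


def first_lactate_within_48h(stay: IcuStay) -> Optional[float]:
    """Return the earliest lactate drawn 0-48 h after ICU admission, or None."""
    in_window = [
        (hour, value)
        for hour, value in zip(stay.lactate_hours, stay.lactate_values)
        if 0.0 <= hour <= 48.0   # assumed window; the text only says "within 48 h"
    ]
    if not in_window:
        return None
    return min(in_window, key=lambda pair: pair[0])[1]


def is_eligible(stay: IcuStay) -> bool:
    """Apply the cohort rules: adult, first ICU admission, stay > 24 h, first lactate >= 4.0."""
    if stay.age_years < 18:
        return False
    if stay.admit_order != 1:
        return False
    if stay.icu_hours <= 24:
        return False
    lactate = first_lactate_within_48h(stay)
    return lactate is not None and lactate >= 4.0
```

Note that the rule keys on the first lactate in the window, so a patient whose first value is below 4.0 mmol/L is not included in this sketch even if a later value is elevated.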

Data Collection
We abstracted EMR data on patient characteristics, comorbidities, vital signs, organ supports, and laboratory results to identify clinically distinct lactic acidosis clusters. As our goal was to cluster lactic acidosis patients based on data available at the time of ICU admission, we only used data recorded within 48 h of ICU admission for clustering analysis. When there were multiple values, we selected the first vital sign or laboratory value within the 48-h time frame. We excluded laboratory results with over 10% missing data. Otherwise, missing data were imputed by multiple imputation using the random forest method before inclusion in cluster analysis.
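A minimal sketch of the two data-preparation rules described above (taking the first value in the 48-h window and dropping variables with more than 10% missing data) might look as follows. The function names and the in-memory data layout are assumptions made for illustration; the imputation step is not shown.

```python
from typing import Dict, List, Optional, Tuple

# (hours since ICU admission, measured value); a hypothetical layout
Measurement = Tuple[float, float]


def first_value_in_window(
    measurements: List[Measurement], window_hours: float = 48.0
) -> Optional[float]:
    """Return the earliest value recorded within the window, or None if there is none."""
    eligible = [m for m in measurements if 0.0 <= m[0] <= window_hours]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m[0])[1]


def drop_sparse_variables(
    table: Dict[str, List[Optional[float]]], max_missing: float = 0.10
) -> Dict[str, List[Optional[float]]]:
    """Keep only variables whose fraction of missing (None) values is at most max_missing."""
    kept = {}
    for name, column in table.items():
        if not column:
            continue  # an empty column carries no information
        missing_fraction = sum(value is None for value in column) / len(column)
        if missing_fraction <= max_missing:
            kept[name] = column
    return kept
```

Variables that survive the filter would then be passed to the imputation step, so the remaining `None` entries are expected at this stage.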
We defined comorbidities using the Elixhauser Comorbidity Software, which identifies comorbidities by grouping International Classification of Disease Clinical Modification codes [29]. We calculated comorbidity burden using the Charlson Comorbidity Score [30]. We calculated the severity index for the acute illness using the Simplified Acute Physiology Score (SAPS) II [31]. We estimated glomerular filtration rate (GFR) using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [32]. We defined acute kidney injury (AKI) as per the Kidney Disease Improving Global Outcome (KDIGO) guideline's serum creatinine and urine output criteria [33]. For the serum creatinine criteria, we used the lowest creatinine in the seven days prior to ICU admission as the baseline and compared it with the highest creatinine within 48 h of ICU admission. Patients had AKI if serum creatinine increased by ≥0.3 mg/dL or to ≥1.5 times baseline. For the urine output criteria, we split urine output within 48 h after ICU admission into 6-h time periods. Patients had AKI if the total urine output in any 6-h interval was below 3 mL/kg. Outcomes of interest included persistent lactic acidosis (defined as all subsequent serum lactate values within 48 h of the initial elevated value remaining ≥4 mmol/L), hospital mortality, and 90-day mortality after ICU admission.
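The AKI and persistent lactic acidosis definitions above translate into a few threshold checks. The sketch below is illustrative only: the function names are hypothetical, and the handling of patients with no follow-up lactate (treated here as not persistent) is an assumption, since the text does not specify it.

```python
from typing import List


def aki_by_creatinine(baseline_mg_dl: float, peak_mg_dl: float) -> bool:
    """Creatinine criterion: rise of >= 0.3 mg/dL or to >= 1.5 times baseline."""
    absolute_rise = peak_mg_dl - baseline_mg_dl
    return absolute_rise >= 0.3 or peak_mg_dl >= 1.5 * baseline_mg_dl


def aki_by_urine_output(urine_ml_per_6h: List[float], weight_kg: float) -> bool:
    """Urine criterion: any 6-h interval with total output below 3 mL/kg."""
    return any(volume / weight_kg < 3.0 for volume in urine_ml_per_6h)


def has_aki(
    baseline_mg_dl: float,
    peak_mg_dl: float,
    urine_ml_per_6h: List[float],
    weight_kg: float,
) -> bool:
    """AKI is present when either criterion is met."""
    return aki_by_creatinine(baseline_mg_dl, peak_mg_dl) or aki_by_urine_output(
        urine_ml_per_6h, weight_kg
    )


def persistent_lactic_acidosis(followup_lactates_mmol_l: List[float]) -> bool:
    """True when every follow-up lactate within 48 h of the initial value is >= 4 mmol/L.

    Assumption: a patient with no follow-up measurement is counted as not persistent.
    """
    if not followup_lactates_mmol_l:
        return False
    return all(value >= 4.0 for value in followup_lactates_mmol_l)
```

The 3 mL/kg limit per 6-h interval corresponds to an average of 0.5 mL/kg/h over that interval.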

Cluster Analysis
We applied an unsupervised ML approach to consensus clustering in order to identify clinical phenotypes of ICU patients with lactic acidosis [34]. We used a pre-specified subsampling parameter of 80% with 100 iterations and assigned the number of potential clusters (k) to range from 2 to 10 in order to avoid producing an excessive number of clusters that would not be clinically useful. The optimal number of clusters was determined by examining the consensus matrix (CM) heat map, cumulative distribution function (CDF), cluster-consensus plots of the within-cluster consensus scores, and the proportion of ambiguously clustered pairs (PAC) [35,36]. The within-cluster consensus score, ranging between 0 and 1, is defined as the average consensus value for all pairs of individuals belonging to the same cluster [36]. A value closer to one indicates better cluster stability [36]. PAC, ranging between 0 and 1, is calculated as the proportion of all sample pairs with consensus values falling within the predetermined boundaries [35]. A value closer to zero indicates better cluster stability [35]. We calculated the PAC utilizing two criteria: (1) the strict criteria consisting of a predetermined boundary of (0, 1), where a pair of individuals who had a consensus value greater than 0 and less than 1 was considered ambiguously clustered, and (2) the relaxed criteria consisting of a predetermined boundary of (0.1, 0.9), where a pair of individuals who had a consensus value greater than 0.1 and less than 0.9 was considered ambiguously clustered [35]. This study's detailed consensus clustering algorithms are provided in the Online Supplementary Materials.
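To make the two stability metrics concrete, the following sketch computes the PAC and the within-cluster consensus score from a consensus matrix. It is a simplified illustration, not the study's implementation: the function names and the small 4 x 4 matrix are invented for the example, and a pair is counted as ambiguous when its consensus value lies strictly between the two boundaries.

```python
from itertools import combinations
from typing import List, Sequence


def pac(consensus: Sequence[Sequence[float]], lower: float, upper: float) -> float:
    """Proportion of sample pairs whose consensus value lies strictly between lower and upper."""
    n = len(consensus)
    pairs = list(combinations(range(n), 2))
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] < upper)
    return ambiguous / len(pairs)


def cluster_consensus(
    consensus: Sequence[Sequence[float]], labels: List[int], cluster: int
) -> float:
    """Average consensus value over all pairs of samples assigned to the given cluster."""
    members = [i for i, label in enumerate(labels) if label == cluster]
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("cluster needs at least two members")
    return sum(consensus[i][j] for i, j in pairs) / len(pairs)


# Hypothetical consensus matrix for four patients (symmetric, ones on the diagonal).
example_matrix = [
    [1.00, 1.00, 0.05, 0.00],
    [1.00, 1.00, 0.50, 0.00],
    [0.05, 0.50, 1.00, 0.95],
    [0.00, 0.00, 0.95, 1.00],
]
example_labels = [1, 1, 2, 2]

strict_pac = pac(example_matrix, 0.0, 1.0)    # boundary (0, 1)
relaxed_pac = pac(example_matrix, 0.1, 0.9)   # boundary (0.1, 0.9)
```

In this toy matrix, three of the six pairs fall strictly inside (0, 1) but only one falls inside (0.1, 0.9), which shows why the relaxed criteria always give a PAC no larger than the strict criteria.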

Statistical Analysis
After we identified the clusters of lactic acidosis patients, we performed analyses to test the differences among the clusters. First, we compared patient characteristics among the clusters using the analysis of variance (ANOVA) test for continuous variables and Chi-squared test for categorical variables. We determined the clusters' key features using standardized mean differences with a set cut-off of >0.3. We then compared outcomes among the identified clusters. We assessed the association of clusters with persistent lactic acidosis and hospital mortality using logistic regression. We assessed the association of clusters with 90-day mortality using Cox proportional hazard regression. We selected cluster 1 as the reference group because it was associated with the most favorable outcomes. We did not adjust for patient characteristics because these characteristics were utilized to identify clusters through unsupervised ML. We performed all analyses using R, version 4.0.3 (RStudio, Inc., Boston, MA, USA; http://www.rstudio.com/ (accessed on 15 January 2021)), with the packages of ConsensusClusterPlus (version 1.46.0) [36] for consensus clustering analysis, and the missForest package for missing data imputation [37].
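As a hedged illustration of how key features could be flagged with the >0.3 cut-off, the sketch below computes a standardized mean difference for a continuous variable. The comparison group (patients outside the cluster), the pooled standard deviation formula, the use of the absolute value, and the function names are all assumptions made for the example; the text above does not specify these details.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List


def standardized_mean_difference(group: List[float], comparison: List[float]) -> float:
    """Difference in means divided by the pooled standard deviation of the two groups."""
    pooled_sd = sqrt((variance(group) + variance(comparison)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0  # no spread in either group, so no standardized difference is defined
    return (mean(group) - mean(comparison)) / pooled_sd


def key_features(
    cluster_data: Dict[str, List[float]],
    comparison_data: Dict[str, List[float]],
    cutoff: float = 0.3,
) -> List[str]:
    """Names of variables whose absolute standardized mean difference exceeds the cutoff."""
    return [
        name
        for name in cluster_data
        if abs(standardized_mean_difference(cluster_data[name], comparison_data[name])) > cutoff
    ]
```

For example, with made-up ages of `[70, 72, 74, 76]` in a cluster and `[50, 52, 54, 56]` in the comparison group, age would be flagged as a key feature, whereas a variable with identical distributions in both groups would not.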
The CDF plot displays the consensus distributions for each cluster (Figure 1A). The delta area plot shows the relative change in the area under the CDF curve (Figure 1B). The largest changes in area occurred between k = 3 and k = 5, at which point the relative increase in area became noticeably smaller. As shown in the CM heatmap (Figure 2, Supplementary Materials, Figures S1-S9), the ML algorithm identified cluster 2 and cluster 3 with clear boundaries, indicating good cluster stability over repeated iterations.
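The quantity behind a delta area plot can be sketched as follows. This is an illustration under assumptions: the areas are made-up numbers, the function names are hypothetical, and the convention that the smallest k reports its own area while each later k reports the relative change from the previous k is assumed rather than stated in the text.

```python
from typing import Dict, List


def cdf_area(pair_consensus_values: List[float]) -> float:
    """Area under the empirical CDF of consensus values over the interval [0, 1].

    For values in [0, 1], the integral of the empirical CDF equals the mean of (1 - value).
    """
    return sum(1.0 - value for value in pair_consensus_values) / len(pair_consensus_values)


def delta_area(area_by_k: Dict[int, float]) -> Dict[int, float]:
    """Relative change in CDF area between consecutive numbers of clusters."""
    ks = sorted(area_by_k)
    deltas = {ks[0]: area_by_k[ks[0]]}  # first k: report the area itself (assumed convention)
    for previous_k, current_k in zip(ks, ks[1:]):
        previous_area = area_by_k[previous_k]
        deltas[current_k] = (area_by_k[current_k] - previous_area) / previous_area
    return deltas


# Hypothetical areas under the CDF for k = 2, 3, 4.
example_deltas = delta_area({2: 0.40, 3: 0.60, 4: 0.66})
```

With these invented areas, the relative gain drops from about 0.5 at k = 3 to about 0.1 at k = 4, which is the kind of flattening the delta area plot is used to detect.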
The mean cluster consensus score was comparable between a scenario of two or three clusters (Figure 3A). In addition, favorable low PACs by both strict and relaxed criteria were demonstrated for three clusters (Figure 3B). Thus, using baseline variables at ICU admission, the consensus clustering analysis identified three clusters that best represented the data pattern of our ICU patients with lactic acidosis.
Figure 3. (A) The bar plot represents the mean consensus score for different numbers of clusters (k ranges from two to ten) for ICU patients with lactic acidosis; (B) the PAC values using the strict criteria (red line) with the predetermined boundary of (0, 1), and the PAC values using the relaxed criteria (black line) with the predetermined boundary of (0.1, 0.9) as the definition for ambiguously clustered pairs.
There were 554 patients in cluster 1, 815 patients in cluster 2, and 550 patients in cluster 3. Table 1 shows the patient characteristics of the three identified clusters.
The plot of standardized mean difference in Figure 4 demonstrates the key features of each cluster.

Discussion
ML consensus clustering algorithms offer the ability to efficiently analyze and identify unique clusters of patients with different characteristics in a large amount of data [24,25,38,39]. In this study, we identified three clinically distinct clusters of patients with lactic acidosis at the time of ICU admission using an unsupervised ML consensus clustering approach. The three clusters demonstrated different characteristics and were associated with unique clinical outcomes, including persistent lactic acidosis, hospital mortality, and 90-day mortality.
Cluster 1, designated as the reference cluster, consisted of older patients who had underlying valvular heart disease, peripheral vascular disease, and hypertension. Of the three clusters, cluster 1 patients had the highest rate of elective ICU admission to the cardiac surgery recovery unit (CSRU) after cardiac surgery. These patients had the highest rates of vasopressor and mechanical ventilator utilization. Cluster 1 patients had the most severe anemia but also the highest pH, HCO3, and pO2 among all of the clusters. Cluster 1 patients also had the lowest Glasgow Coma Scale (GCS) scores on ICU admission; because they were more often admitted electively after cardiac surgery and required mechanical ventilation, this could be due to sedation. While AKI occurred in 76% of cluster 1 patients, severe AKI requiring renal replacement therapy occurred in only 2% of these patients. The average blood lactate level in cluster 1 was 5.7 mmol/L, and only 9.2% had persistent lactic acidosis 48 h after ICU admission. This is consistent with the literature, in which elevated lactate levels are frequently noted after cardiac surgery, with an incidence of 10-20% among post-cardiac surgery patients [15][16][17][18][19]. The causes of lactic acidosis following cardiac surgery include both hypoxic (type A) and non-hypoxic (type B) causes. Type A causes include inadequate oxygen delivery during cardiopulmonary bypass (CPB), low cardiac output, and severe anemia/hemodilution, whereas type B causes include exogenous catecholamines (epinephrine, isoproterenol, and salbutamol) [40,41]. Among these possible causes, prolonged CPB duration and intraoperative vasopressor requirements have been shown to be strong independent risk factors for lactic acidosis on ICU admission in patients undergoing cardiac surgery [17][42][43][44].
While mildly elevated lactate levels after cardiac surgery are frequently transient and usually considered benign [41,45], studies have consistently demonstrated that elevated lactate levels > 4 mmol/L are a prognostic marker for worse outcomes, including mortality after cardiac surgery [13,14,17,41]. Nevertheless, cluster 1 patients in our study had the lowest incidence of persistent lactic acidosis, and lowest in-hospital and 90-day mortality risks among the three clusters.
Patients in cluster 2 were the youngest and had the highest baseline eGFR among the three clusters. Cluster 2 consisted of surgical patients, especially those admitted to the trauma surgical ICU (TSICU) following trauma. They had the highest blood pressure values, least severe anemia, and lowest vasopressor requirement among the three clusters. The highest hemoglobin in cluster 2 could reflect blood transfusions administered to trauma and surgical patients prior to ICU admission [46][47][48][49]. In addition, cluster 2 had the lowest serum potassium, magnesium, and phosphate levels. The average lactate level in cluster 2 was 5.6 mmol/L, which is comparable to cluster 1. Lactic acidosis in patients after trauma often occurs due to the metabolic response to an oxygen supply-demand mismatch in the setting of hypoxia, hemorrhage, and anaerobic metabolism [46][47][48][49]. Lactate level has been shown to be a prognostic biomarker in trauma, even in patients without hypotension [9,10,47], and elevated blood lactate levels (>4 mmol/L) among trauma and surgical patients admitted to the ICU are closely correlated with worse patient outcomes and a reduced chance of survival [50][51][52][53][54][55][56][57][58][59][60]. Our study demonstrated that 9.8% of patients in cluster 2 developed persistent lactic acidosis, which is comparable to cluster 1. Compared to cluster 1, patients in cluster 2 had increased in-hospital and 90-day mortality.
Among all clusters, cluster 3 patients had the highest degree of lactic acidosis. They also had the lowest arterial pH, pO2, and HCO3 levels. Moreover, cluster 3 patients had the highest prevalence of liver disease, coagulopathy, fluid and electrolyte disorders, and metastatic cancer. More than half of these patients were admitted to the MICU. Cluster 3 patients also had the highest Charlson and SAPS II scores, signifying high comorbidity burden and severity of acute illness. On ICU admission, they had the lowest blood pressure but the highest incidence of positive blood culture, acute kidney injury, and hyperphosphatemia. Thus, lactic acidosis in cluster 3 patients likely occurred in the setting of tissue hypoperfusion due to shock. Additionally, liver disease and metastatic cancer may have contributed to a type B lactic acidosis [61][62][63]. While diabetes was not a main feature that differentiated the three clusters, it was more prevalent in cluster 3 than in the other clusters, and metformin is an uncommon but important cause of type B lactic acidosis in the ICU [20,21]. In addition, the higher prevalence of liver disease and greater rate of AKI in this cluster could also reduce lactate clearance. Altogether, cluster 3 patients had the highest incidence of persistent lactic acidosis within 48 h, in-hospital mortality, and 90-day mortality among the three clusters.
The strengths of our study include innovative findings via an unbiased and easily reproducible unsupervised ML consensus clustering approach derived from a large sample population of ICU patients with lactic acidosis. Nevertheless, there are several important limitations. First, we did not have information on blood transfusion prior to ICU admission, and thus cluster 2 with its association with the highest hemoglobin could have been affected by blood transfusions that commonly occur in trauma patients. Furthermore, data on medications prior to ICU admission (such as metformin or vasopressors) were lacking. Thus, we could not investigate whether metformin or other known causes of drug-induced lactic acidosis may have played an important role in the clustering approach of our study. Future studies are needed to assess whether the incorporation of these variables could have improved the discriminatory ability of the clusters we identified. Lastly, consensus clustering was performed on ICU admission and did not include clinical data before or during ICU stay, which could affect ICU-related outcomes. Nevertheless, we only included data that were readily available at the time of ICU admission as this would mimic the phenotype that clinicians would initially be provided with. By this method, we successfully identified distinct clusters of lactic acidosis on ICU admission that were associated with unique clinical outcomes, including persistent lactic acidosis, in-hospital mortality, and 90-day mortality.

Conclusions
In conclusion, we present an unsupervised ML consensus clustering analysis of critically ill patients with lactic acidosis on ICU admission. We discovered three distinct phenotypes of lactic acidosis on ICU admission with unique characteristics and hospital outcomes. While our findings provide a better understanding of the different outcomes associated with distinct subtypes of patients with lactic acidosis on ICU admission, future studies are needed to further investigate the impact of applying ML clustering analysis to the care of ICU patients with lactic acidosis.