Machine Learning Consensus Clustering Approach for Hospitalized Patients with Dysmagnesemia

Background: The objectives of this study were to classify patients with serum magnesium derangement on hospital admission into clusters using an unsupervised machine learning (ML) approach and to evaluate the mortality risks among these distinct clusters. Methods: Consensus cluster analysis was performed based on demographic information, principal diagnoses, comorbidities, and laboratory data in the hypomagnesemia (serum magnesium ≤ 1.6 mg/dL) and hypermagnesemia (serum magnesium ≥ 2.4 mg/dL) cohorts. Each cluster's key features were determined using the standardized mean difference. The associations of the clusters with hospital mortality and one-year mortality were assessed. Results: In the hypomagnesemia cohort (n = 13,320), consensus cluster analysis identified three clusters. Cluster 1 patients had the highest comorbidity burden and the lowest serum magnesium. Cluster 2 patients were the youngest and had the lowest comorbidity burden and the best kidney function. Cluster 3 patients were the oldest and had the poorest kidney function. Clusters 1 and 3 were associated with higher hospital and one-year mortality compared to cluster 2. In the hypermagnesemia cohort (n = 4671), the analysis identified two clusters. Compared to cluster 1, the key features of cluster 2 included older age, higher comorbidity burden, more hospital admissions primarily due to kidney disease, more acute kidney injury, and lower kidney function. Cluster 2 was also associated with higher hospital and one-year mortality than cluster 1. Conclusion: Our cluster analysis identified clinically distinct phenotypes with differing mortality risks in hospitalized patients with dysmagnesemia. Future studies are required to assess the application of this ML consensus clustering approach to the care of hospitalized patients with dysmagnesemia.

With the advancement of electronic medical records (EMRs) and artificial intelligence, machine learning (ML) approaches have been developed as part of precision medicine to assist in clinical decision-making, including disease detection, medical imaging, and explainable risk prediction [21][22][23][24][25][26][27][28][29]. In recent years, unsupervised ML algorithms have been used to reveal patterns in diseases such as diabetes and cardiovascular disease [30][31][32][33]. Consensus clustering is an unsupervised ML technique that identifies patterns in data and provides visualization tools to inspect the number of clusters, cluster membership, and cluster boundaries [34]. It can be used to find similarities and heterogeneity in data and to separate patients into clinically meaningful clusters [22,35]. Recent investigations have demonstrated that ML clustering methods can distinguish meaningful disease subtypes associated with different clinical outcomes [36,37]. Given the heterogeneity of patients with dysmagnesemia on hospital admission [8], ML consensus clustering may help identify distinct phenotypic and clinicopathological clusters of dysmagnesemia that are associated with different clinical outcomes.
In this study, we aimed to identify clinically meaningful clusters of patients with dysmagnesemia on hospital admission using an unsupervised ML approach and to assess mortality risks among these distinct clusters.

Patient Population
Adult patients (age ≥ 18 years) admitted to Mayo Clinic in Rochester, Minnesota, USA, from January 2009 to 31 December 2013 were screened. Patients with serum magnesium outside the normal reference range (1.7-2.3 mg/dL) on hospital admission were included. Patients who did not have a serum magnesium measurement within 24 h of hospital admission or who had normal serum magnesium on admission were excluded. Patients were divided into two cohorts: (1) a hypomagnesemia cohort (serum magnesium ≤ 1.6 mg/dL) and (2) a hypermagnesemia cohort (serum magnesium ≥ 2.4 mg/dL). The Mayo Clinic Institutional Review Board approved this study (IRB number 21-003088; date of approval, 30 March 2021). All included patients provided research authorization.
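The inclusion and cohort-assignment rules above can be expressed as a small classification function. This is a minimal sketch, not the authors' code: the `Admission` record and its field names are hypothetical, the 24-h window is assumed to run from 0 to 24 h after admission, and magnesium values are assumed to be reported to one decimal place (so every value falls into one of the three ranges).

```python
from dataclasses import dataclass
from typing import Optional

HYPO_MAX_MG_DL = 1.6   # serum Mg <= 1.6 mg/dL -> hypomagnesemia cohort
HYPER_MIN_MG_DL = 2.4  # serum Mg >= 2.4 mg/dL -> hypermagnesemia cohort


@dataclass
class Admission:
    """Hypothetical per-admission record used only for this illustration."""
    age_years: int
    first_mg_mg_dl: Optional[float]  # first serum Mg after admission, if measured
    hours_to_mg: Optional[float]     # hours from admission to that measurement


def assign_cohort(adm: Admission) -> Optional[str]:
    """Return 'hypo', 'hyper', or None if the admission is excluded."""
    if adm.age_years < 18:
        return None  # adults only
    if adm.first_mg_mg_dl is None or adm.hours_to_mg is None:
        return None  # no admission magnesium measurement
    if not 0 <= adm.hours_to_mg <= 24:
        return None  # measurement outside the assumed 0-24 h window
    if adm.first_mg_mg_dl <= HYPO_MAX_MG_DL:
        return "hypo"
    if adm.first_mg_mg_dl >= HYPER_MIN_MG_DL:
        return "hyper"
    return None  # normal reference range (1.7-2.3 mg/dL): excluded
```

For example, a 70-year-old with a first magnesium of 1.4 mg/dL measured 3 h after admission is assigned to the hypomagnesemia cohort, while the same patient with 2.0 mg/dL is excluded.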

Data Collection
As previously described, we collected pertinent demographic information, principal diagnoses, comorbidities, and laboratory data from our hospital's EMR [8,14]. Only data available within 24 h of hospital admission were incorporated into the cluster analysis. If a laboratory test had multiple values, the first value within the 24-h window was used. Variables with more than 10% missing data were excluded. For the remaining variables, missing data were imputed using the random forest multiple imputation technique before the data were entered into the cluster analysis [38].
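The two data-preparation rules above (first value in the 24-h window; exclusion of variables with more than 10% missing data) can be sketched as follows. This is illustrative only: the `(hours, value)` record layout and the function names are hypothetical, and a variable with exactly 10% missing data is kept here because the text does not state how that boundary was handled. The random forest imputation itself, performed with missForest in R, is not reproduced.

```python
from typing import Dict, List, Optional, Tuple


def first_within_24h(series: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest value measured 0-24 h after admission, or None if there is none.

    `series` holds (hours since admission, value) pairs in any order.
    """
    in_window = [(t, v) for t, v in series if 0 <= t <= 24]
    if not in_window:
        return None
    return min(in_window, key=lambda tv: tv[0])[1]


def variables_to_keep(rows: List[Dict[str, Optional[float]]],
                      max_missing: float = 0.10) -> List[str]:
    """Names of variables whose missing fraction is at most `max_missing`.

    Variables above the threshold are dropped; the rest would go on to imputation.
    """
    names = sorted({name for row in rows for name in row})
    kept = []
    for name in names:
        n_missing = sum(1 for row in rows if row.get(name) is None)
        if n_missing / len(rows) <= max_missing:
            kept.append(name)
    return kept
```

With ten patients, a variable missing in one patient (10%) is kept, whereas a variable missing in two patients (20%) is dropped.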

Cluster Analysis
Unsupervised ML consensus clustering analysis was applied to identify clinical clusters in the hypomagnesemia and hypermagnesemia cohorts [39]. We used a pre-specified subsampling parameter of 80% with 100 iterations. The number of possible clusters (k) was restricted to 2 to 10 to avoid an excessive number of clusters that would not be clinically useful. The optimal number of clusters was determined by evaluating the cumulative distribution function (CDF), the consensus matrix (CM) heat map, cluster-consensus plots of the within-cluster consensus scores, and the proportion of ambiguously clustered pairs (PAC) [34,40]. The within-cluster consensus score (range 0-1) is defined as the average consensus value for all pairs of individuals belonging to the same cluster [34]; a value closer to 1 indicates better cluster stability [34]. PAC (range 0-1) is calculated as the proportion of all sample pairs with consensus values falling within predetermined intermediate boundaries [40]; a value closer to 0 indicates higher cluster stability [40]. Details of the consensus clustering algorithm are provided in the online Supplementary Materials.

Statistical Analysis
After cluster identification, we characterized differences among the clusters. Clinical characteristics were compared between clusters using Student's t-test for continuous variables and the chi-squared test for categorical variables. The key features of each cluster were determined using the standardized mean difference in each clinical characteristic between that cluster and the overall cohort; characteristics with an absolute standardized mean difference > 0.3 were considered key features. Hospital mortality and one-year mortality were compared among the clusters. Logistic regression was used to assess the association of cluster membership with hospital mortality, reported as odds ratios (ORs) with 95% confidence intervals (CIs). Cox proportional hazards regression was used to assess the association of cluster membership with one-year mortality, reported as hazard ratios (HRs) with 95% CIs. We did not adjust for differences in clinical variables between clusters because these variables were used by the unsupervised ML algorithm to define the clusters. We used the ConsensusClusterPlus package (version 1.46.0) for consensus clustering analysis and the missForest package for missing data imputation [41]. All analyses were performed in R, version 4.0.3 (RStudio, Inc., Boston, MA, USA).
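The key-feature rule and the odds ratio can be illustrated with a short sketch. It assumes the common pooled-SD definition of the standardized mean difference for a continuous variable (binary variables are usually handled by substituting p(1 - p) for the variance), and computes an odds ratio with a Wald 95% CI from death counts. When cluster is the only covariate, the logistic-regression OR for each cluster versus the reference cluster equals this crude OR, and its standard error equals the one used here. Function names are hypothetical; the Cox model for one-year mortality is not sketched.

```python
import math
from statistics import mean, variance


def smd(cluster_vals, overall_vals):
    """Standardized mean difference of a continuous variable (cluster vs overall cohort),
    using the pooled standard deviation sqrt((s1^2 + s2^2) / 2)."""
    pooled_sd = math.sqrt((variance(cluster_vals) + variance(overall_vals)) / 2)
    return (mean(cluster_vals) - mean(overall_vals)) / pooled_sd


def key_features(cluster, overall, cutoff=0.3):
    """Return {variable: SMD} for variables with |SMD| > cutoff (0.3 in the text)."""
    result = {}
    for name, vals in cluster.items():
        d = smd(vals, overall[name])
        if abs(d) > cutoff:
            result[name] = d
    return result


def odds_ratio_wald(deaths, n, deaths_ref, n_ref, z=1.96):
    """Odds ratio of death (cluster vs reference cluster) with a Wald 95% CI.

    With cluster as the only covariate, this matches the logistic-regression OR and SE.
    """
    a, b = deaths, n - deaths              # cluster: died, survived
    c, d = deaths_ref, n_ref - deaths_ref  # reference cluster: died, survived
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

For example, with hypothetical counts of 20 deaths among 100 patients in one cluster and 10 among 100 in the reference cluster, `odds_ratio_wald(20, 100, 10, 100)` gives an OR of about 2.25 with a CI that brackets it.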

Hypomagnesemia Cohort
There were 65,974 hospitalized patients with an available admission serum magnesium measurement. A total of 13,320 (20%) patients presented with hypomagnesemia on hospital admission. The mean age was 61 ± 17 years, and 47% were male. The mean estimated glomerular filtration rate (eGFR) was 76 ± 31 mL/min/1.73 m². The mean admission serum magnesium was 1.5 ± 0.2 mg/dL.
The CDF plot displays the consensus distributions for each candidate number of clusters (k) in the hypomagnesemia cohort (Figure 1A, Supplementary Figure S1). The delta area plot, in turn, shows the relative change in the area under the CDF curve (Figure 1B, Supplementary Figure S2). The largest changes in area occurred between k = 2 and k = 4; beyond this range, the relative increase in area became substantially smaller. The CM heat map (Figure 2A, Supplementary Figures S3-S11) shows that the ML algorithm identified k = 2 and k = 3 with clear boundaries, indicating good cluster stability over repeated iterations. Both k = 2 and k = 3 also had high stability, given their high mean cluster consensus scores (Figure 3A). The k = 3 solution exhibited favorably low PAC (Supplementary Figure S12). Thus, consensus clustering of the available baseline characteristics at hospital admission identified three clusters that best represented the data pattern of patients admitted with hypomagnesemia. Cluster 1 had 3446 (26%) patients, cluster 2 had 4351 (33%), and cluster 3 had 5523 (41%). As shown in Table 1, baseline characteristics differed significantly among the three clusters in the hypomagnesemia cohort.
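The delta area plot is derived from the consensus CDF at each k. The sketch below is a minimal illustration under stated assumptions: the area is computed as the step-wise sum over sorted off-diagonal consensus values used in the original consensus clustering formulation, and, following our understanding of the ConsensusClusterPlus convention, the first entry is the area at the smallest k itself while later entries are the relative change from the previous k. Function names are hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def cdf_area(M):
    """Area under the empirical CDF of off-diagonal consensus values.

    A = sum_i (x_i - x_{i-1}) * CDF(x_i) over sorted values x_1 <= ... <= x_m.
    """
    vals = sorted(M[i][j] for i, j in combinations(range(len(M)), 2))
    m = len(vals)
    area = 0.0
    for idx in range(1, m):
        cdf_here = bisect_right(vals, vals[idx]) / m  # fraction of values <= x_i
        area += (vals[idx] - vals[idx - 1]) * cdf_here
    return area


def delta_area(areas):
    """areas: dict mapping consecutive k (starting at 2) to A(k).

    Returns A(k) for the smallest k and (A(k) - A(k-1)) / A(k-1) thereafter.
    """
    ks = sorted(areas)
    out = {ks[0]: areas[ks[0]]}
    for prev, k in zip(ks, ks[1:]):
        out[k] = (areas[k] - areas[prev]) / areas[prev]
    return out
```

A small relative change from k to k + 1 means that adding a cluster barely improves the separation of consensus values, which is the pattern the text reports beyond k = 4.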
Based on the standardized mean differences shown in Figure 4, cluster 1 was mainly characterized by a higher comorbidity burden and lower serum magnesium, albumin, and calcium. Cluster 2 was mainly characterized by younger age; a lower comorbidity burden, especially a lower prevalence of hypertension, diabetes mellitus, coronary artery disease, and leukemia/lymphoma; less use of angiotensin-converting enzyme inhibitors (ACEI)/angiotensin receptor blockers (ARB) and diuretics; less acute kidney injury (AKI); higher eGFR; and lower serum albumin and calcium. Lastly, cluster 3 was mainly characterized by older age, a higher prevalence of hypertension, more use of ACEI/ARB, lower eGFR, and higher serum potassium, magnesium, albumin, and calcium.
The hospital mortality was 3.4% in cluster 1, 1.4% in cluster 2, and 2.3% in cluster 3 (p < 0.001) (Figure 5A). Compared to cluster 2, clusters 1 and 3 had higher odds of hospital mortality, with ORs of 2.45 (95% CI 1.79-3.35) and 1.63 (95% CI 1.20-2.22), respectively. The one-year mortality was 20.1% in cluster 1, 10.5% in cluster 2, and 16.8% in cluster 3 (Figure 5B). Compared to cluster 2, clusters 1 and 3 also had a higher risk of one-year mortality, with HRs of 2.04 (95% CI 1.79-2.32) and 1.68 (95% CI 1.49-1.90), respectively (Table 2).
Figure 3. Bar plots of the mean consensus score for different numbers of clusters (K from two to ten) for patients with (A) hypomagnesemia and (B) hypermagnesemia.
Figure 4. Standardized differences in baseline parameters across clusters for patients with hypomagnesemia and hypermagnesemia. The x-axis represents the standardized difference, and the y-axis represents the baseline variables. The dashed vertical lines mark the standardized difference cutoffs of < −0.3 and > 0.3. Abbreviations: AG, anion gap; AKI, acute kidney injury; BMI, body mass index; CHF, congestive heart failure; Cl, chloride; COPD, chronic obstructive pulmonary disease; CVA, cerebrovascular accident; DM, diabetes mellitus; ESKD, end stage kidney disease; GFR, glomerular filtration rate; GI, gastrointestinal; Hb, hemoglobin; HCO3, bicarbonate; K, potassium; ID, infectious disease; MI, myocardial infarction; Na, sodium; PVD, peripheral vascular disease; RS, respiratory system; SID, strong ion difference.

Hypermagnesemia Cohort
A total of 4671 (7%) patients presented with hypermagnesemia on hospital admission. The mean age was 65 ± 17 years, and 61% were male. The mean eGFR was 56 ± 34 mL/min/1.73 m². The mean admission serum magnesium was 2.6 ± 0.4 mg/dL.
The CDF plot displays the consensus distributions for each candidate number of clusters (k) in the hypermagnesemia cohort (Figure 1C, Supplementary Figure S13). The delta area plot, in turn, shows the relative change in the area under the CDF curve (Figure 1D, Supplementary Figure S14). The largest changes in area occurred between k = 2 and k = 4; beyond this range, the relative increase in area became substantially smaller. The CM heat map (Figure 2B, Supplementary Figures S15-S23) shows that the ML algorithm identified k = 2 with clear boundaries, indicating good cluster stability over repeated iterations. The k = 2 solution also had high stability, given its high mean cluster consensus score (Figure 3B), and favorably low PAC (Supplementary Figure S24). Thus, consensus clustering of the available baseline characteristics at hospital admission identified two clusters that best represented the data pattern of patients admitted with hypermagnesemia.
Cluster 1 had 2445 (52%) patients, and cluster 2 had 2226 (48%). As shown in Table 1, clinical characteristics differed significantly between the two clusters in the hypermagnesemia cohort. Based on the standardized mean differences shown in Figure 4, the key features of patients in cluster 2, compared with those in cluster 1, included older age; a higher comorbidity burden, especially a higher prevalence of hypertension and diabetes mellitus; more AKI; more hospital admissions primarily due to kidney disease; lower eGFR and serum albumin; and higher serum potassium and phosphorus.

Discussion
ML consensus clustering algorithms can efficiently analyze large datasets and identify clusters of patients with different characteristics [22,35,42,43]. In this study, the unsupervised ML consensus clustering approach was used to separate patients with dysmagnesemia into distinct clusters. Among patients with hypomagnesemia on hospital admission, age, comorbidity burden, kidney function (with baseline eGFR and AKI on admission as surrogate markers), and degree of hypomagnesemia were important features differentiating the phenotypes. Similarly, among patients with hypermagnesemia on hospital admission, age, comorbidity burden, kidney function, and a principal genitourinary diagnosis were important features differentiating the phenotypes.
Applying the unsupervised consensus clustering approach to patient characteristics at the time of hospital admission, we identified three clinically distinct clusters of patients with hypomagnesemia. The three clusters had different characteristics and were associated with different hospital and one-year mortality risks. Although most patients in all three clusters were admitted with similar conditions (mainly hematology/oncology-, cardiovascular-, and gastrointestinal-related conditions), their clinical outcomes differed.
Cluster 2, the reference cluster in the hypomagnesemia cohort, consisted of younger patients with the lowest comorbidity burden of the three clusters. They also had higher eGFR, lower use of ACEI/ARB and diuretics, and a lower incidence of AKI. Interestingly, they had the highest prevalence of alcohol use. They also had the lowest serum albumin, calcium, and phosphate of the three clusters. Calcium and magnesium normally circulate partly bound to albumin, so changes in circulating albumin alter the measured levels of these electrolytes: measured (total) calcium and magnesium alike are lower when serum albumin is low [44][45][46]. In addition, alcoholism and malnutrition with poor oral intake may have played important roles in the development of hypomagnesemia in this population. These patients also most often had a principal diagnosis of a gastrointestinal condition on admission, which could have caused gastrointestinal magnesium loss or, in acute pancreatitis, redistribution of magnesium [47,48]. Consistent with their younger age and lowest comorbidity burden, patients in cluster 2 had the lowest in-hospital and one-year mortality risks among the three clusters.
Compared with patients in cluster 2 of the hypomagnesemia cohort, those in clusters 1 and 3 were older and had a higher comorbidity burden, reduced kidney function (lower baseline eGFR and higher incidence of AKI), and higher use of ACEI/ARB and diuretics. Patients in cluster 1 had the highest comorbidity burden and the highest prevalence of cancer among the three clusters. They also more frequently had a principal diagnosis of infectious disease on hospital admission, and they had the lowest serum magnesium. A number of cancer-specific therapies can cause hypomagnesemia via renal magnesium wasting, including platinum-based chemotherapy, anti-epidermal growth factor receptor (EGFR) monoclonal antibodies, inhibitors of human epidermal growth factor receptor 2 (HER2), and calcineurin inhibitors [49]. In addition, cancer patients frequently use medications that cause or exacerbate hypomagnesemia, such as proton pump inhibitors (PPIs), diuretics, and chemotherapy [49]. Patients in cluster 1 had the highest in-hospital and one-year mortality risks among the three clusters, which may reflect the poor outcomes of cancer patients with hypomagnesemia [8,50]. Previous studies have also reported that hypomagnesemia is associated with increased mortality risk among patients with infection [10,51,52], one of the main characteristics of patients in this cluster.
Patients in cluster 3 of the hypomagnesemia cohort were the oldest of the three clusters. They had the highest prevalence of hypertension, coronary artery disease, and congestive heart failure, and most often had a principal diagnosis of cardiovascular disease on hospital admission. They also had the highest use of diuretics, which can cause urinary magnesium wasting. Previous studies have shown that hypomagnesemia in patients with cardiovascular disease carries a high mortality rate [53,54]. Although patients in cluster 3 had higher in-hospital and one-year mortality than those in cluster 2, their mortality was lower than that of cluster 1 (the phenotype of cancer patients with hypomagnesemia), even though cluster 3 was the oldest of the three clusters.
In the hypermagnesemia cohort, ML consensus clustering identified two clinically distinct clusters. Compared with cluster 1, the key features of cluster 2 included older age, a higher comorbidity burden, more hospital admissions primarily due to kidney disease, more AKI, and lower eGFR. Because of their reduced kidney function, patients in cluster 2 likely had a reduced ability to excrete magnesium renally, resulting in hypermagnesemia. Reported clinical cases of hypermagnesemia include those caused by antacids or magnesium supplements in older patients or in patients with reduced kidney function [55]. Conversely, patients in cluster 1 were younger but had higher alcohol use and were more likely to be admitted with principal diagnoses of injury/poisoning and hematology/oncology conditions. Hypermagnesemia in cluster 1 may therefore have resulted from extensive tissue injury or breakdown (such as tumor lysis syndrome and burn injury) [56,57]. Compared with cluster 1, cluster 2 had higher hospital and one-year mortality.
The strengths of this study include a large sample size, unbiased data-driven analysis, and easy reproducibility with unsupervised ML consensus clustering. However, there are also limitations. First, the data were obtained from a single center, and our patient population was predominantly Caucasian, which may limit the generalizability of our findings to other populations. Second, consensus clustering used only data available at hospital admission and did not include data from before or during hospitalization, which could affect hospitalization-related outcomes. Third, we did not have data on magnesium supplements or other medications that might affect serum magnesium, such as antibiotics, proton pump inhibitors, and chemotherapy. Fourth, some relevant laboratory tests were not available, including 24-h urinary magnesium, fractional excretion of magnesium, and genetic testing (for conditions that can cause hypomagnesemia in adults, such as Gitelman syndrome). These tests are not commonly performed on hospital admission and thus were not included in our ML clustering algorithm. Future studies are therefore required to assess whether these variables would improve the discriminatory ability of the clusters we identified. Nevertheless, using data readily available at the time of hospital admission, we identified distinct clusters of dysmagnesemia associated with different clinical outcomes.

Conclusions
In summary, we present an unsupervised ML consensus clustering analysis of hospitalized patients with dysmagnesemia. We identified three distinct phenotypes of admission hypomagnesemia and two distinct phenotypes of admission hypermagnesemia with different hospital and one-year mortality risks. With the advancement of EMRs and artificial intelligence, future studies are required to evaluate and validate the application of this ML consensus clustering approach to the care of hospitalized patients with dysmagnesemia.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/diagnostics11112119/s1, Figures S1-S24: Figures S1 and S13, consensus CDF; Figures S2 and S14, relative change in area under the CDF curve; Figures S3-S11 and S15-S23, consensus matrix heat maps depicting consensus values on a white-to-blue color scale for each cluster; Figures S12 and S24, PAC.
Data Availability Statement: Data are available upon reasonable request to the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.