Novel Phenotyping for Acute Heart Failure—Unsupervised Machine Learning-Based Approach

Acute heart failure (AHF) is a life-threatening, heterogeneous disease requiring urgent diagnosis and treatment. The clinical severity and medical procedures differ according to a complex interplay between the cause of deterioration, the underlying cardiac substrate, and comorbidities. This study aimed to analyze the natural phenotypic heterogeneity of the AHF population and to evaluate the possibilities offered by clustering (an unsupervised machine-learning technique) in the assessment of medical data. We evaluated data from 381 AHF patients. Sixty-three clinical and biochemical features assessed at admission were included in the analysis after preprocessing. The k-medoids algorithm was used to create the clusters, with parameter optimization based on the Davies-Bouldin index. The clustering was performed blinded to the outcome. The outcome associations were evaluated using Kaplan-Meier curves and Cox proportional-hazards regressions. The algorithm distinguished six clusters that differed significantly in 58 variables, including etiology, clinical status, comorbidities, laboratory parameters and lifestyle factors. The clusters differed in one-year mortality (p = 0.002). Using clustering techniques, we extracted six phenotypes of AHF patients with distinct clinical characteristics and outcomes. Our results may be valuable for the design of future trials and for customized treatment.


Introduction
Acute heart failure (AHF) is a life-threatening clinical challenge, causing a growing number of hospitalizations and high in-hospital as well as post-discharge mortality [1]. The present epidemiological situation (e.g., aging population, improved myocardial infarction survival) tends to increase the prevalence of chronic HF, which is likely to result in more hospitalizations in the near future [2]. Over the years, the approach to the clinical manifestation of AHF has changed; however, it has always been crucial to phenotype patients in order to provide them with better individual treatment. The AHF diagnostic process starts with the first medical contact and is aimed at identifying the clinical presentation [1]. The clinical severity and medical procedures differ according to a complex interplay between the cause of deterioration, the underlying cardiac substrate, and comorbidities. It is recommended to stratify AHF patients based on the presence of signs of congestion and/or peripheral hypoperfusion at admission. According to the 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure, four clinical presentations of AHF can be distinguished: acute decompensated heart failure, acute pulmonary oedema, isolated right ventricular failure and cardiogenic shock, for which phenotyping may bring therapeutic and prognostic value [3]. It is nevertheless questionable whether a physical examination and simple dichotomous subgroups sufficiently reflect the complexity of the pathophysiology of AHF and the heterogeneity of AHF patients. These shortcomings in understanding the underlying correlations may be one reason for poor survival [4].
Machine learning, especially statistical clustering, an unsupervised technique that attempts to learn the internal structure of data, might be a feasible tool for elucidating hidden phenotypic characteristics and for better understanding the vital differences between clinically important subpopulations [5][6][7][8]. Machine-learning approaches have been used successfully to analyze molecular data for many years. More recently, cluster analysis applied to clinical variables has proved effective in studying the phenotypic characteristics of chronic heart failure with reduced ejection fraction (HFrEF) [9] as well as with preserved ejection fraction (HFpEF) [5].
Following the aforementioned studies, we applied machine-learning algorithms to AHF patients using only the clinical variables obtained at admission. The analysis was blinded to the outcomes in order to detect novel patterns by subgrouping the patients at the first medical contact. We hypothesized that the subpopulations identified in this manner would have different pathophysiological characteristics and varying outcomes.

Study Population
We retrospectively analyzed 381 patients hospitalized due to AHF, drawn from two AHF registries that ran at our institution in 2010-2012 and 2016-2017. Patients were treated, and the diagnosis of heart failure was established, following the ESC guidelines current at the time. The inclusion and exclusion criteria were described in our previous work [10]. There were no differences in the collected patients' demographic data or in the design of the evaluated registries, except for the criteria for the diagnosis of acute heart failure, which varied slightly between the subsequent (2013 and 2016) ESC guidelines.

Machine Learning and Statistical Analysis
As we aimed to evaluate the baseline heterogeneity of AHF patients, only the variables evaluated at admission were included. The analysis was performed blinded to the outcome; therefore, the follow-up variables were excluded (Figure 1). Initially, 88 variables were divided into domains and selected for the study (Table 1). Automatic preprocessing was then performed. Low-quality variables were defined as those with over 90% stability or over 10% missing values, and 25 such variables were deleted. Furthermore, removal of correlated variables (threshold r = 0.6) was applied, but no variables met this criterion, so none were removed. Sixty-three variables were eventually included in the cluster analysis (Table 1). Because clustering algorithms cannot cope with missing values, these were replaced by the mean value of the respective variable. Range transformation normalization (range: 0 to 1) was performed. The nominal parameters were transformed into numerical parameters.
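The preprocessing steps described above can be illustrated with a minimal sketch. It is not the RapidMiner workflow used in the study: the function names are hypothetical, and "stability" is assumed here to mean the share of the most frequent non-missing value of a variable.

```python
from collections import Counter

def stability(values):
    """Share of the most frequent non-missing value (assumed definition)."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0  # a variable with no data is treated as useless
    return Counter(present).most_common(1)[0][1] / len(present)

def missing_fraction(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def preprocess(columns, max_stability=0.90, max_missing=0.10):
    """columns: dict mapping variable name -> list of numbers or None.

    Drops low-quality variables, imputes missing values with the mean,
    and rescales every remaining variable to the range 0 to 1.
    """
    cleaned = {}
    for name, values in columns.items():
        if stability(values) > max_stability or missing_fraction(values) > max_missing:
            continue  # low-quality variable: skip it
        present = [v for v in values if v is not None]
        mean = sum(present) / len(present)
        filled = [mean if v is None else v for v in values]  # mean imputation
        lo, hi = min(filled), max(filled)
        span = hi - lo
        cleaned[name] = [(v - lo) / span if span else 0.0 for v in filled]
    return cleaned

# Hypothetical toy data: one usable variable, one constant, one too sparse.
toy = {
    "age": [50, 70, None, 60, 80, 90, 55, 65, 75, 85, 60],
    "constant": [1] * 11,
    "sparse": [None, None, 1, 2, 3, 4, 5, 6, 7, 8, 9],
}
result = preprocess(toy)
```

In this toy example only "age" survives; its single missing entry is replaced by the mean before rescaling.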

Biomedicines 2022, 10
Figure 1. Flowchart of the analyzed variables and patients. The analysis was conducted based on previously prepared data; therefore, some of the information was duplicated or inadequate for the machine-learning analysis.
Cluster analysis is an unsupervised machine-learning method which divides a dataset into smaller groups (clusters) based on similarity. The clusters are composed of cases which are similar to each other, but not to the cases in other clusters. Several clustering algorithms have been described. This analysis uses the k-medoids algorithm to obtain clusters (k-medoids operator in RapidMiner). The number of groups was not assumed in advance. The optimize parameters operator was used to reveal the most accurate cluster quantity and characteristics. The clustering.k and clustering.numerical_measure parameters were used to optimize the clustering, and the Davies-Bouldin index was chosen as the main criterion. The number of clusters was set between 3 and 6 to avoid excessive dataset fragmentation.
K-medoids is a clustering algorithm that requires the number of resulting clusters (the value of parameter K) to be specified in advance. Unlike k-means clustering, where the centroids are computed as the average values of the data points (examples) within a cluster, the centroids in the k-medoids algorithm correspond to existing data points, which makes them easier to interpret. The clustering is based on measuring the distance between the examples; examples in a cluster are similar to each other. The clustering algorithm repeatedly re-assigns the examples to a given number of clusters by minimizing their distance to a centroid and then recomputes the centroids. Thus, the specific distance measure is another important parameter of the method.
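The alternating assign-and-update procedure described above can be sketched as follows. This is an illustrative implementation, not the RapidMiner operator: the Euclidean distance, the naive initialisation with the first k examples, and all function names are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, medoids, dist):
    """Label each example with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, dist=euclidean, max_iter=100):
    """Alternating k-medoids; returns (medoid indices, cluster labels)."""
    medoids = list(range(k))  # naive deterministic start (assumption)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to all members of its cluster.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)

# Hypothetical toy data with two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_medoids, toy_labels = k_medoids(toy_points, k=2)
```

Because every medoid is an actual example, each cluster can be summarised by one representative patient record, which is the interpretability advantage mentioned in the text.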
Using the automated parameter tuning implemented in RapidMiner, we allowed the system to vary the number of clusters K in the range of 3 to 6 and the numeric distance/similarity measure across the list of available measures. This resulted in more than 50 individual runs of the clustering algorithm. The parameter that primarily affected the quality of clustering appeared to be the number of clusters: the clustering quality (in terms of the Davies-Bouldin index) improved with an increasing number of clusters. We achieved the best results (lowest Davies-Bouldin index) for clustering into six clusters using the correlation similarity measure.
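For orientation, the Davies-Bouldin index can be computed as in the sketch below. For each cluster, it takes the worst-case ratio of within-cluster scatter to between-cluster separation and averages these ratios, so lower values indicate more compact and better separated clusters. The Euclidean distance and the function names are assumptions for illustration; the study itself varied the distance measure within RapidMiner.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels, centers, dist=euclidean):
    """Davies-Bouldin index for a partition with one center per cluster."""
    k = len(centers)
    # Scatter: mean distance of the members of a cluster to its center.
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(dist(p, centers[c]) for p in members) / len(members))
    # For each cluster, take the worst ratio against any other cluster.
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical toy data: the same four points partitioned well and badly.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(pts, [0, 0, 1, 1], [(0, 1), (10, 1)])
bad = davies_bouldin(pts, [0, 1, 0, 1], [(5, 0), (5, 2)])
```

A parameter search of the kind described in the text would evaluate such a score for every candidate combination of K and distance measure and keep the combination with the lowest value.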
The associations between the clusters and clinical features were assessed. Variables with a normal distribution were described as mean ± standard deviation, and non-normal variables were presented as medians and interquartile ranges. Categorical variables were shown as numbers and percentages (Table 2). The normality of the distribution was checked using the Kolmogorov-Smirnov (K-S), Lilliefors and Shapiro-Wilk tests. The statistical significance of differences between groups was assessed using analysis of variance (ANOVA) and the chi-square test. The outcome associations were evaluated using Kaplan-Meier curves and Cox proportional-hazards regressions (Figure 1). A p-value below 0.05 was considered statistically significant. Clustering and preprocessing were performed using RapidMiner 9.1 (RapidMiner GmbH, Dortmund, Germany), and the statistical analysis was performed using STATISTICA 12 (StatSoft Polska Sp. z o.o., Krakow, Poland).
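As a reminder of how a Kaplan-Meier curve is built, the product-limit estimator can be sketched in a few lines. This is an illustration only; the analysis in the study was performed in STATISTICA, and the function name and toy follow-up data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up duration of each patient.
    events: 1 if the patient died at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve

# Hypothetical follow-up data (months) for five patients.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set until their follow-up ends but never lower the survival estimate themselves.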

Cluster    Key Clinical Feature
Cluster 0  Lowest % of chronic HF, most massive lower limb oedema, highest urine urea, K and creatinine, highest ferritin, highest % of NYHA I, lowest % of stroke history, better prognosis - highest % of de novo HF, with preserved renal function.
Cluster 1  Higher % of women than in the rest of the population, highest systolic pressure, highest rates of hypertension, diabetes, chronic obstructive pulmonary disease and stroke history (lowest GFR, lowest urine creatinine, urea and K, lowest NTproBNP), most massive pulmonary congestion and least massive peripheral oedema, highest rate of hypertensive etiology, better prognosis - hypertensive, diabetic patients with advanced atherosclerosis and comorbidities, diminished renal function, elderly population with a significant proportion of de novo HF.

Clustering
The population was divided into six clusters by analysis of 63 variables. The clusters were numbered from 0 to 5. The variables that were included in the analyses are presented in Table 1.
Cluster 0 (n = 86) This was the largest cluster and included the highest percentage of patients with de novo HF, of patients qualified as NYHA I, and of patients presenting with severe lower extremity edema on admission, as well as the highest urine K+, creatinine and urea levels. Moreover, this cluster had the highest ferritin levels and the lowest percentage of patients with a history of stroke.
Cluster 1 (n = 50) This cluster had a higher proportion of women than the rest of the population and the highest prevalence of hypertension, diabetes, COPD and stroke history. On admission, this cluster presented with the highest systolic blood pressure, the highest percentage of patients with severe pulmonary congestion and the least severe signs of peripheral congestion. The NTproBNP, GFR, urine K+, urea and creatinine levels were the lowest in this cluster.
Cluster 2 (n = 70) On average, this cluster comprised the youngest patients and had the highest percentage of active smokers, of patients qualified as NYHA IV, and of patients whose HF etiology was classified as other. Additionally, this cluster presented the lowest percentage of ischemic HF etiology, hypertension, diabetes and pulmonary congestion. On admission, these patients presented with the highest GFR and the highest NTproBNP, AST, ALT and bilirubin serum levels, and the lowest troponin, CRP and IL-6 serum levels.
Cluster 3 (n = 71) This cluster had the highest percentage of patients who qualified as HFrEF, the highest percentage of patients with decompensated chronic HF, and the highest proportion of patients with valvular heart disease. On admission, this cluster had the highest proportion of patients presenting with the most severe pulmonary congestion, the lowest WBC, ferritin, TSAT, urine Na+ and lactates, and the highest troponin, INR and albumin in the laboratory measurements.
Cluster 4 (n = 50) This cluster was mostly represented by men and smokers, with CAD as the HF etiology and the lowest EF. On admission, these patients presented with the highest rates of hepatomegaly and ascites, the highest JVP and the least frequent severe pulmonary congestion. They also had the highest creatinine and urea serum levels. In the arterial blood gases, this cluster presented with the lowest pCO2 and the highest pH.
Cluster 5 (n = 54) This cluster comprised the oldest patients, with the highest percentage of women, the highest EF, the lowest body mass, and no CAD history. Moreover, the highest CRP and IL-6 serum levels were observed in this group of patients.

Prognostic Significance of Clusters
The one-year mortality was 27% (104 events). The mean hospital stay was 8.6 ± 6.7 days.
The one-year mortality from cluster 0 to cluster 5 was 26% vs. 22% vs. 17% vs. 21% vs. 40% vs. 43%, respectively (p = 0.002) (Table 4). Figures 3 and 4 show the Kaplan-Meier curves for the one-year and two-year mortality risks by clusters.
Table 5. Hazard ratios for one-year mortality and two-year mortality; each cluster was compared with the rest of the population.
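The cluster-level mortality rates can be cross-checked against the overall figure with simple arithmetic. The sketch below weights the reported rates by the cluster sizes given in the Clustering section; because the published percentages are rounded, the reconstructed number of deaths is only approximate.

```python
# Cluster sizes (clusters 0 to 5) and reported one-year mortality rates.
cluster_sizes = [86, 50, 70, 71, 50, 54]
one_year_mortality = [0.26, 0.22, 0.17, 0.21, 0.40, 0.43]

total_patients = sum(cluster_sizes)

# Approximate deaths implied by the rounded percentages.
expected_deaths = sum(n * m for n, m in zip(cluster_sizes, one_year_mortality))

# Size-weighted overall one-year mortality.
overall_mortality = expected_deaths / total_patients
```

The cluster sizes add up to the 381 analyzed patients, and the weighted rate is about 27%, in line with the reported 104 events.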

Discussion
A cluster analysis was applied to a cohort of 381 AHF patients. Both clinical and biochemical variables were included, and all were either continuous or numerically encoded. At the time of writing, this was the largest analysis of its type performed in a European AHF population. Six clinically and pathophysiologically relevant phenotypes were distinguished. The clusters varied in outcomes, including mortality and AHF re-hospitalization rates. Notably, unlike in previous papers on the AHF population [1], the number of groups was not prespecified but was determined mathematically. The size of the analyzed population allowed us to distinguish the highest number of virtually equally dense clusters among comparable studies [4,8,11-13], which provides the most thorough insight into the heterogeneity of the AHF population. During the collection of both registries, the guidelines for the treatment of heart failure changed and a variety of new drugs were introduced into therapy, such as angiotensin receptor neprilysin inhibitors, sodium-glucose co-transporter-2 inhibitors, a new class of beta-blockers and mineralocorticoid receptor antagonists. Nevertheless, the distinguished clusters seem to be resistant to those changes, because we did not include pre-admission treatment in the cluster analysis. This decision was dictated by practical reasons. Given the characteristics of the studied population and the numerous comorbidities with their specific treatments, including such treatment data would not have improved the quality of the analysis. Therefore, the new drugs and guidelines are very unlikely to affect our cluster analysis, especially in terms of the cluster composition, which was based on the clinical and biochemical profiles at admission. The new guidelines would rather affect the patients' prognosis. Notably, these changes have had very little, if any, impact on the outcomes of the population of patients with AHF.
The one-year mortality of the studied population was 27%, which is close to current figures (25-30%) [1]. Below, we present a detailed description of the clusters, grouped according to their distinguishing clinical features.

Clusters 1 and 4
Clusters 1 and 4 included patients with a high number of cardiovascular and noncardiovascular comorbidities. In both these groups, coronary artery disease was the predominant etiology of heart failure.
Although these two clusters demonstrated similarities in terms of etiology, their prognosis and clinical outcome were significantly different. Cluster 1 had a relatively good prognosis, while cluster 4 had a poor clinical outcome. The one-year mortality was equal to 22% in cluster 1 and was almost twice as high in cluster 4 (40%), which can be explained by two factors.
The results of this study indicate that gender has a significant impact on the development of coronary artery disease and the progression of heart failure. It is well known that the male gender is itself a risk factor for cardiovascular events, and the prevalence of cardiovascular disease is higher in men than in women of a similar age [14].
In the male population, the risk of cardiovascular disease increases linearly over time, and the atherosclerotic process develops continuously. On the other hand, due to the protective role of estrogen and its beneficial effects on the cardiovascular system, women of a fertile age may be protected from atherosclerosis [15][16][17][18]. This is consistent with our observations. Despite many risk factors, only 56% of patients from cluster 1 (female-dominated) developed CAD, and only 40% had an MI. These values were significantly higher in the male-dominated population represented by cluster 4.
Additionally, as is commonly known, the incidence of stroke increases significantly in the postmenopausal period [19,20]. This also aligns with our observations, as the highest rate of stroke was reported for cluster 1.
It can, therefore, be inferred that gender plays a significant role in the development of cardiovascular diseases, and we assume that the differences in prognosis and clinical outcomes between these two groups could be partially explained by this fact. However, what determines the differences between these two clusters' prognoses, for the most part, is their renal function.
Importantly, the phenotype of cluster 4 reflects the common problem of cardiorenal syndrome (the highest mean values of creatinine (1.36) and urea (64)) and right ventricular failure, with the highest incidence of ascites, elevated JVP and hepatomegaly, which constitute signs of congestion. Cardiorenal syndrome and volume overload are well-documented predictors of poor outcomes [21] and are strongly associated with each other. Therapy for heart failure patients with cardiorenal syndrome remains a challenge. Its main goal should be reasonable decongestion, which can be achieved by natriuresis-guided diuretic therapy, ultrafiltration, or, in refractory cases, experimental techniques.

Cluster 2
Patients included in cluster 2 were the youngest (mean age 58.8) and had the highest NTproBNP (7189), bilirubin (1.25), AST (30) and ALT (34.5), and the lowest ejection fraction (28%), serum creatinine concentration (1.1), and incidence of diabetes (19%), pulmonary congestion (17%), COPD (5.7%), HT (39%), CAD (1.4%) and MI in the past (1.4%). This cluster had the highest percentage of active smokers (30%) and alcohol consumers (44%). The underlying cause of HF was mostly valvular (21%) or other (73%). We assume that the presented phenotype, especially the elevated concentration of liver enzymes and frequent tobacco and alcohol use, suggests a significant role of toxic myocardial damage. Importantly, cluster 2 was associated with the most favorable prognosis. This can be explained by the youngest age, low morbidity, and high potential compensatory reserves. Therefore, these patients represent great therapeutic potential, and clinicians should focus on education aimed at eliminating the harmful impact of xenobiotics.

Cluster 3
The distinguishing features of cluster 3 (n = 71) were the incidence of chronic heart failure (93%) and iron deficiency. Iron deficiency (defined as a serum ferritin < 100 ng/mL or TSAT < 20%) [22] is common in this population. In comparison to the other groups, cluster 3 had the lowest mean values of ferritin (92) and TSAT (14.8%).
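The definition of iron deficiency quoted above can be written as a simple rule. The sketch below is illustrative only; the function name is hypothetical, and reading the TSAT threshold as a percentage is an assumption.

```python
def is_iron_deficient(ferritin_ng_ml, tsat_percent):
    """Rule from the text: ferritin < 100 ng/mL or TSAT < 20 (read as %)."""
    return ferritin_ng_ml < 100 or tsat_percent < 20

# Mean values reported for cluster 3: ferritin 92, TSAT 14.8%.
cluster3_deficient = is_iron_deficient(92, 14.8)
```

Applied to the mean values of cluster 3, both criteria are met. This is a statement about cluster averages, not about every individual patient in the cluster.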
Iron deficiency is a frequent comorbidity in heart failure, present in approximately 30-50% of patients and is associated with worse long-term outcomes [23][24][25].
The detrimental effect of imbalanced iron homeostasis on HF progression has been widely studied; however, it remains unclear what the exact mechanism is by which an iron deficit worsens HF. It appears that there is a wide range of factors involved in this process.
First of all, iron deficiency alters mitochondrial function and impairs the already disturbed energetics of the heart with a reduced ejection fraction [26].
Secondly, in the condition of iron deficiency anemia, depleted oxygen delivery to the metabolizing tissues induces a variety of hemodynamic, renal, and neurohormonal alterations [27]. Volume expansion (caused by sympathetic and RAA activation), as well as vasodilatation, leads to an increase in cardiac output. All these mechanisms result in an increased myocardial workload and further hypertrophy/remodeling of LV, which contributes to worsening HF [28].
We assume that iron deficiency could explain the intermediate prognosis of patients in this cluster and constitutes a relatively easily achievable therapeutic target for improving these patients' outcomes.

Cluster 1 and Cluster 5
Both clusters 1 and 5 include mostly elderly (mean age 76.1 in both clusters) women (46% and 44%, respectively), with the highest ejection fraction (47.5% and 50%) and a high incidence of hypertension (94% and 87%). The presented phenotype corresponds to the well-established HFpEF patient characteristics [29]. The clusters present the most frequent incidence of massive pulmonary congestion (congestion auscultated over two-thirds of the lungs in 22% and 13% of patients), which is reflected by the highest proportion of NYHA IV (54% and 57%). The highest mean values of pCO2 (37.2 mmHg and 36.2 mmHg) reflect the most massive pulmonary oedema or the relatively high incidence of lung comorbidities, especially COPD, in the HFpEF population [30]. Despite the apparently similar phenotypes, the clusters differed significantly in outcomes (Figure 3). Cluster 1 presented a relatively good prognosis, whereas cluster 5 was associated with an ominous outcome. The features that particularly differentiate the clusters are the type of HF (chronic/de novo) and the concentration of inflammatory markers. Cluster 1 consisted mostly of patients who presented with their first episode of HF (56%), whereas cluster 5 consisted mostly of patients suffering from chronic HF (70%). The duration of HF is a well-established prognostic factor. Moreover, cluster 1 presented the highest natriuresis, probably owing to the first presentation of HF and frequent loop-diuretic naivety; as a sign of an adequate diuretic response, this predicted a favorable outcome [31]. In turn, cluster 5 presented with the highest mean concentrations of the inflammatory biomarkers CRP and IL-6, which, together with the lowest mean body weight (74.9), suggest frailty syndrome and explain the poor prognosis [32,33].

Novelty and Clinical Implications
We presume that our paper has two significant advantages over the currently published clustering-based analyses of acute heart failure populations. Notably, it is, to date, the largest clustering analysis of a European AHF population. Moreover, we did not prespecify the number of clusters, in order to allow the algorithm to distinguish the optimal, natural number of subgroups autonomously.
Clustering is currently far from being an ideal solution for heart failure phenotyping. Nevertheless, we strongly believe that this technique has great potential as a tool which can capture relationships that are too complex to be noticed by classical statistical analysis but can be visible to the experienced clinician. We believe it will eventually allow admitted HF patients to be segregated immediately into previously described groups (clusters). Such a segregation will highlight the therapeutic aspects that clinicians should focus on (e.g., cardiorenal syndrome, iron deficiency, etc.) and provide an initial estimate of prognosis. Further, the patients placed into a group with a worse prognosis could be provided with more careful treatment from the very beginning of the therapeutic process. For example, clusters 2 and 3 revealed the recognized relationships between HF and chronic intoxication and iron deficiency, respectively. Precise outpatient care for cluster 3 patients, with regular iron level assessment and intravenous supplementation if needed, could reduce the likelihood of HF deterioration [25]. Further, proper education and providing cluster 2 patients with specialist psychiatric care regarding their addiction and substance abuse could slow the progress of HF [34]. Notably, the clustering, in that case, does not reveal relationships that are astounding for the experienced cardiologist. The potential value of such algorithms and the classifications they provide is their ability to immediately categorize patients into one of the pheno-groups and to underline cluster-specific treatment targets which can be accidentally omitted due to, e.g., doctors' overwork, hospital overcrowding or the lack of experience of medical professionals.

Limitations
Our study is not free from limitations. First, the study included retrospective data. Therefore, the availability of potentially important clinical parameters was restricted. Variables such as echocardiographic parameters, invasive hemodynamic measurements or novel experimental markers were not collected. New cluster-based trials with a broader biochemical and clinical composition would deliver valuable data. Moreover, the gathered data contained missing values. Although we restricted inclusion to variables with no more than 10% missing values, some bias could have occurred. Second, our analysis was based on single-center data from Poland, which included a relatively small sample size and lacked an external validation cohort. Third, the evaluated patients were treated following guidelines that are now outdated. The current clinical presentation of AHF patients and their outcomes may differ from the presented results.
Machine-learning techniques can be associated with the overfitting problem, in which the model performs well on the seen population and poorly on a new one. In other words, the model is not generalizable. However, in unsupervised learning, to which clustering belongs, there is no information about a "true" or "correct" assignment of examples to clusters. The clustering only works with the given data, and the possible generalization of the clusters is a question of their interpretation by domain (medical) experts rather than a question of evaluation on another dataset. Thus, overfitting, in the standard sense, is not an issue for clustering.
What is important in clustering is having a "reasonable" number of clusters. Too small a number of clusters produces over-general results (in the worst case, a single cluster for everything), whereas too large a number produces over-specific results (in the worst case, each example forms its own cluster). The problem of over-specific results can, in some sense, be considered similar to the problem of overfitting; we tried to avoid it by automatically tuning the number of clusters within a limited range. We set the range of the number of possible clusters from three to six to avoid excessive data fragmentation. Such an approach was consistent with prior studies, which usually distinguished three to four groups [4,8,12,13]. We decided to increase the potential number of clusters because of the larger included population. Further analyses to highlight more nuanced phenotypes are warranted.

Conclusions
We successfully extracted six novel phenotypes of acute heart failure patients, providing a fresh insight into their heterogeneity. The proposed clusters were consistent with the latest understanding of pathophysiology (e.g., de novo HF, HT HFpEF, toxic HF, iron reduced left ventricle HF, cardiorenal, inflammatory HFpEF) and previous clustering-based papers, providing a more distinctive classification of the population. The presented results can be valuable for future AHF trial constructions and more customized treatments.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available within the article. Further data are available on request from the corresponding author.