Unsupervised Cluster Analysis in Patients with Cardiorenal Syndromes: Identifying Vascular Aspects

Background/Objectives: Cardiorenal syndrome (CRS) is a disorder of the heart and kidneys, with one type of organ dysfunction affecting the other. The pathophysiology is complex, and its current description has been questioned. We used clustering analysis to identify clinically relevant phenogroups among patients with CRS. Methods: Data for patients admitted from 1 January 2012 to 31 December 2012 were collected from the French national medico-administrative database. Patients with a diagnosis of heart failure and chronic kidney disease and at least 5 years of follow-up were included. Results: In total, 13,665 patients were included and four clusters were identified. Cluster 1 could be described as the vascular–diabetes cluster. It comprised 1930 patients (14.1%), among whom 60% had diabetes, 94% had coronary artery disease (CAD), and 80% had peripheral artery disease (PAD). Cluster 2 could be described as the vascular cluster. It comprised 2487 patients (18.2%), among whom 33% had diabetes, 85% had CAD, and 78% had PAD. Cluster 3 could be described as the metabolic cluster. It comprised 2163 patients (15.8%), among whom 87% had diabetes, 67% had dyslipidemia, and 62% had obesity. Cluster 4 comprised 7085 patients (51.8%) and could be described as the low-vascular cluster. The vascular cluster was the only one associated with a higher risk of cardiovascular death (HR: 1.48 [1.32–1.66]). The metabolic cluster was associated with a higher risk of kidney replacement therapy (HR: 1.33 [1.17–1.51]). Conclusions: Our study supports a new classification of CRS based on the vascular aspect of pathophysiology, differentiating microvascular from macrovascular lesions. These results could have an impact on patients’ medical treatment.


Introduction
Cardiorenal syndrome (CRS) is defined as a disorder of the heart and kidneys, with one type of organ dysfunction affecting the other [1].
The most widely used classification of CRS distinguishes five types [2][3][4] and is based on the chronology of cardiac and renal events. However, this classification presents several issues. First, we recently found, in a large nationwide cohort study, that the chronology between kidney and heart events was not associated with different prognoses. Moreover, there was no synergism between the kidney and heart, which supports the view that CRS is the consequence of a pathophysiology shared by these two organs, rather than one organ's dysfunction caused by the other [5]. These pathological pathways have been studied and described and include microvascular and endothelial dysfunction, the activation of the sympathetic and renin-angiotensin-aldosterone (RAA) systems, alterations in nitric oxide bioavailability, and inflammation [6,7], and a consistent hypothesis is that the major common consequence is fibrosis [8]. The role of microvascular dysfunction in the development of heart failure (HF) and chronic kidney disease (CKD) is well documented, although the mechanism differs between HF with preserved ejection fraction (HFpEF) and HF with reduced ejection fraction (HFrEF) [9][10][11][12]. In brief, in HFpEF, cardiac remodeling involves inflammation and endothelial dysfunction, whereas, in HFrEF, remodeling is predominantly the consequence of cardiomyocyte death. Regarding the kidneys, among the potential mechanisms, an increase in central venous pressure and a reduction in renal blood flow lead to decreased nitric oxide (NO) production and a vicious circle involving sympathetic and RAA system activation, finally leading to fibrosis [8,9,13] (Figure 1).

Moreover, the acute types (1 and 3) and type 5 refer to specific situations with known systemic consequences and should be distinguished. This view was defended by Zoccali et al. in a recent position paper [14]. Another view is cardiovascular-kidney-metabolic syndrome, an entity recently recognized by the American Heart Association, expanding the classical cardiorenal syndrome and advocating for a more holistic view [15]. While type 2 and type 4 (chronic CRS) should be redefined outside any chronological considerations, CRS may refer, in acute situations, to functional acute kidney injury (AKI) in conditions of decompensated heart failure, improving after the treatment of congestion. This definition of CRS, which would refer to some of the type 1 cases, is, to our knowledge, the most widely applied in clinical practice and in the literature [16][17][18]. In this study, we aimed to explore the idea that, rather than the clinical presentation, the pathophysiological mechanism involved could be used to better classify these patients and lead to optimized treatment [8].
Cluster analysis is an unsupervised machine learning method that categorizes complex entities without investigators' supervision by segregating samples into homogeneous groups [19][20][21]. Unsupervised analysis can have great value, as it allows one to explore data without a priori knowledge.
In this study, we hypothesized that unsupervised clustering would corroborate the pathophysiological classification of CRS by identifying clinically relevant phenogroups related to known pathophysiological pathways.

Study Design
This longitudinal cohort study utilized the national hospitalization database, which covers hospital care for the entire French population. Data for patients admitted from 1 January 2012 to 31 December 2012 were collected from the national medico-administrative PMSI database ("Programme de Médicalisation des Systèmes d'Information"), a medicalized information system program inspired by the US Medicare system. This program, implemented in 2004, records hospital medical activity in a database, ensuring anonymity, and encompasses over 98% of the French population (approximately 67 million people) from birth (or immigration) to death (or emigration).
The hospitalization details are encoded in a standardized dataset, including age, sex, hospital stay duration, admission and discharge dates, mode of discharge, pathologies, and procedures. The medical information collected routinely comprises principal and secondary diagnoses based on the International Classification of Diseases, Tenth Revision (ICD-10). All medical procedures are also recorded using the national nomenclature, Classification Commune des Actes Médicaux (CCAM). The reliability of the PMSI data has been previously assessed [22], and this database has been effectively used in previous studies related to cardiovascular or cerebrovascular conditions [5,23].
The study was conducted retrospectively, and, as the patients were not involved in its conduct, there was no impact on their care. Ethical approval was not required, as all data were anonymized. The French Data Protection Authority granted access to the PMSI data. The procedures for data collection and management were approved by the Commission Nationale de l'Informatique et des Libertés (CNIL), the independent national ethical committee protecting human rights in France, which ensures that all information is kept confidential and anonymous, in compliance with the Declaration of Helsinki (MR-005 registration number 0415141119).

Patient Selection
This study was based on patients aged 18 years and over, admitted to a French hospital between 1 January 2012 and 31 December 2012, with a diagnosis of HF and CKD. The analysis focused on patients with at least 5 years of follow-up or who died during follow-up. The patients' medical history in the 2 years before their hospitalization was collected. Patients with CRS were identified as patients with a diagnosis of both HF and CKD. For the cluster analysis, a random sample of 50% of these patients was finally included in this study.
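The selection rules above can be illustrated with a minimal sketch. The record layout (`Patient` and its fields), the function name `select_cohort`, and the fixed random seed are hypothetical choices made for illustration; they are not taken from the PMSI database or from the study's actual extraction code.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    patient_id: int
    age: int
    has_hf: bool           # heart failure diagnosis
    has_ckd: bool          # chronic kidney disease diagnosis
    followup_years: float  # observed follow-up duration
    died: bool             # died during follow-up


def select_cohort(patients, sample_fraction=0.5, seed=0):
    """Apply the inclusion criteria, then draw a random sample."""
    eligible = [
        p for p in patients
        if p.age >= 18
        and p.has_hf and p.has_ckd                # CRS = both diagnoses
        and (p.followup_years >= 5 or p.died)     # 5 years of follow-up, or death
    ]
    rng = random.Random(seed)  # seeded so the draw is reproducible
    k = round(len(eligible) * sample_fraction)
    return rng.sample(eligible, k)
```

Seeding the generator is a design choice of this sketch: it makes the 50% draw reproducible, which matters when the clustering is rerun on the same sample.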

Collected Data
Patient information (demographics, comorbidities, medical history, and events during hospitalization or follow-up) was described using data collected in the hospital records. For each hospital stay, combined diagnoses at discharge were obtained. Each variable was identified using ICD-10 codes (Appendix A Table A1: ICD-10 codes).

Outcomes
We aimed to evaluate the incidence of all-cause death, cardiovascular death, rehospitalization for HF, myocardial infarction, ischemic stroke, and kidney replacement therapy (KRT, defined as dialysis or renal transplantation) (Appendix A Table A1: ICD-10 codes).
The endpoints were evaluated with follow-up starting from the first hospitalization until the date of each specific outcome or the date of last follow-up in the absence of the outcome. Information on outcomes during follow-up was obtained by analyzing the PMSI codes for each patient. We focused on the components of Major Adverse Renal and Cardiac Events (MARCE) [24,25], namely all-cause death, cardiovascular death, hospitalization for heart failure, myocardial infarction, stroke, and renal replacement therapy, with the exception of acute kidney injury; these were identified by using their respective ICD-10 or procedure codes. Cardiovascular death was determined based on the primary diagnosis during the hospitalization resulting in death (ICD-10 codes I00-I99). Rehospitalization was considered to be attributable to heart failure when heart failure was recorded as the main diagnosis.
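As an illustration of these definitions, the sketch below flags a cardiovascular death from the primary diagnosis code and derives a follow-up time with an event indicator. The function names and argument layout are assumptions made for this example; the only elements taken from the text are the I00-I99 code range and the rule that follow-up runs from the first hospitalization to the outcome or to the last follow-up.

```python
import re
from datetime import date
from typing import Optional, Tuple

# ICD-10 codes I00-I99: the letter "I" followed by two digits,
# optionally followed by a subcategory (e.g. "I21.0").
CARDIOVASCULAR_CODE = re.compile(r"^I\d{2}")


def is_cardiovascular_death(died_in_hospital: bool, primary_diagnosis: str) -> bool:
    """True when the stay ended in death and its primary diagnosis is I00-I99."""
    code = primary_diagnosis.strip().upper()
    return died_in_hospital and bool(CARDIOVASCULAR_CODE.match(code))


def time_to_event(index_date: date,
                  outcome_date: Optional[date],
                  last_followup: date) -> Tuple[int, bool]:
    """Return (days of follow-up, event observed) for one patient and one outcome."""
    if outcome_date is not None and outcome_date <= last_followup:
        return (outcome_date - index_date).days, True
    # No outcome recorded: the patient is censored at the last follow-up.
    return (last_followup - index_date).days, False
```

For example, `is_cardiovascular_death(True, "I21.0")` is true, whereas a death with primary diagnosis `"N18.5"` would not be counted as cardiovascular.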

Cluster Analysis
Unsupervised cluster analysis using the hierarchical clustering method was used to identify homogeneous phenotypic subgroups of patients with CRS, without prior knowledge of the outcomes. All baseline clinical variables were used for the clustering process (Table 1). Agglomerative hierarchical clustering was performed using Ward's method, with the squared Euclidean distance computed on the variables of interest. Agglomerative hierarchical clustering is based on a 'bottom-up' approach: each patient starts as a separate cluster, and the most similar clusters, with respect to the specified clinical variables, are then merged step by step. The dendrogram (Figure 2) displays the clustering process, with the vertical lines representing the various clusters and the distance between the clusters equating to the sum of the squared differences within all clusters. Small values of the distance indicate that the merged clusters are similar, and large values indicate the combination of two dissimilar (heterogeneous) clusters. The number of clusters was not prespecified. Two-, three-, and four-cluster models were examined. The four-cluster model formed much clearer patterns than the three-cluster model and was therefore used in this study.

Values are n (%) or mean ± SD. CABG = coronary artery bypass graft; CKD = chronic kidney disease; COPD = chronic obstructive pulmonary disease; ICD = implantable cardioverter defibrillator; MI = myocardial infarction; PCI = percutaneous coronary intervention; SD = standard deviation.
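To make the agglomerative procedure concrete, the following is a deliberately naive pure-Python sketch of Ward's criterion on squared Euclidean distances. It is not the implementation used in the study, which would rely on a statistical package; the function names and the toy binary comorbidity vectors are assumptions made for illustration. At each step, the two clusters whose merger causes the smallest increase in the within-cluster sum of squares are joined.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clustering(points, n_clusters):
    """Merge singleton clusters bottom-up until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each patient starts alone
    merge_costs = []
    while len(clusters) > n_clusters:
        cost, i, j = min(
            (ward_cost([points[k] for k in clusters[i]],
                       [points[k] for k in clusters[j]]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merge_costs.append(cost)
        clusters[i] = clusters[i] + clusters[j]   # i < j, so index i stays valid
        del clusters[j]
    return clusters, merge_costs


# Toy data (hypothetical): diabetes, CAD, PAD, obesity coded as 0/1.
toy_patients = [
    (1, 1, 1, 0), (1, 1, 1, 0),
    (0, 1, 1, 0), (0, 1, 1, 0),
    (1, 0, 0, 1), (1, 0, 0, 1),
    (0, 0, 0, 0), (0, 0, 0, 0),
]
toy_clusters, toy_costs = ward_clustering(toy_patients, n_clusters=4)
```

The recorded merge costs play the role of the dendrogram heights: a sharp jump in cost when going from k to k-1 clusters suggests that two dissimilar groups are being forced together, which is one informal way to compare two-, three-, and four-cluster solutions.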

Statistical Analysis
Qualitative variables were described as frequencies and percentages and quantitative variables as means (SDs). Comparisons were made using χ2 tests for categorical variables and the Student's t-test or nonparametric Kruskal-Wallis test, as appropriate, for continuous variables.
The yearly incidence of all-cause death, cardiovascular death, rehospitalization for HF, myocardial infarction, ischemic stroke, and KRT over the 5-year follow-up was calculated. Unadjusted and multivariable-adjusted Cox analyses were used to estimate the associations between clusters and clinical outcomes, and the results were expressed as hazard ratios (HR) and 95% confidence intervals (95% CI). Incidence rates (IR) were also reported in each cluster. Parameters associated with the risk of death, cardiovascular death, myocardial infarction, new episodes of HF, ischemic stroke, and KRT were used as covariates in the multivariable models.
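As a small worked illustration of an incidence rate, the sketch below divides the number of events by the accumulated person-years and expresses the result per 100 person-years. The function names, the per-100 scaling, and the log-normal approximation used for the confidence interval are assumptions of this example; the exact method used for the reported rates is not stated in the text.

```python
import math


def incidence_rate(events, person_years, per=100.0):
    """Events per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


def incidence_rate_ci(events, person_years, per=100.0, z=1.96):
    """Approximate 95% CI, assuming a normal approximation on the log scale."""
    if events <= 0:
        raise ValueError("at least one event is needed for this approximation")
    rate = incidence_rate(events, person_years, per)
    factor = math.exp(z / math.sqrt(events))
    return rate / factor, rate * factor
```

With hypothetical figures of 50 events over 1000 person-years, the rate is 5 per 100 person-years, and the interval returned by `incidence_rate_ci` brackets that value.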

Baseline Characteristics of Patients
A total of 13,665 patients were included in this study, among whom 57% were men and 77% were older than 75 years. Most patients had hypertension (82%), 44% had diabetes, 34% had dyslipidemia, 24% had obesity, and 8% were active smokers. Peripheral artery disease (PAD) was present in 39% of the patients.
The characteristics of all patients are detailed in Table 1 and Figure 3.

Cluster Analysis
Based on the hierarchical cluster analysis, four different clusters were identified (central illustration).
The patients' characteristics per cluster are shown in Table 1 and Figure 4.

Cluster 1
Cluster 1 comprised 1930 patients (14.1% of the population). It could be described as the vascular-diabetes cluster. The mean age was the second lowest (76 years old) and 60% of the patients were over 75 years old. However, 91% of the patients had hypertension, 60% had diabetes, 94% had coronary artery disease, and 80% had PAD. Atrial fibrillation was less frequent than in the other clusters (38%).

Cluster 2
Cluster 2 comprised 2487 patients (18.2% of the population). It could be described as the vascular cluster. The mean age was the highest (82 years old) and 85% of the patients were over 75 years old; 66% of the patients had atrial fibrillation and 38% of the patients had valvular heart disease. Only 33% of the patients had diabetes, but 78% had PAD. Anemia was also the most frequent in this group (47.3%).

Cluster 3
Cluster 3 comprised 2163 patients (15.8% of the population). It could be described as the metabolic cluster. The mean age was 74 years old and only 50% of the patients were over 75 years old. Diabetes was present in almost 87% of the patients, dyslipidemia in 67%, and obesity in 62%, whereas only 46% had PAD.

Cluster 4
Cluster 4 comprised 7085 patients (51.8% of the population). It could be described as the low-vascular cluster. There were more women than in the other clusters (51%). The mean age was high (82 years old) and 86% of the patients were over 75 years old. These patients had fewer vascular comorbidities than in the other clusters: 31% of the patients had diabetes, 32% had coronary artery disease, 11% had PAD, 56% had atrial fibrillation, 17% had dyslipidemia, and 16% were obese.
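The reported cluster sizes and prevalences can be cross-checked against the overall cohort figures with simple arithmetic. The sketch below uses only numbers quoted in the text; the dictionary layout and function name are illustrative. Because the per-cluster percentages are rounded, the pooled prevalence is only expected to agree with the overall figures (44% diabetes, 39% PAD) to within about one percentage point.

```python
# Cluster sizes and prevalences as reported in the text.
clusters = {
    "vascular-diabetes": {"n": 1930, "diabetes": 0.60, "pad": 0.80},
    "vascular":          {"n": 2487, "diabetes": 0.33, "pad": 0.78},
    "metabolic":         {"n": 2163, "diabetes": 0.87, "pad": 0.46},
    "low-vascular":      {"n": 7085, "diabetes": 0.31, "pad": 0.11},
}

total = sum(c["n"] for c in clusters.values())

# Share of the cohort in each cluster, in percent with one decimal.
shares = {name: round(100 * c["n"] / total, 1) for name, c in clusters.items()}


def pooled_prevalence(variable):
    """Size-weighted average of a per-cluster prevalence."""
    return sum(c["n"] * c[variable] for c in clusters.values()) / total
```

The four sizes sum to the 13,665 patients of the cohort, and the shares reproduce the percentages given for each cluster.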

Multivariate Analysis
The results of the survival analysis are shown in Table 2. The analyses were adjusted for age and sex.
In the multivariable analyses, the metabolic cluster was not associated with a lower risk of death (HR: 1.01 [0.94-1.09]) or cardiovascular death (HR: 1.04 [0.92-1.18]) than the vascular-diabetes cluster, but it was still associated with a higher risk of KRT (HR: 1.33 [1.17-1.51]). The vascular cluster and the low-vascular cluster were still associated with a higher risk of death (HR: 1.40 [1.31-1.50] and HR: 1.20 [1.13-1.28], respectively), but only the vascular cluster was associated with a higher risk of cardiovascular death (HR: 1.48 [1.32-1.66]). The risk of dialysis or renal transplantation was not lower in the vascular cluster or the low-vascular cluster than in the vascular-diabetes cluster.
The risk of myocardial infarction or ischemic stroke was still not different between the vascular cluster and the other clusters.
The risk of rehospitalization for heart failure was higher in the vascular cluster, the metabolic cluster, and the low-vascular cluster than in the vascular-diabetes cluster.
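For readers who wish to relate a hazard ratio to its confidence interval, the sketch below recovers the standard error of the log hazard ratio and the corresponding z statistic from a reported HR and 95% CI. It assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox models but is not stated explicitly in the text; the function name is illustrative.

```python
import math


def log_scale_summary(hr, lower, upper, z_crit=1.96):
    """Return (SE of log HR, z statistic, geometric midpoint of the CI).

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    return se, z, midpoint


# Values reported above for cardiovascular death in the vascular cluster.
se_cv, z_cv, mid_cv = log_scale_summary(1.48, 1.32, 1.66)
```

Under this assumption, the geometric midpoint of the interval should fall close to the reported hazard ratio, and an interval that excludes 1 corresponds to an absolute z statistic above 1.96.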

Discussion
Our study presents a four-cluster distribution of patients with coupled cardiac and renal involvement, defining CRS. These four clusters correspond to the distribution of patients according to their comorbidities and prognosis, which does not follow the usual CRS classification and thus questions the practical usefulness of this classification and suggests a vascular aspect of pathophysiology in CRS. The objective of this study was to propose a pathophysiological classification of CRS by identifying clinically relevant phenogroups related to known pathophysiological pathways. Its main finding is the identification of four data-driven clusters, which give insights into the different clinical phenotypes of cardiorenal syndromes and can be linked to a known pathophysiology.
The first cluster (14.1% of the population) identified patients with severe vascular damage and frequent diabetes. The second cluster (18.2% of the population) identified patients with severe vascular damage but mostly without diabetes. The third cluster (15.8% of the population) identified patients with metabolic syndrome, diabetes, obesity, dyslipidemia, and fewer macrovascular complications. Finally, the fourth cluster (51.8% of the population) identified patients with fewer comorbidities than in the other clusters, apart from a high rate of atrial fibrillation. This illustrates the variety of clinical phenotypes in cardiorenal syndromes, where the vascular and metabolic history play an important role. Indeed, the comorbidities most associated with microvascular dysfunction include diabetes, obesity, dyslipidemia, and hypertension [9,12]. However, these are also risk factors for atherosclerosis and macrovascular lesions. As we have already shown in a previous study, diabetes is a major risk factor for CRS [23]. However, diabetes with (cluster 1) or without (cluster 3) severe macrovascular damage may lead to CRS by different pathophysiological pathways: primary ischemic lesions in the first case and primary microvascular dysfunction in the second case. Moreover, most of our patients (cluster 4) had low vascular damage and few risk factors. We therefore hypothesize that, in these patients, the primary mechanism is microvascular dysfunction, associated with HFpEF, and heart and kidney fibrosis. This is of importance as it may help physicians to better select interventions aimed at preventing clinical events, such as lipid-lowering therapies, anti-hypertensive therapies, or inflammation-modulating therapies.
Of note, anemia is frequent in our population, which is not surprising, and its association with CKD, HF, and CRS is known. The triad of HF, CKD, and anemia is sometimes referred to as cardiorenal anemia syndrome [26]. It probably involves hypoxia, nitric oxide secretion, and vasodilation and lowers the blood pressure, which impairs kidney function and can result in sympathetic RAAS activation and renal vasoconstriction and finally fibrosis [27][28][29].
Regarding the impact on the prognoses of these patients, we focused on the components of MARCE, namely death, cardiovascular death, hospitalization for heart failure, myocardial infarction, stroke, and renal replacement therapy, with the exception of acute kidney injury [25]. The risks of death and cardiovascular death seemed to be lower in the vascular-diabetes cluster, even after adjustment for age and sex. On the other hand, the risk of dialysis or renal transplantation was the highest in the metabolic cluster, which was consistent with the high proportion of patients with diabetes and the known impact of cardiovascular risk factors on kidney function decline [1]. The age and sex ratios were very different within the clusters, which may have had a significant impact on the outcomes, even after statistical adjustment. The frailty index was comparable in clusters 2 to 4 and slightly lower in cluster 1, which could explain the lower risk of death in the vascular-diabetes cluster. As expected, macrovascular damage seemed to be associated with a higher risk of death, cardiovascular death, or heart failure (cluster 2). The high proportion of anemia in this cluster may also be related to this poor prognosis.
Environmental or genetic factors could also be involved [30]. Geographical data and the possibility of pollution exposure could shed some light on this question, but, unfortunately, these data were not accessible. The description of CRS based on pathophysiology can greatly impact patients' care, as individualized medical treatments can be proposed according to the patients' phenotypes [31].
The strengths of this study are its size and the absence of selection bias because of the exhaustive extraction of all ICD-10 codes from French patients hospitalized in 2012. Limitations include the lack of biological parameters, imaging parameters, and information on drug treatments. Indeed, the elevation of B-type natriuretic peptide, serum creatinine, or more specific markers such as cystatin C is associated with the diagnosis and also the prognosis of CRS [4,32,33]. Echocardiography may carry prognostic value in patients with CRS. In a cohort of 30,681 patients, including 2512 patients with CRS, an increasing pulmonary artery (PA) pressure and a higher right ventricular (RV) diameter were independently associated with a higher incidence of CRS [34]. Regarding kidney imaging, renal ultrasonography, namely intrarenal venous flow, is associated with the prognosis of patients with CRS [35]. Another limitation is that only in-hospital events and codes were included in this analysis. However, it is plausible that all patients with such events are managed in hospitals.

Conclusions
In conclusion, our unsupervised analysis identified statistically driven groups of patients with different phenotypes but similar prognoses, which supports the classification

Figure 2. Dendrogram generated by hierarchical clustering process showing the CRS clusters. The dendrogram graph is a visual representation of the hierarchical clustering process. Vertical lines are clusters that are joined together, and the position of the line on the scale indicates the distance at which the clusters are joined. Clusters are identified by different colors.

Figure 4. Heatmap of baseline characteristics of patients with CRS according to patient clusters.

Table 1. Baseline characteristics of patients with CRS according to patient clusters.

Table A2. Incident outcomes according to the type of cardiorenal syndrome cluster.