Systemic Inflammatory Biomarkers Define Specific Clusters in Patients with Bronchiectasis: A Large-Cohort Study

Differential phenotypic characteristics were defined using data mining approaches in a large cohort of patients from the Spanish Online Bronchiectasis Registry (RIBRON). Three differential phenotypic clusters were obtained (n = 1092) by hierarchical clustering (agglomerative methods, scikit-learn library for Python) according to the following systemic biomarkers: neutrophil, eosinophil, and lymphocyte counts, C reactive protein, and hemoglobin. Clusters #1–3 were named mild, moderate, and severe on the basis of disease severity scores. Patients in cluster #3 had significantly more severe disease (FEV1, age, colonization, extension, dyspnea (FACED), exacerbation (EFACED), and bronchiectasis severity index (BSI) scores) than patients in clusters #1 and #2. Exacerbation and hospitalization numbers, Charlson index, and blood inflammatory markers were significantly greater in cluster #3 than in clusters #1 and #2. Chronic colonization by Pseudomonas aeruginosa and COPD prevalence were higher in cluster #3 than in cluster #1. Airflow limitation and diffusion capacity were reduced in cluster #3 compared to clusters #1 and #2. Multivariate ordinal logistic regression analysis further confirmed these results. Similar results were obtained after excluding COPD patients. Clustering analysis offers a powerful tool to better characterize patients with bronchiectasis. These results have clinical implications in the management of the complexity and heterogeneity of bronchiectasis patients.


Introduction
Bronchiectasis is a chronic respiratory disease characterized by permanent dilatation of the airways, which may have different etiologies [1][2][3]. It is a very heterogeneous and complex disease, in which patients experience recurrent exacerbations mainly due to bronchial infection. Exacerbations negatively affect the patients' quality of life and disease prognosis [4].
The immune response against infections is crucial in bronchiectasis patients [5]. Neutrophilic inflammation is the predominant phenotype in these patients. In response to bacterial loads, neutrophils are recruited to the lungs, where they secrete antimicrobial peptides to fight against infection [6]. However, other inflammatory cell types such as eosinophils are also involved in the pathobiology of bronchiectasis, particularly in the response to different biological agents [7][8][9][10] as well as to inhaled corticosteroids [11,12]. Lymphocytic infiltration was also demonstrated to take place just beneath the basement membrane of the epithelium in bronchiectasis patients [13].
Nutritional abnormalities and systemic inflammation are common in patients with bronchiectasis, particularly in male patients compared to females, as demonstrated in a recent study [14]. Moreover, body composition and systemic inflammation have prognostic value in other chronic respiratory conditions such as chronic obstructive pulmonary disease (COPD) [15][16][17]. Disease severity scores usually take into account variables that reflect the status of the lung disease, but not those involving the systemic components. This poses a challenge in the overall assessment and phenotyping of the patients.
Thus, phenotypic clustering in which variables related to systemic manifestations and inflammatory parameters are analyzed jointly may be of interest in clinical decision processes of patients with bronchiectasis. In this regard, results obtained from clinical tests can be analyzed using different approaches, in which software tools are applied to complex clinical and biological data sets obtained from large cohorts of patients. Phenotypic classification of bronchiectasis patients according to several clinical and biological markers may help predict disease severity, the risk of exacerbations, and disease prognosis. Recent investigations showed that a cut-off value greater than 5 reliably predicted hospitalizations and all-cause mortality according to the FACED (FEV1, age, chronic colonization, radiological extension, and dyspnea) and bronchiectasis severity index (BSI) scores [18] and that eosinophil levels defined differential clinical clusters of bronchiectasis patients [19].
The current investigation sought to tease out differential phenotypic characteristics of patients with bronchiectasis using data mining approaches on the basis of the following systemic inflammatory and nutritional parameters: blood neutrophil, eosinophil, and lymphocyte counts, C reactive protein, and hemoglobin in a large cohort of patients from the Spanish Online Bronchiectasis Registry (RIBRON) [19,20]. Three clusters of patients with differential clinical phenotypes were obtained. The study objectives were: (1) to identify different clusters of patients included in this registry that could discriminate differential phenotypes on the basis of blood neutrophil, eosinophil, and lymphocyte counts along with C reactive protein and hemoglobin levels, (2) to analyze potential differences between the clusters in several clinical parameters involving the assessment of lung function, nutritional status, and a general clinical evaluation, and (3) to stratify the clusters according to disease severity following the exacerbation FACED (EFACED), FACED, and BSI indices.

Study Design
This was a multicenter, prospective, and observational study, in which 43 centers from Spain participated within the frame of the RIBRON database between February 2015 and October 2019 [21][22][23]. Strengthening the Reporting of Observational studies in Epidemiology (STROBE) reporting guidelines were used to design the current investigation [24]. The quality of the data introduced in the registry was always monitored and ensured by an external contract research organization (CRO).

Study Population
The patient recruitment flow-chart is depicted in Figure 1. The inclusion criterion was as follows: adult patients who had been diagnosed with non-CF bronchiectasis by high-resolution computerized tomography (HRCT) [21,23,25,26,27,28]. The participants were stable patients who had not reported any acute exacerbation in at least the four weeks prior to study entry. A total of 1092 patients from the registry were analyzed. The following study variables were analyzed using custom data-analysis software tools: anthropometry, smoking history, lung function, hemogram, inflammatory blood cells, and nutritional parameters. The exclusion criteria were the following: traction bronchiectasis and/or cystic fibrosis (sweat chloride test and/or genetic confirmation), and age younger than 18 years.
The World Medical Association guidelines for research in humans (seventh revision of the Declaration of Helsinki, Fortaleza, Brazil, 2013) were followed in the study [29]. Ethics approval was obtained from the Ethics Committee at the Hospital Josep Trueta Girona (# 001-2012, Hospital Universitari Dr. Josep Trueta, Girona, Spain). All the patients signed the informed written consent to participate in the registry. The information remained confidential at all times and no personal information related to any of the participants was introduced in the registry.

Study Variables and Scores
Etiology of the non-CF bronchiectasis, anthropometry (age, sex, and body mass index), lung function, exercise capacity, chronic colonization by Pseudomonas aeruginosa, chronic colonization with other microorganisms, radiologic extension, dyspnea, the number of exacerbations and hospitalizations for exacerbations in the previous year, the Charlson index, smoking history, nutritional status, and systemic inflammatory cells and markers were obtained from all the patients. FACED [30], EFACED [31], and bronchiectasis severity index (BSI) [32] were calculated on the basis of the clinical study variables.
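The way such composite scores are assembled from the clinical variables can be sketched in a few lines. The point values and cut-offs below are assumptions based on how the FACED and EFACED scores are commonly summarized; they are not stated in this text and should be checked against [30] and [31]. The `Patient` class and its field names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names; not taken from the registry.
    fev1_pct_predicted: float
    age_years: int
    chronic_pa_colonization: bool
    lobes_affected: int
    mmrc_dyspnea_grade: int
    hospitalized_exacerbation_last_year: bool


def faced(p: Patient) -> int:
    """FACED score (0-7), using assumed cut-offs to be verified against [30]."""
    score = 0
    score += 2 if p.fev1_pct_predicted < 50 else 0   # F: FEV1 % predicted
    score += 2 if p.age_years >= 70 else 0           # A: age
    score += 1 if p.chronic_pa_colonization else 0   # C: chronic colonization
    score += 1 if p.lobes_affected > 2 else 0        # E: radiological extension
    score += 1 if p.mmrc_dyspnea_grade > 2 else 0    # D: dyspnea
    return score


def efaced(p: Patient) -> int:
    """EFACED score (0-9): FACED plus assumed 2 points for a hospitalized exacerbation [31]."""
    return faced(p) + (2 if p.hospitalized_exacerbation_last_year else 0)


def category(score: int, moderate_from: int, severe_from: int) -> str:
    """Map a score to mild, moderate, or severe using the given lower bounds."""
    if score >= severe_from:
        return "severe"
    return "moderate" if score >= moderate_from else "mild"


# Invented example patient, for illustration only.
example = Patient(45.0, 72, True, 3, 1, True)
print(faced(example), category(faced(example), 3, 5))
print(efaced(example), category(efaced(example), 4, 7))
```

The category bounds passed in the last two lines (3 and 5 for FACED, 4 and 7 for EFACED) are likewise assumptions to be confirmed against the original score publications.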

Patient Clustering
The study population was clustered into three major groups of patients on the basis of the following analytical parameters, which correlated with EFACED score: eosinophils, neutrophils, lymphocytes, C-reactive protein (CRP), and hemoglobin. A total of 1092 patients met the criteria for these analyses (all these patients had valid values for all five parameters). Hierarchical clustering was performed on the basis of the five biomarkers in the 1092 patients using the scikit-learn library for Python [33]. Agglomerative methods were used, in which each cluster starts as a single patient and clusters are subsequently merged, step by step, into bigger clusters. The criterion for each fusion is the minimal distance between the two clusters to be fused, as specified by the chosen linkage function. The algorithm stopped when the number of clusters fell to 5, which is the level that produced an optimal classification of the patients according to the EFACED score (Table 1). As seen in Table 1, cluster #3 was very small and was indistinguishable from cluster #2 (EFACED score 3.57 in both clusters). Similarly, clusters #1 and #4 were alike, as no statistically significant difference was observed in EFACED score (2.60 versus 2.88, respectively). Hence, the 1092 patients were finally subdivided into three major clusters, #1-3, which were labeled as mild (n = 242 patients), moderate (n = 515 patients), and severe (n = 335 patients) (Table 1). The five-dimensional space containing the patient data was reduced to a two-dimensional representation (Figure 2).
Uniform Manifold Approximation and Projection (UMAP) was applied to project the five target markers (blood parameters: neutrophils, lymphocytes, eosinophils, CRP, and hemoglobin) onto two embedding dimensions (UMAP1 and UMAP2, respectively), in which all the patients were represented in three colors (mild, moderate, and severe represented in blue, light brown, and green, respectively) [34] (Figure 2). Moreover, the distribution of the five target markers across the three patient clusters was also analyzed, as shown in Figure 3. The diagonal plots represent the distribution of each target variable across the three patient clusters. In addition, the individual values for each pair of variables in all three clusters (blue, light brown, and green colors) are represented in scatter plots (Figure 3). Biomedicines 2022, 10, x FOR PEER REVIEW 5 of 18
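The agglomerative procedure described in this section can be sketched from scratch as follows. This is a minimal illustration, not the authors' code: the study used scikit-learn (where `AgglomerativeClustering(n_clusters=5)` would play this role), the linkage function is not named in the text, so average linkage is assumed here, and the five toy rows are invented values rather than registry data.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters, linkage=average_linkage):
    """Repeatedly fuse the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every patient starts alone
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the minimal linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # fuse cluster b into cluster a
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return clusters


# Invented toy rows: (neutrophils, eosinophils, lymphocytes, CRP, hemoglobin).
toy_patients = [
    (3.1, 0.10, 2.0, 1.0, 13.5),
    (3.3, 0.12, 2.1, 1.2, 13.8),
    (6.8, 0.30, 1.5, 9.0, 12.0),
    (7.1, 0.28, 1.4, 9.5, 11.8),
    (4.9, 0.20, 1.8, 4.0, 13.0),
]
toy_clusters = agglomerate(toy_patients, n_clusters=3)
print(toy_clusters)
```

The stopping rule is simply the target number of clusters, which mirrors the description of halting at five clusters before merging similar ones by EFACED score.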

Statistical Analysis
The study variables are presented as mean (standard deviation) in tables. A subanalysis in which patients with COPD were excluded in all three clusters was also conducted. Potential differences among the three clusters of patients (cluster #1, cluster #2, cluster #3), including those in which COPD patients were excluded, were assessed using one-way analysis of variance (ANOVA) and Tukey's post hoc test for the quantitative variables and the Chi-square test for the categorical variables. Correlations between clinical and biological variables were explored using Pearson's correlation coefficient. A Bonferroni-type adjustment was performed to account for the effect of testing multiple correlations.
Correlations are displayed as graphical correlation matrices obtained with the R package corrplot (https://cran.r-project.org/web/packages/corrplot/index.html, accessed on 2 December 2021), in which positive correlations are shown in blue and negative ones in red (Penn State University, World Campus, Pennsylvania, PA, USA).
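The correlation step with a Bonferroni-type adjustment can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the p-value uses the Fisher z approximation instead of the exact t-based test that statistical packages typically report, the plain Bonferroni rule (alpha divided by the number of tests) is assumed for the "Bonferroni-type" adjustment, and the CRP and EFACED values are invented.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (requires n > 3, |r| < 1)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_flags(p_values, alpha=0.05):
    """Flag each p-value as significant at the adjusted level alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Invented toy values for illustration only (not registry data).
crp = [1.0, 2.5, 4.0, 6.5, 9.0, 12.0]
efaced = [1, 2, 2, 4, 5, 6]
r = pearson_r(crp, efaced)
p = approx_p_value(r, len(crp))
# The two extra p-values stand in for other correlations tested alongside this one.
print(round(r, 3), bonferroni_flags([p, 0.02, 0.04]))
```

With three correlations tested, the adjusted threshold is 0.05 / 3, so a p-value of 0.02 that would pass the unadjusted 0.05 level is no longer flagged.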
Comparisons among the three patient clusters were also made on the basis of the degree of the disease severity according to the different scores (FACED, EFACED and BSI), in which the percentages of patients in each category were depicted. Potential differences among the three clusters were explored using the Chi-square test.
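The Chi-square comparison of severity categories across clusters can be sketched as below. The contingency table is invented for illustration (rows are clusters, columns are mild, moderate, and severe categories) and does not reproduce any study result; the function returns only the statistic and degrees of freedom, and the critical value 9.488 is the standard 5% threshold for 4 degrees of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Invented counts: rows = clusters #1-3, columns = mild / moderate / severe.
toy_table = [
    [30, 15, 5],
    [20, 20, 10],
    [10, 15, 25],
]
stat, dof = chi_square_statistic(toy_table)
CRITICAL_DF4_ALPHA05 = 9.488  # chi-square critical value for 4 degrees of freedom
print(round(stat, 2), dof, stat > CRITICAL_DF4_ALPHA05)
```

In practice a statistics package would return the exact p-value; the fixed critical value is used here only to keep the sketch free of external dependencies.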
Multivariate ordinal logistic regression, in which the outcome variable was the ordered cluster (cluster #1, cluster #2, cluster #3), was used to assess the potential associations of EFACED score with the ordered clusters. The following clinically meaningful confounders were considered: Charlson index, COPD, platelets, ESR, fibrinogen, creatinine, total protein concentration, and albumin levels. The multivariate regression odds ratio (OR) for each of the confounders is represented as a black dot along with the corresponding confidence interval in a forest plot. On the y-axis, all the confounder variables are plotted, while on the x-axis, the width of the confidence intervals is represented. The value of 1 (no association) is represented as a dotted vertical line. Statistical analyses were performed using SPSS 23.0 (SPSS Inc., Chicago, IL, USA) and Stata 15.1 (StataCorp LLC, College Station, TX, USA). Results were considered statistically significant at a p-value < 0.05.
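For readers less familiar with ordinal regression, the model described above can be written out explicitly. The following is a sketch that assumes the standard proportional-odds (cumulative logit) form; the text does not state the exact parameterization, and the symbols $\kappa_j$, $\boldsymbol{\beta}$, and $\mathbf{x}$ are introduced here for illustration only.

```latex
% Y is the ordered cluster label (1 = mild, 2 = moderate, 3 = severe);
% x collects the EFACED score and the confounders listed above.
\[
  \mathrm{logit}\,\Pr(Y \le j \mid \mathbf{x})
  = \kappa_j - \mathbf{x}^{\top}\boldsymbol{\beta},
  \qquad j = 1, 2,
\]
% Odds ratio for the k-th covariate, per one-unit increase.
\[
  \mathrm{OR}_k = \exp(\beta_k).
\]
```

Under this assumption, an OR above 1 for a covariate means that higher values shift patients toward higher-numbered clusters, and a confidence interval that excludes 1 corresponds to a statistically significant association.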

General Clinical Characteristics of the Three Patient Clusters
The number of female patients was greater than that of male patients in all three clusters (Table 2). Patients in cluster #3 were significantly older than those in clusters #1 and #2 (Table 2). Disease severity scores FACED, EFACED, and BSI were significantly greater in cluster #3 than in clusters #1 and #2 (Table 2). The proportions of cluster #3 patients with mild and moderate degrees of disease severity (BSI score) were significantly lower than in clusters #1 and #2 (mild, Figure 4A). Conversely, BSI score was greater in patients of clusters #2 and #3 than in those of cluster #1. Moreover, BSI score was also significantly higher in cluster #3 than in cluster #2 (Figure 4A). The proportions of patients with moderate and severe EFACED and FACED scores were significantly higher in cluster #3 than in clusters #1 and #2 (EFACED), as illustrated in Figure 4B,C, respectively.
The number of exacerbations was significantly higher in cluster #3 patients than in cluster #1 (Table 2). The number of hospitalizations and Charlson index were significantly higher in cluster #3 than in clusters #1 and #2 (Table 2). The proportions of patients with colonization by Pseudomonas aeruginosa and with concomitant COPD were greater in cluster #3 patients than in cluster #1 (Table 2). Smoking history did not significantly differ among the three clusters (Table 2). Airflow limitation and lung diffusion capacity were significantly more impaired in cluster #3 patients than in clusters #1 and #2 (Table 2). When COPD patients were excluded from the analysis, similar results were obtained in the three analyzed clusters (Table 3).

Systemic Inflammatory and Nutritional Parameters in the Three Patient Clusters
Importantly, the levels of leukocytes and other inflammatory cells were significantly greater in cluster #3 patients than in those represented in clusters #1 and #2 (Table 4 and Figure 5A,B and Figure 6A,B). Moreover, levels of other inflammatory parameters such as CRP, ESR, and fibrinogen were also significantly higher in cluster #3 patients than in clusters #1 and #2 (Table 4). CRP concentrations significantly and positively correlated with FACED, EFACED, BSI, exacerbations, hospitalizations, and neutrophil levels in all the patients as a whole (all three clusters, Figure 7A). Moreover, when patients from clusters #1 and #2, but not cluster #3, were analyzed separately, significant associations were also detected between CRP levels and the variables FACED, EFACED, neutrophils, and eosinophils (cluster #1) and FACED, EFACED, BSI, and eosinophils (cluster #2), as depicted in Figure 7B,C, respectively. Protein levels inversely correlated with FACED, EFACED, BSI, exacerbations, hospitalizations, and neutrophil levels when patients were analyzed as a whole (clusters #1-3) and when cluster #3 was analyzed independently (Figure 7A,D, respectively). When cluster #2 patients were analyzed independently, significant but weaker correlations were observed between total protein levels and FACED, EFACED, BSI, and the number of exacerbations and hospitalizations (Figure 7C). The number of platelets along with hemoglobin and hematocrit levels was also greater in cluster #3 patients than in clusters #1 and #2 (Table 4). Blood glucose levels were also higher in cluster #3 patients than in clusters #1 and #2 (Table 4). Concentrations of total proteins and albumin, however, were significantly lower in cluster #3 patients than in clusters #1 and #2 (Table 4). The analysis of the potential differences among the three clusters when COPD patients were excluded yielded similar results, as shown in Table 5 and Figure 5A,B and Figure 6A,B.
Statistical significance: ***, p ≤ 0.001: comparisons were assessed between either group #3 or group #2 and group #1 (less severe); §§§, p ≤ 0.001: comparisons between groups #3 and #2.

Figure 8. Multivariate ordinal logistic regression, in which the outcome variable was clusters (ordered from 1, the lowest level, to 3, the highest level), was used to assess the potential associations of EFACED score with each of the clusters. The following clinically meaningful confounders were considered: Charlson index, COPD, platelets, ESR, fibrinogen, creatinine, total protein concentration, and albumin levels. The multivariate regression odds ratio (OR) for each of the confounders is represented as a black dot along with the corresponding confidence interval in a forest plot. On the y-axis, all the confounder variables are plotted, while on the x-axis, the width of the confidence intervals is represented. The value of 1 (no association) is represented as a dotted vertical line.

Discussion
Chronic respiratory diseases are characterized by a complex interaction between the airway pathobiology and systemic manifestations, such as inflammation and nutritional and metabolic abnormalities. Specifically, in a great number of bronchiectasis patients, microbial colonization plays a relevant role in the perpetuation of inflammatory events in the airways and systemic compartment, which further increases the number of exacerbations, accelerates disease progression, and worsens prognosis [5,35,36]. Identification of the bronchiectasis patients who are at a greater risk of progressing more rapidly and/or experiencing exacerbations is still a major challenge in clinical settings. Currently available tools do not always help identify these specific groups of patients. Thus, other instruments need to be developed with the goal of defining patterns of behavior from large data sets, which can be translated into well-structured groups of patients that can be specifically targeted in clinical settings. In the present study, using a data mining approach, five different clusters of patients were identified in an initial stage. A refined analysis allowed us to merge some of these clusters into three clinically different ones, whose specific clinical features have been analyzed from different standpoints. We believe that these are novel results that will have clinical implications.
Patients represented in cluster #3 (n = 335) were older and had significantly more severe disease as measured by all three disease scores (BSI, FACED, and EFACED) than patients in clusters #1 and #2. It is worth mentioning that in the three clusters, the behavior of patients with mild bronchiectasis was similar for all three indices (BSI, FACED, and EFACED). Nonetheless, in the group of patients with moderate disease, the cluster distribution profile varied between the BSI score on the one hand and the FACED and EFACED scores on the other. The variables contributing to each specific score may account for the differences in disease severity distribution across the three identified clusters [30][31][32]. In addition, other important parameters such as Charlson index, the number of exacerbations and hospitalizations, the prevalence of COPD associated with bronchiectasis, and colonization by Pseudomonas aeruginosa (PA) were also greater in cluster #3 patients than in the other two clusters. These findings are in line with those recently reported in another investigation in which bronchiectasis patients were also subdivided into three major clusters on the basis of distinct airway phenotypic features [37,38]. The novelty of our investigation lies in the definition of three clinically relevant clusters based on the analysis of systemic variables such as neutrophils, lymphocytes, eosinophils, CRP, and hemoglobin.
Lung function parameters were also significantly impaired in cluster #3 patients compared to patients classified in the other two clusters. Such an impairment in the degree of the airway obstruction and diffusion capacity was independent of the presence of COPD, as statistically significant differences were maintained among the three study clusters after excluding the COPD patients from the analysis in all three clusters. These results confirm that the differential phenotypes are associated with bronchiectasis per se rather than with COPD or the degree of the airflow limitation, as also previously demonstrated [14,19].
A significant rise in the levels of cellular and molecular inflammatory parameters was observed in cluster #3 patients compared to patients in clusters #1 and #2. These findings reveal that disease severity was probably associated with the degree of systemic inflammation in patients belonging to cluster #3. In fact, significant positive associations were found between several inflammatory parameters (e.g., neutrophils and CRP) and disease severity scores as well as with the number of exacerbations and hospitalizations. Associations between disease severity and systemic inflammation have been previously reported [39][40][41]. The strong association between inflammation and disease progression and severity has also been confirmed in the present study. Importantly, the differential phenotypic characteristics of the three groups of patients identified using a clustering approach were further confirmed by the multivariate ordinal logistic regression analysis, in which the EFACED score was associated with the clustered risk variable (OR: 1.136), indicating that patients in cluster #3 had more severe disease than patients in the other two clusters. These are novel findings that validate the methodological approaches used in the present study.
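To make the reported odds ratio concrete, the arithmetic below shows how it compounds over several EFACED points. This is an illustrative reading that assumes a proportional-odds model in which the OR of 1.136 applies per one-point increase in EFACED; the three-point difference $\Delta = 3$ is an arbitrary example, not a value from the study.

```latex
% beta is the EFACED coefficient implied by the reported OR of 1.136.
\[
  \beta = \ln(1.136) \approx 0.128, \qquad
  \mathrm{OR}_{\Delta} = e^{\beta\,\Delta} = 1.136^{\Delta},
\]
% Worked example for a three-point difference in EFACED.
\[
  \Delta = 3:\quad 1.136^{3} \approx 1.47 .
\]
```

Under these assumptions, a patient whose EFACED score is three points higher would have about 47% higher odds of belonging to a more severe cluster, with the other covariates held fixed.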

Study Critique
A major strength of the investigation is the use of a large cohort of bronchiectasis patients, with data obtained from 43 participating centers in Spain. Despite this strength, several limitations also apply. The study of biological insights and disease pathogenesis would have required a different investigational approach, in which biological samples would be obtained from different compartments in the patients. The RIBRON, however, relies on clinical and analytical data obtained from the patients. Future investigations based on the use of biological samples from the airways and lungs should be designed with the aim of elucidating specific mechanisms that are involved in the pathobiology of disease severity and progression in patients with bronchiectasis, especially in view of the profile of clinical and analytical parameters that have been analyzed in the current study. An external validation cohort aimed at further confirming the results reported herein should also be investigated in the near future. Differences between male and female patients were analyzed previously in this registry [14]. Thus, these differences have not been analyzed in the present investigation. As shown in the tables, the number of female patients was greater than that of male patients.
Another limitation is related to the lack of information on the potential colonization by pathogens other than PA in the airways and lungs of the study participants. As that information was not obtained from all the patients, it has not been analyzed in the present study.
Another limitation is related to the lack of additional analysis of biological parameters, including cytokines (interleukin-6 and tumor necrosis factor-alpha), that could have been included in the cluster analysis. However, as those parameters were not collected in the RIBRON registry, they were not available for the cluster analysis. Future studies should aim to assess whether other systemic parameters may also help differentiate clinical phenotypes in patients with bronchiectasis.

Conclusions
Clustering analyses of systemic blood parameters identified three differential clinical phenotypes of bronchiectasis patients that were further confirmed by a logistic regression. These analyses revealed that disease severity was associated with the clusters, particularly with cluster #3. Patients included in this group had a more severe disease, worse lung function, and a rise in systemic inflammatory parameters compared to patients in clusters #1 and #2. These findings were independent of the presence of COPD. Clustering analysis of systemic parameters offers a powerful tool to better characterize patients with bronchiectasis. These results have clinical implications in the management of the complexity and heterogeneity of bronchiectasis patients.

Informed Consent Statement:
All patients signed the informed written consent to participate in the registry. The information remained confidential at all times and no personal information related to any of the participants was introduced in the registry.

Data Availability Statement:
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.