Article

Respiratory Rehabilitation Index (R2I): Unsupervised Clustering Approach to Identify COPD Subgroups Associated with Rehabilitation Outcomes

IRCCS Fondazione Don Carlo Gnocchi Onlus, 50143 Firenze, Italy
*
Author to whom correspondence should be addressed.
Diagnostics 2025, 15(16), 2053; https://doi.org/10.3390/diagnostics15162053 (registering DOI)
Submission received: 2 July 2025 / Revised: 3 August 2025 / Accepted: 14 August 2025 / Published: 16 August 2025
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Abstract

Background/Objectives: Chronic obstructive pulmonary disease (COPD) is a progressive condition whose heterogeneous endotypes, clinical manifestations, and recovery pathways complicate the identification of reliable predictors of rehabilitation outcomes. Several respiratory and functional assessments are available with no consensus on the most predictive ones. While univariate markers may miss multifactorial interactions essential for prognosis, data-driven unsupervised clustering methods can integrate complex information from different sources. This study aimed to apply unsupervised clustering to identify pre-rehabilitation characteristics predictive of discharge outcomes for COPD patients undergoing pulmonary rehabilitation. Methods: A total of 126 COPD patients undergoing pulmonary rehabilitation were included in the analysis. Three assessments were performed at admission, namely the forced oscillation technique, spirometry, and the six-minute walk test (6MWT). The outcome was the change in 6MWT distance between admission and discharge. Unsupervised clustering methods were applied to admission variables to identify subgroups associated with outcomes. Results: Among the clustering algorithms tested, k-means (with Ncl = 2) provided the optimal solution. The resulting respiratory rehabilitation index (R2I) was significantly associated with the outcome dichotomized via the minimal clinically important difference of 30 m. Patients with R2I = 1, indicating severe functional and respiratory impairments, were associated with higher post-rehabilitation functional improvement (p = 0.032). While few functional parameters of 6MWT were statistically different between the groups identified by outcome, nearly all variables in the analysis exhibited significant distribution differences among the R2I clusters. 
Conclusions: These findings highlight the heterogeneity of COPD and the potential of unsupervised clustering to identify distinct patient subgroups, enabling more personalized rehabilitation strategies.

1. Introduction

Chronic obstructive pulmonary disease (COPD) represents a significant global health burden, leading to substantial morbidity, mortality, and health care costs [1]. Current estimates suggest a prevalence of around 3% in the general population, with projections indicating it will become the third leading cause of death and the seventh leading cause of disability-adjusted life years (DALYs) lost by 2050 [2]. In Italy, recent estimates indicate a slightly lower prevalence, affecting approximately 2–2.5% of the general population [2]. COPD is a progressive disease, often resulting in reduced quality of life and increased risk of exacerbations, hospitalizations, and mortality [3]. The current prognostic markers for COPD include a range of clinical, lung function, and imaging parameters. These may include lung capacity measures such as forced expiratory volume in the first second (FEV1) [4], oxygen (O2) saturation, and inflammatory biomarkers such as white blood cell count and C-reactive protein levels [5].
Pulmonary rehabilitation (PR) is a cornerstone intervention in COPD management, encompassing a multidisciplinary approach aimed at improving physical and psychological health. PR programs typically include exercise training, education about COPD management, and psychosocial support [6]. However, the response to PR varies widely across individuals, reflecting the significant heterogeneity of the disease. Indeed, COPD patients present with distinct phenotypes, each characterized by unique etiological, clinical, and prognostic profiles, which complicates efforts to predict rehabilitation outcomes and tailor personalized interventions [7]. Univariate markers such as FEV1 [8], six-minute walk test (6MWT) covered distance [9], and scales assessing symptoms like the Medical Research Council (MRC) Dyspnoea Scale [10], have been widely used to assess COPD severity and response to rehabilitation.
However, these single-dimension measures may oversimplify the multifactorial nature of COPD, failing to capture the complex interconnections between clinical, functional, physiological, and psychological factors that shape individual recovery trajectories. This complexity highlights the need for analytical approaches capable of integrating multiple data sources to provide a more comprehensive characterization of disease variability.
Supervised machine learning methods, although highly effective for prediction tasks, rely on predefined outcome labels and therefore cannot uncover latent structures or describe the variability in patient responses.
Unsupervised machine learning techniques have increasingly been used to address the challenges posed by high-dimensional, heterogeneous data [11,12,13]. Their main strength lies in identifying complex, nonlinear relationships that traditional statistical methods may overlook, enabling a more data-driven stratification of patients. Although their clinical applicability is still constrained by the need for large and high-quality datasets, limited reproducibility, and reduced interpretability, these approaches have shown promise in supporting clinical decision-making and complementing conventional assessments.
Building on this rationale, unsupervised clustering is particularly suited to explore the underlying structure of the COPD population and identify patient subgroups with distinct recovery trajectories [14,15,16]. Recent applications of unsupervised clustering in COPD have demonstrated its potential and further support its use in this clinical context [17,18]. Despite the promise of clustering approaches, challenges remain in ensuring their reproducibility and clinical applicability. Variability in patient cohorts, data sources, and clustering methodologies across studies can lead to inconsistent results, raising concerns about the robustness of identified phenotypes.
This study employs unsupervised clustering methods to stratify COPD patients from clinical information at admission in a rehabilitation unit. The resulting index is then assessed for its predictive value on rehabilitation outcomes at discharge. To address the limitations of earlier studies, a set of variables from three distinct assessment domains, i.e., the 6MWT, forced oscillation technique (FOT), and spirometry, was considered. In addition, different clustering approaches were compared to verify consistency and robustness in subgroup identification.

2. Materials and Methods

2.1. Study Design and Collection

This study was based on both a prospective observational study (conducted from 2021 to 2022) and a retrospective observational study (from 2016 to 2018) carried out at the Pulmonary Rehabilitation Unit of IRCCS Fondazione Don Gnocchi ONLUS in Florence. The studies enrolled COPD patients undergoing an outpatient pulmonary rehabilitation program (PRP). The PRP was conducted in accordance with the American Thoracic Society (ATS) and the European Respiratory Society (ERS) recommendations [19] and included education, aerobic exercise training for both upper and lower limbs, and breathing retraining. The studies shared the same inclusion criteria: patients had to meet the COPD definition outlined by GOLD standards [20]; the severity of airflow obstruction ranged from moderate to very severe according to the GOLD classification; participants were former smokers in stable condition for at least four weeks prior to enrollment; and they were receiving optimal standard treatment as recommended by GOLD guidelines. Patients with recent cardiovascular events or with neuromuscular or osteoarticular diseases that limited physical exercise and/or compromised lung mechanical properties were excluded from the PRP. The studies were approved by the Research Ethics Committee (r.n.18765_oss; r.n.15217_oss). All participants provided written informed consent at the time of assessment. The variables of interest were evaluated at two time points, namely admission (T0) and discharge (T1), separated by 20 sessions of the pulmonary rehabilitation program.
This analysis incorporated three distinct respiratory tests (Figure 1), including the FOT [21], spirometry [22], and the 6MWT [23], each serving a specific purpose in assessing respiratory function. The FOT procedure focused on assessing respiratory impedance by recording multiple measurements while participants breathed normally. This non-invasive technique provided detailed insights into the mechanical properties of the respiratory system, helping to identify potential abnormalities or dysfunctions [21]. Spirometry provided insightful data on lung volume and air-flow dynamics [22]. The 6MWT involved participants walking briskly for six minutes while vital signs and the distance covered were recorded. This test provided insights into participants’ functional capacity and endurance, offering a practical measure of their overall cardiopulmonary health [23].

2.2. Data Preparation

The outcome of the study was the 6MWT covered distance variation between T0 and T1 (namely, Delta meters). Patients with missing outcome data were excluded from the analysis. In line with international clinical guidelines, specifically the ATS/ERS technical standard on field walking tests under chronic respiratory conditions [24], a minimal clinically important difference (MCID) threshold of 30 m was used to define clinically significant improvement (CSI) in the 6MWT. Consequently, the outcome was dichotomized as follows:
$$
\mathrm{6MWT}_{\mathrm{CSI}} =
\begin{cases}
0, & \text{if Delta meters} < 30 \\
1, & \text{if Delta meters} \geq 30
\end{cases}
$$
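The dichotomization can be illustrated with a minimal Python sketch. The function name `six_mwt_csi` and the distances below are hypothetical and are not study data; only the 30 m threshold comes from the text.

```python
MCID_METERS = 30  # minimal clinically important difference stated in the text

def six_mwt_csi(delta_meters: float) -> int:
    """Return 1 if the change in 6MWT distance reaches the MCID, else 0."""
    return 1 if delta_meters >= MCID_METERS else 0

# Hypothetical admission (T0) and discharge (T1) distances in meters.
t0 = [310.0, 250.0, 400.0]
t1 = [335.0, 280.0, 405.0]

# Delta meters is the T1 - T0 difference for each patient.
labels = [six_mwt_csi(after - before) for before, after in zip(t0, t1)]
```

A patient who improves by exactly 30 m is counted as having a clinically significant improvement, since the threshold is inclusive.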
The independent variables of the study were collected at T0 from the three different assessment domains mentioned above, for a total of 26 variables. Specifically, within the FOT, respiratory system resistance (RRS) and reactance (XRS) were measured during inspiration at 5 Hz, along with its variation (ΔXRS). Moreover, inspiratory time (TI), the ratio of TI to total time (TI/TTOT), expiratory time (TE), mechanical ventilation (VE), tidal volume (VT), the percentage of respiratory flow (RF%), and respiratory rate (RR) were recorded. In spirometry, functional parameters were included, such as forced expiratory volume divided by slow vital capacity (FEV/SVC), FEV1, total lung capacity (TLC), inspiratory capacity (IC), functional residual capacity (FRC), and residual volume (VR). During the 6MWT, in addition to recording the total distance walked, patients were assessed from multiple perspectives, including O2 levels, O2 saturation, the Borg Dyspnoea Scale, and the Borg Scale for limb fatigue, measured twice, before and after the test.
A preliminary analysis was adopted to discard variables showing a cross-correlation greater than 0.8. Variables with missing values were imputed using a k-nearest neighbors (kNN)-based imputer from the Scikit-learn library [25]. Then, the remaining features were standardized by removing the mean and scaling to unit variance.
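As a rough sketch of two of these steps (the correlation filter and the standardization), the following standard-library Python example may help. It makes several assumptions that the text does not state: the correlation is taken as the absolute Pearson coefficient, the later variable of a correlated pair is the one discarded, and the toy columns are invented. The kNN imputation step is omitted here; in Scikit-learn it would typically be handled by `KNNImputer`.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.8):
    """Keep a column only if |r| <= threshold against every column already kept.

    Assumption: when two columns are correlated, the later one is dropped.
    """
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept

def standardize(values):
    """Remove the mean and scale to unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical toy data, not taken from the study.
columns = {
    "fev1": [1.0, 1.5, 2.0, 2.5, 3.0],
    "fev1_copy": [2.1, 3.0, 4.1, 5.0, 6.1],  # almost proportional to fev1
    "borg": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(columns)
scaled = {name: standardize(values) for name, values in kept.items()}
```

In this toy example `fev1_copy` is removed because it is almost perfectly correlated with `fev1`, while `borg` is retained and standardized.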

2.3. Clustering Methods

Patients were clustered according to four different unsupervised algorithms, including k-means [26], k-medoids [27], a Gaussian mixture model [28], and BIRCH (balanced iterative reducing and clustering using hierarchies) [29]. Input data for the unsupervised models were the independent variables of the analysis.
K-means clusters data by partitioning samples into a given number of groups of equal variance, minimizing the within-cluster sum of squares [26]. The algorithm was initialized with the k-means++ method, which selects initial centroids by sampling from an empirical probability distribution of each point's contribution to the overall within-cluster variance [30]. Computation was sped up using Elkan's method, which applies the triangle inequality to avoid computing unnecessary distances [31].
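The k-means++ seeding idea can be sketched in plain Python as follows. This is an illustrative re-implementation under simplifying assumptions, not the Scikit-learn code used in the study: the first centroid is drawn uniformly, and each further centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function names and the toy points are hypothetical. In Scikit-learn, the corresponding options are `KMeans(init="k-means++", algorithm="elkan")`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centroids with D^2-weighted sampling (k-means++ seeding)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform draw
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest chosen centroid.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical two-dimensional toy data forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centroids = kmeans_pp_init(points, k=2, seed=0)
```

Because already chosen points have zero weight, the same point cannot be selected twice when all points are distinct, which tends to spread the initial centroids across the data.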
The k-medoids algorithm, a variation of k-means, partitions data into clusters by choosing representative points (medoids) and assigning each sample to the nearest medoid [27]. The algorithm was initialized with the k-medoids++ method (following an approach similar to k-means++).
The Gaussian mixture model (GMM) assumes data are generated from a mixture of Gaussian distributions [28]. It employs the expectation–maximization algorithm to estimate the distribution parameters and assigns points to clusters based on the maximum a posteriori probability [32]. The algorithm was initialized with the k-means++ method.
BIRCH constructs a feature tree with each of the nodes representing a subcluster. The feature tree expands dynamically as new data points are added [29].
For each algorithm, the number of clusters varied between 2 and 15. The number of clusters, as well as the different initializations, were compared and selected by choosing the configuration that yielded the highest silhouette score [33]. Once the optimal number of clusters was identified, the clustering algorithm was selected based on the best compromise between the silhouette score and balance in the number of patients assigned to clusters.
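To make the selection criterion concrete, here is a small standard-library sketch of the mean silhouette score, followed by a choice between two candidate partitions. It assumes Euclidean distance and uses invented toy points and labels. The study itself swept the number of clusters from 2 to 15 for each algorithm, which this sketch does not reproduce.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data and two candidate partitions.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates = {
    "left_vs_right": [0, 0, 1, 1],
    "bottom_vs_top": [0, 1, 0, 1],
}
good = silhouette_score(points, candidates["left_vs_right"])
bad = silhouette_score(points, candidates["bottom_vs_top"])
best = max(candidates, key=lambda name: silhouette_score(points, candidates[name]))
```

The partition that separates the two distant groups scores close to 1, whereas the partition that mixes them scores below 0, so the former is selected.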

2.4. Statistical Analysis

Descriptive statistics were calculated before the imputation to provide a comprehensive overview of the effective absolute values. The median and interquartile range (IQR) values were reported for numerical variables, while for categorical variables, absolute frequencies and percentages were calculated. A comparative analysis was conducted between the subgroups identified by the dichotomized outcome. A Mann–Whitney test was performed for numerical variables, while a chi-squared test was conducted for categorical variables. After computing the cluster centroids, a second comparative analysis was conducted (Mann–Whitney test) to assess whether the outcome distributions in the cluster groups were statistically different. Later, the dichotomized outcome was compared with the cluster labels of each algorithm through a contingency table and a chi-squared analysis. Finally, on the model that reported the best results, a Mann–Whitney test was employed to evaluate whether there were statistically significant variations in the distribution of independent variables between the clusters.
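For readers unfamiliar with the Mann–Whitney test, a compact standard-library sketch is given below. It is an assumption-laden illustration rather than the authors' implementation: it uses the normal approximation with tie correction and no continuity correction, and the two samples are invented Delta meters values.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties receive average ranks and the variance is tie-corrected.
    No continuity correction is applied (an assumption of this sketch).
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i                   # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return u, p

# Hypothetical Delta meters values for two groups, not study data.
group_a = [12, 15, 20, 21, 25, 30]
group_b = [28, 35, 40, 43, 50, 60]
u, p = mann_whitney_u(group_a, group_b)
```

With small samples, an exact test is usually preferable to the normal approximation used here.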

3. Results

3.1. Descriptive and Univariate Results

A total of 166 patients were initially enrolled, of whom 26 were excluded due to comorbidities, resulting in 140 patients included in the study. Among these, 14 patients had missing outcome data, leading to a final sample size of 126 patients analyzed. In this final cohort (median age 77 years [IQR = 10], males: 56), 50% of participants had a 6MWTCSI = 1 (the median value of Delta meters was 29.5 [IQR = 61]). The preliminary correlational analysis reduced the number of variables from 26 to 20. None of the variables related to the FOT and spirometry showed significantly different distributions between the two groups stratified by outcome. Conversely, among the variables of the 6MWT, O2 saturation and the Borg Dyspnoea Scale rating measured at the beginning of the test, as well as total meters walked, differed significantly between the groups (Table 1).

3.2. Cluster Analysis

The optimization of the number of clusters conducted for each of the clustering algorithms led to identical results for all: the configuration with two clusters was the one with the highest silhouette score (Figure 2). The silhouette scores for the two-cluster configuration were 0.20 for the Gaussian mixture model, 0.14 for BIRCH, 0.12 for k-means, and 0.08 for k-medoids.
The number of patients assigned to each cluster was computed for each clustering method to assess group balancing. K-medoids and k-means clustering resulted in the most balanced distributions (Ncl0 = 61, Ncl1 = 65 and Ncl0 = 60, Ncl1 = 66, respectively); conversely, the Gaussian mixture model and BIRCH showed less cluster balance (Ncl0 = 11, Ncl1 = 115 and Ncl0 = 27, Ncl1 = 99, respectively).
Given these findings, the k-means clustering solution was considered the most appropriate for the analysis and is referred to as the respiratory rehabilitation index (R2I).
Concerning the comparison of clustering output with the dichotomized outcome, only k-means was statistically significant (χ2 = 4.58, p = 0.032). Conversely, the continuous outcome distribution was significantly different between the two clusters (Mann–Whitney, p < 0.05) for all the proposed solutions. The median [IQR] Delta meters in the two clusters was 21 [46.3] and 43.5 [74] for k-means, 25 [57] and 30 [60] for k-medoids, 20 [30.5] and 30 [57] for the GMM, and 21 [29] and 32 [63] for BIRCH (Figure 3). A radar plot illustrating the distribution of independent variables in the two clusters has been provided exclusively for the R2I (Figure 4). Several variables significantly differed between the two identified clusters (Table 2).
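The chi-squared comparison of cluster labels with the dichotomized outcome can be sketched as below. The cell counts are illustrative: they were chosen here only to agree with the reported margins (k-means clusters of 60 and 66 patients, and 50% of the 126 patients with 6MWTCSI = 1), and the actual contingency table is not reported in the text. The sketch also assumes a Pearson chi-squared statistic without continuity correction.

```python
from math import erfc, sqrt

def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 contingency table.

    No continuity correction is applied (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Illustrative counts (rows: cluster 0, cluster 1; columns: CSI = 0, CSI = 1).
table = [[36, 24], [27, 39]]
chi2, p = chi2_2x2(table)
```

These illustrative counts yield a statistic of about 4.58 and a p-value of about 0.032, in line with the reported result, although this agreement does not establish that they are the actual cell counts.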

4. Discussion

This study demonstrated that unsupervised clustering techniques can effectively stratify COPD patients into distinct subgroups based on pre-rehabilitation characteristics, offering valuable insights into rehabilitation outcomes. The outcome measure, defined as the change in 6MWT distance between admission and discharge, was dichotomized based on the MCID threshold of 30 m. The optimal clustering solution was obtained using the k-means algorithm with two clusters, resulting in the R2I. The latter, obtained from T0 data, revealed a significant association with the outcome at T1 (p = 0.032), showing that patients with more severe baseline functional and respiratory impairments (R2I = 1) were more likely to achieve a clinically significant post-rehabilitation improvement in walked distance. In particular, patients in R2I = 0, compared to those in R2I = 1, presented at admission with lower overall mechanical impairment (lower respiratory resistance values and smaller variations in reactance during the test), a more favorable ventilatory pattern and lung volumes, and better functional capacity, as indicated by higher walking performance, greater exercise tolerance, and lower perceived dyspnoea. Identifying these profiles through clustering before rehabilitation could help clinicians anticipate which patients are more likely to achieve meaningful functional improvement and adapt the intensity, focus, and monitoring of PR programs accordingly, ultimately aiming to maximize individual benefits. These findings suggest that pre-rehabilitation profiling through clustering can help identify the patients who are more likely to achieve a significant increase in 6MWT distance after PR. While only a few functional parameters of the 6MWT, such as total distance and O2 saturation, showed significant differences between the groups identified by the outcome, the R2I clusters revealed differences in nearly all pre-rehabilitation variables.
These included parameters from both the FOT and spirometry, which were not evident in the outcome-based grouping, indicating that these respiratory measures play a critical role in patient stratification and may better capture the underlying heterogeneity in rehabilitation responses. Key variables that contributed most to the discrimination between R2I clusters included ΔXRS, FEV/SVC, and IC. This multidimensional approach goes beyond single-domain assessments used in previous studies by capturing both respiratory mechanics and functional performance, providing a more accurate characterization of patient profiles.
From a methodological perspective, this study compares clustering algorithms, including k-medoids, the GMM, and BIRCH, in addition to k-means, ultimately selected as the most appropriate solution. The use of silhouette scores to choose the optimal number of clusters ensured an objective and reproducible approach, reinforcing the validity of the identified subgroups. These methodological strengths addressed critical gaps in the literature, where clustering solutions were often hindered by inconsistent methods and insufficient validation, resulting in a lack of reproducibility and practical relevance. By applying and comparing different clustering methodologies and achieving consistent subgroup identification across algorithms, this study enhanced confidence in the robustness of the R2I for patient stratification.
The most significant practical implication of this study is the potential to personalize rehabilitation strategies for COPD patients. COPD is a highly heterogeneous condition, with patients presenting diverse clinical profiles and responses to therapy, which often limits the effectiveness of standardized rehabilitation protocols. By stratifying patients into more homogeneous subgroups based on pre-rehabilitation features, unsupervised clustering techniques can contribute to understanding the relationship between pulmonary function impairment and mechanisms of response to PR. This approach enables the design of tailored rehabilitation programs with the potential to improve rehabilitation outcomes, reduce variability in responses, and support more effective patient management in clinical practice.

5. Limitations

The relatively small sample size may limit the generalizability of the findings to broader COPD populations. Moreover, conducting the study in a single rehabilitation center may have introduced bias linked to the specific population characteristics or local rehabilitation protocols.

6. Future Directions

Future research should focus on validating the R2I across larger COPD cohorts to enhance its generalizability and clinical applicability. Further investigations could benefit from the inclusion of additional clinical and functional variables, such as psychosocial factors (e.g., anxiety and depression [34]), comorbidities (e.g., cardiac, metabolic, orthopedic, or behavioral health problems [35]), and markers of skeletal muscle dysfunction [36]. These aspects are well established as influential determinants of rehabilitation outcomes in individuals with COPD. Incorporating them into a multidimensional framework may allow for more accurate patient stratification and could enhance the overall predictive value and clinical utility of the R2I.

7. Conclusions

This study shows that the unsupervised clustering of multidimensional admission data enables the identification of clinically meaningful subgroups of COPD patients undergoing pulmonary rehabilitation. By integrating 6MWT, FOT, and spirometry parameters, the R2I offers a data-driven stratification tool capable of predicting rehabilitation outcomes. Specifically, patients with more severe pre-rehabilitation impairment (R2I = 1) were more likely to achieve clinically significant improvements in functional capacity, as measured by the 6MWT.
The R2I captured differences across a broad range of admission variables, many of which were not univariately associated with the outcome. These findings underscore the potential of unsupervised machine learning approaches to uncover hidden patterns in complex clinical data and support more personalized rehabilitation strategies.

Author Contributions

Conceptualization, P.L. and A.M.; methodology, E.M. and P.L.; software, E.M.; validation, I.R. and F.G.; formal analysis, E.M. and P.L.; investigation, E.M., F.G. and I.R.; resources, A.M., F.G. and I.R.; writing—original draft preparation, E.M.; writing—review and editing, P.L., A.M., I.R. and F.G.; visualization, E.M.; supervision, I.R., F.G. and A.M.; project administration, A.M.; funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Italian Ministry of Health under the “Ricerca Corrente” program.

Institutional Review Board Statement

The studies were approved by the Research Ethics Committee (approval code: r.n.18765_oss; r.n.15217_oss; approval date: 20 April 2020).

Informed Consent Statement

Written informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author for reproducibility purposes.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

6MWT: Six-Minute Walk Test
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
CSI: Clinically Significant Improvement
COPD: Chronic Obstructive Pulmonary Disease
DALYs: Disability-Adjusted Life Years
FEV: Forced Expiratory Volume
FEV1: Forced Expiratory Volume in the First Second
FRC: Functional Residual Capacity
FOT: Forced Oscillation Technique
GMM: Gaussian Mixture Model
IC: Inspiratory Capacity
IQR: Interquartile Range
kNN: k-Nearest Neighbors
MCID: Minimal Clinically Important Difference
MRC: Medical Research Council
PR: Pulmonary Rehabilitation
PRP: Pulmonary Rehabilitation Program
R2I: Respiratory Rehabilitation Index
RF: Respiratory Flow
RRS: Respiratory System Resistance
RR: Respiratory Rate
SVC: Slow Vital Capacity
TE: Expiratory Time
TI: Inspiratory Time
TLC: Total Lung Capacity
TTOT: Total Time
VE: Mechanical Ventilation
VR: Residual Volume
VT: Tidal Volume
XRS: Respiratory System Reactance

References

  1. Safiri, S.; Carson-Chahhoud, K.; Noori, M.; Nejadghaderi, S.A.; Sullman, M.J.M.; Heris, J.A.; Ansarin, K.; Mansournia, M.A.; Collins, G.S.; Kolahi, A.-A.; et al. Burden of chronic obstructive pulmonary disease and its attributable risk factors in 204 countries and territories, 1990–2019: Results from the Global Burden of Disease Study 2019. BMJ 2022, 378, e069679.
  2. Wang, Z.; Lin, J.; Liang, L.; Huang, F.; Yao, X.; Peng, K.; Gao, Y.; Zheng, J. Global, regional, and national burden of chronic obstructive pulmonary disease and its attributable risk factors from 1990 to 2021: An analysis for the Global Burden of Disease Study 2021. Resp. Res. 2025, 26, 2.
  3. Wedzicha, J.A.; Seemungal, T.A. COPD exacerbations: Defining their cause and prevention. Lancet 2007, 370, 786–796.
  4. Vestbo, J.; Edwards, L.D.; Scanlon, P.D.; Yates, J.C.; Agusti, A.; Bakke, P.; Calverley, P.M.; Celli, B.; Coxson, H.O.; Crim, C.; et al. Changes in forced expiratory volume in 1 second over time in COPD. N. Engl. J. Med. 2011, 365, 1184–1192.
  5. Fermont, J.M.; Masconi, K.L.; Jensen, M.T.; Ferrari, R.; Di Lorenzo, V.A.P.; Marott, J.M.; Schuetz, P.; Watz, H.; Waschki, B.; Müllerova, H.; et al. Biomarkers and clinical outcomes in COPD: A systematic review and meta-analysis. Thorax 2019, 74, 439–446.
  6. Troosters, T.; Janssens, W.; Demeyer, H.; Rabinovich, R.A. Pulmonary rehabilitation and physical interventions. Eur. Respir. Rev. 2023, 32, 220222.
  7. Corlateanu, A.; Mendez, Y.; Wang, Y.; Garnica, R.d.J.A.; Botnaru, V.; Siafakas, N. Chronic obstructive pulmonary disease and phenotypes: A state-of-the-art. Pulmonology 2020, 26, 95–100.
  8. Jones, P.W.; Agusti, A.G.N. Outcomes and markers in the assessment of chronic obstructive pulmonary disease. Eur. Respir. J. 2006, 27, 822–832.
  9. Jenkins, S.C. Six-minute walk test in patients with COPD: Clinical applications in pulmonary rehabilitation. Physiotherapy 2007, 93, 175–182.
  10. Bestall, J.C.; Paul, E.A.; Garrod, R.; Garnham, R.; Jones, P.W.; Wedzicha, J.A. Usefulness of the Medical Research Council (MRC) dyspnoea scale as a measure of disability in patients with chronic obstructive pulmonary disease. Thorax 1999, 54, 581–586.
  11. Komorowski, M.; Green, A.; Tatham, K.C.; Seymour, C.; Antcliffe, D. Sepsis biomarkers and diagnostic tools with a focus on machine learning. EBioMedicine 2022, 86, 104394.
  12. Miller, R.J.H.; Bednarski, B.P.; Pieszko, K.; Kwiecinski, J.; Williams, M.C.; Shanbhag, A.; Liang, J.X.; Huang, C.; Sharir, T.; Hauser, M.T.; et al. Clinical phenotypes among patients with normal cardiac perfusion using unsupervised learning: A retrospective observational study. EBioMedicine 2024, 99, 104930.
  13. Alexander, N.; Alexander, D.C.; Barkhof, F.; Denaxas, S. Identifying and evaluating clinical subtypes of Alzheimer’s disease in care electronic health records using unsupervised machine learning. BMC Med. Inf. Decis. Mak. 2021, 21, 343.
  14. Burgel, P.R.; Paillasseur, J.-L.; Caillaud, D.; Tillie-Leblond, I.; Chanez, P.; Escamilla, R.; Court-Fortune, I.; Perez, T.; Carré, P.; Roche, N. Clinical COPD phenotypes: A novel approach using principal component and cluster analyses. Eur. Respir. J. 2010, 36, 531–539.
  15. Pikoula, M.; Quint, J.K.; Nissen, F.; Hemingway, H.; Smeeth, L.; Denaxas, S. Identifying clinically important COPD sub-types using data-driven approaches in primary care population-based electronic health records. BMC Med. Inf. Decis. Mak. 2019, 19, 1–14.
  16. Burgel, P.R.; Quint, J.K.; Nissen, F.; Hemingway, H.; Smeeth, L.; Denaxas, S. A simple algorithm for the identification of clinical COPD phenotypes. Eur. Respir. J. 2017, 50, 1701034.
  17. Chikhanie, Y.A.; Bailly, S.; Amroussa, I.; Veale, D.; Hérengt, F.; Verges, S. Clustering of COPD patients and their response to pulmonary rehabilitation. Respir. Med. 2022, 198, 106861.
  18. Spruit, M.A.; Augustin, I.M.L.; Vanfleteren, L.E.; Janssen, D.J.A.; Gaffron, S.; Pennings, H.-J.; Smeenk, F.; Pieters, W.; van den Bergh, J.J.A.M.; Michels, A.-J.; et al. Differential response to pulmonary rehabilitation in COPD: Multidimensional profiling. Eur. Respir. J. 2015, 46, 1625–1635.
  19. Rochester, C.L.; Vogiatzis, I.; Holland, A.E.; Lareau, S.C.; Marciniuk, D.D.; Puhan, M.A.; Spruit, M.A.; Masefield, S.; Casaburi, R.; Clini, E.M.; et al. An official American Thoracic Society/European Respiratory Society policy statement enhancing implementation, use, and delivery of pulmonary rehabilitation. Am. J. Respir. Crit. Care Med. 2015, 192, 1373–1386.
  20. Agustí, A.; Celli, B.R.; Criner, G.J.; Halpin, D.; Anzueto, A.; Barnes, P.; Bourbeau, J.; Han, M.K.; Martinez, F.J.; de Oca, M.M.; et al. Global Initiative for Chronic Obstructive Lung Disease 2023 report. GOLD executive summary. Eur. Respir. J. 2023, 61, 2300239.
  21. Oostveen, E.; MacLeod, D.; Lorino, H.; Farré, R.; Hantos, Z.; Desager, K.; Marchal, F. The FOT in clinical practice: Methodology, recommendations and future developments. Eur. Respir. J. 2003, 22, 1026–1041.
  22. Graham, B.L.; Steenbruggen, I.; Miller, M.R.; Barjaktarevic, I.Z.; Cooper, B.G.; Hall, G.L.; Hallstrand, T.S.; Kaminsky, D.A.; McCarthy, K.; McCormack, M.C.; et al. Standardization of spirometry: 2019 update. Am. J. Respir. Crit. Care Med. 2019, 200, e70–e88.
  23. Enright, P.L. The six-minute walk test. Respir. Care 2003, 48, 783–785.
  24. Singh, S.J.; Puhan, M.A.; Andrianopoulos, V.; Hernandes, N.A.; Mitchell, K.E.; Hill, C.J.; Lee, A.L.; Camillo, C.A.; Troosters, T.; Spruit, M.A.; et al. An official systematic review of the European Respiratory Society/American Thoracic Society: Measurement properties of field walking tests in chronic respiratory disease. Eur. Respir. J. 2014, 44, 1447–1478.
  25. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  26. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965, 27 December 1965–7 January 1966; Le Cam, L.M., Neyman, J., Eds.; University of California Press: Oakland, CA, USA, 1967.
  27. Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert. Syst. Appl. 2009, 36, 3336–3341.
  28. Rasmussen, C. The infinite Gaussian mixture model. Adv. Neural Inf. Process. Syst. 1999, 12, 554–560.
  29. Zhang, T.; Ramakrishnan, R.; Livny, M. BIRCH: An efficient data clustering method for very large databases. ACM SIGMOD Rec. 1996, 25, 103–114.
  30. Arthur, D.; Vassilvitskii, S. k-Means++: The Advantages of Careful Seeding; Technical Report; Stanford University: Palo Alto, CA, USA, 2006.
  31. Elkan, C. Using the triangle inequality to accelerate k-means. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 147–153.
  32. Moon, T.K. The expectation-maximization algorithm. IEEE Signal Process Mag. 1996, 13, 47–60.
  33. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
  34. Gordon, C.S.; Waller, J.W.; Cook, R.M.; Cavalera, S.L.; Lim, W.T.; Osadnik, C.R. Effect of pulmonary rehabilitation on symptoms of anxiety and depression in COPD: A systematic review and meta-analysis. Chest 2019, 156, 80–91.
  35. Tunsupon, P.; Lal, A.; Abo Khamis, M.; Mador, M.J. Comorbidities in patients with chronic obstructive pulmonary disease and pulmonary rehabilitation outcomes. J. Cardiopulm. Rehabil. Prev. 2017, 37, 283–289. [Google Scholar] [CrossRef] [PubMed]
  36. Jaitovich, A.; Barreiro, E. Skeletal muscle dysfunction in chronic obstructive pulmonary disease: What we know and can do for our patients. Am. J. Respir. Crit. Care Med. 2018, 198, 175–186. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Data collection (A) and clustering analysis (B) pipeline.
Figure 2. Silhouette score values as the number of clusters varies between 2 and 15.
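The sweep summarized in Figure 2 can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the study cites scikit-learn [25], whereas the helper names `farthest_first`, `kmeans`, and `silhouette_score` below, the deterministic farthest-point initialization, and the toy 2-D points are all assumptions made here so that the example is self-contained and reproducible. The sweep covers 2 to 5 clusters only because the toy set has eight points.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch, not the paper's k-means++)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette_score(points, labels):
    """Mean silhouette width (Rousseeuw, 1987); singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

On this toy set the highest mean silhouette is obtained with two clusters. With real admission variables, features would normally be standardized first, and a library implementation would be preferable to this sketch.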
Figure 3. Boxplots of the distribution of Delta meters in the two clusters identified by each clustering algorithm.
Figure 4. Mean and standard deviation of k-means cluster centroids. Dashed blue and solid red represent R2I equal to 0 and 1, respectively.
Table 1. Descriptive statistics of the overall sample and of the subgroups with (6MWT CSI = 1) and without (6MWT CSI = 0) a clinically significant improvement in 6MWT distance. p-values refer to the comparison between the two subgroups.
| Variables | Total (N = 126), Median [IQR] or N (%) | 6MWT CSI = 0 (N = 63), Median [IQR] or N (%) | 6MWT CSI = 1 (N = 63), Median [IQR] or N (%) | p-Value |
|---|---|---|---|---|
| Sex, male | 56 (44.4%) | 32 (50.7%) | 24 (38%) | 0.151 |
| Age, yr | 77.00 [10.00] | 75.00 [11.00] | 78.00 [11.00] | 0.373 |
| RRS | 4.50 [1.62] | 4.37 [1.97] | 4.56 [1.56] | 0.238 |
| XRS | −1.83 [1.33] | −1.66 [1.15] | −2.02 [1.58] | 0.230 |
| ΔXRS | 2.13 [3.45] | 1.99 [3.87] | 2.51 [2.74] | 0.281 |
| TI | 1.27 [0.53] | 1.31 [0.48] | 1.24 [0.55] | 0.221 |
| TE | 2.20 [0.85] | 2.31 [1.01] | 2.07 [0.68] | 0.135 |
| TI/TTOT | 0.37 [0.06] | 0.36 [0.07] | 0.38 [0.05] | 0.756 |
| VT | 0.66 [0.33] | 0.73 [0.37] | 0.62 [0.30] | 0.275 |
| VE | 11.37 [4.16] | 11.27 [3.74] | 11.37 [4.67] | 0.560 |
| FEV/SVC | 39.50 [24.25] | 40.00 [26.00] | 39.00 [24.00] | 0.687 |
| FEV1 | 0.91 [0.52] | 0.91 [0.52] | 0.92 [0.53] | 0.811 |
| TLC | 6.10 [2.68] | 6.08 [3.00] | 6.17 [2.54] | 0.603 |
| IC | 1.66 [0.92] | 1.66 [0.85] | 1.63 [0.97] | 0.709 |
| O2 saturation i | 95.00 [4.00] | 96.00 [3.00] | 95 [3.00] | 0.007 |
| Borg Scale i | 0.50 [2.00] | 0.00 [1.00] | 0.50 [2.00] | 0.036 |
| Borg Scale limbs i | 0.00 [1.30] | 0.00 [1.00] | 0.00 [2.00] | 0.491 |
| O2 i | 0.00 [3.00] | 0.00 [3.00] | 0.00 [5.00] | 0.709 |
| O2 saturation f | 92.00 [7.00] | 92.00 [7.00] | 92.00 [7.00] | 0.805 |
| Borg Scale f | 5.00 [2.00] | 5.00 [2.00] | 5.00 [2.00] | 0.877 |
| Borg Scale limbs f | 3.00 [3.00] | 3.00 [3.00] | 3.00 [3.00] | 0.877 |
| Total meters | 287.50 [148.00] | 345.00 [120.00] | 240.00 [112.00] | <0.001 |
Abbreviations: 6MWT, six-minute walk test; Borg Scale f, Borg Dyspnoea Scale at the end of 6MWT; Borg Scale i, Borg Dyspnoea Scale at the beginning of 6MWT; Borg Scale limbs f, Borg Scale for limb fatigue at the end of 6MWT; Borg Scale limbs i, Borg Scale for limb fatigue at the beginning of 6MWT; CSI, clinically significant improvement; FEV1, forced expiratory volume in the first second; FEV/SVC, forced expiratory volume divided by slow vital capacity; IC, inspiratory capacity; IQR, interquartile range; O2 i, oxygen level at the beginning of 6MWT; O2 saturation f, oxygen saturation at the end of 6MWT; O2 saturation i, oxygen saturation at the beginning of 6MWT; RRS, respiratory system resistance; TI, inspiratory time; TI/TTOT, ratio of TI to total time; TLC, total lung capacity; VE, mechanical ventilation; VT, tidal volume; XRS, respiratory system reactance; ΔXRS, change in respiratory system reactance. Statistically significant p-values are in bold.
Table 2. Comparison of the median and IQR of all admission variables between R2I clusters. p-values from the Mann–Whitney test are reported. Statistically significant values are in bold.
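| Variables | R2I = 0, Median | R2I = 0, IQR | R2I = 1, Median | R2I = 1, IQR | p-Value |
|---|---|---|---|---|---|
| RRS | 3.74 | 1.44 | 5.06 | 1.61 | <0.001 |
| XRS | −1.46 | 0.93 | −2.37 | 1.18 | <0.001 |
| ΔXRS | 0.88 | 1.80 | 3.97 | 4.51 | <0.001 |
| TI | 1.43 | 0.45 | 1.12 | 0.46 | <0.001 |
| TE | 2.11 | 1.00 | 2.17 | 0.80 | 0.841 |
| TI/TTOT | 0.40 | 0.07 | 0.36 | 0.07 | <0.001 |
| VT | 0.74 | 0.31 | 0.62 | 0.30 | <0.05 |
| VE | 11.57 | 4.40 | 11.30 | 4.14 | 0.545 |
| FEV/SVC | 54.00 | 22.50 | 36.00 | 11.50 | <0.001 |
| FEV1 | 1.23 | 0.55 | 0.72 | 0.31 | <0.001 |
| TLC | 5.84 | 3.30 | 6.18 | 2.10 | 0.725 |
| IC | 2.00 | 0.85 | 1.44 | 0.55 | <0.001 |
| O2 saturation i | 95.00 | 4.00 | 95.00 | 3.00 | 0.349 |
| Borg Scale i | 0.00 | 0.50 | 1.00 | 2.00 | 0.001 |
| Borg Scale limbs i | 0.00 | 0.00 | 0.00 | 2.00 | 0.004 |
| O2 i | 0.00 | 0.50 | 0.00 | 6.00 | 0.039 |
| O2 saturation f | 93.00 | 6.00 | 90.00 | 7.00 | 0.003 |
| Borg Scale f | 3.00 | 3.00 | 5.00 | 3.00 | <0.001 |
| Borg Scale limbs f | 2.50 | 2.80 | 4.00 | 2.00 | 0.001 |
| Total meters | 350.00 | 128.00 | 230.00 | 123.00 | <0.001 |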
Abbreviations: 6MWT, six-minute walk test; Borg Scale f, Borg Dyspnoea Scale at the end of 6MWT; Borg Scale i, Borg Dyspnoea Scale at the beginning of 6MWT; Borg Scale limbs f, Borg Scale for limb fatigue at the end of 6MWT; Borg Scale limbs i, Borg Scale for limb fatigue at the beginning of 6MWT; CSI, clinically significant improvement; FEV1, forced expiratory volume in the first second; FEV/SVC, forced expiratory volume divided by slow vital capacity; IC, inspiratory capacity; IQR, interquartile range; O2 i, oxygen level at the beginning of 6MWT; O2 saturation f, oxygen saturation at the end of 6MWT; O2 saturation i, oxygen saturation at the beginning of 6MWT; RRS, respiratory system resistance; TI, inspiratory time; TI/TTOT, ratio of TI to total time; TLC, total lung capacity; VE, mechanical ventilation; VT, tidal volume; XRS, respiratory system reactance; ΔXRS, change in respiratory system reactance.
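The between-cluster comparison named in the Table 2 caption can be sketched as follows. This is a minimal standard-library illustration and not the authors' implementation: the function name `mann_whitney_u` is hypothetical, the p-value uses the normal approximation with tie correction and no continuity correction (which is rough for small samples), and the two value lists are invented for illustration rather than taken from the study.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group (0 or 1) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1.
        avg_rank = (i + j) / 2 + 1
        for idx in range(i, j + 1):
            ranks[idx] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p

# Hypothetical values of one admission variable in two clusters (not study data).
cluster_0 = [3.1, 3.7, 4.0, 3.5, 3.9]
cluster_1 = [4.8, 5.2, 5.0, 4.6, 5.5]
u_stat, p_value = mann_whitney_u(cluster_0, cluster_1)
```

For real analyses a library routine with exact small-sample p-values would usually be preferred over this approximation.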
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Marra, E.; Liuzzi, P.; Mannini, A.; Romagnoli, I.; Gigliotti, F. Respiratory Rehabilitation Index (R2I): Unsupervised Clustering Approach to Identify COPD Subgroups Associated with Rehabilitation Outcomes. Diagnostics 2025, 15, 2053. https://doi.org/10.3390/diagnostics15162053
