Subtypes of Patients with Mild to Moderate Airflow Limitation as Predictors of Chronic Obstructive Pulmonary Disease Exacerbation

COPD is a heterogeneous disease, and acute exacerbation is a major prognostic factor. We used cluster analysis to identify subtypes of patients with mild to moderate airflow limitation and to evaluate these subtypes as predictors of COPD exacerbation. In all, 924 patients from the Korea COPD Subgroup Study cohort with a post-bronchodilator forced expiratory volume in 1 s (FEV1) ≥ 50% predicted and documented age, body mass index (BMI), smoking status, smoking pack-years, COPD assessment test (CAT) score, and post-bronchodilator FEV1 (% predicted) were enrolled. Four groups were identified: putative chronic bronchitis (n = 224), emphysema (n = 235), young smokers (n = 248), and near normal (n = 217). The chronic bronchitis group had the highest BMI, and the emphysema group had the oldest age, lowest BMI, and highest smoking pack-years. The young smokers group had the youngest age and the highest proportion of current smokers. The near-normal group had the highest proportion of never-smokers and near-normal lung function. Compared with the near-normal group, the emphysema group had a higher risk of acute exacerbation (OR: 1.93, 95% CI: 1.29–2.88). However, multiple logistic regression showed that chronic bronchitis (OR: 2.887, 95% CI: 1.065–8.192), functional residual capacity % predicted (OR: 1.023, 95% CI: 1.007–1.040), fibrinogen (OR: 1.004, 95% CI: 1.001–1.008), and gastroesophageal reflux disease (OR: 2.646, 95% CI: 1.142–6.181) were independent predictors of exacerbation. The exacerbation-susceptible subtypes require more aggressive prevention strategies.


Introduction
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide, and its prevalence is expected to rise over the coming decades, reaching up to 4.5 million cases by 2030 and increasing both mortality and healthcare costs [1,2].
COPD is traditionally defined by post-bronchodilator expiratory airflow limitation, indicated by a ratio of forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) of < 0.7 [3]. However, COPD is a heterogeneous group of diseases with different causes, mechanisms, and structural abnormalities that arise as lung function declines with age [4,5]. Recent studies have described variable lung function trajectories, including accelerated FEV1 decline caused by early-life disadvantages and exposure to risk factors (especially smoking), and have highlighted low FEV1 in early adulthood as an indicator of early COPD [6][7][8]. However, early COPD is hard to detect in current clinical settings and can remain underdiagnosed under the current definition based on the FEV1/FVC ratio [9].
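To make the spirometric criteria concrete, here is a minimal Python sketch of the fixed-ratio definition combined with the GOLD spirometric grades. The grade cut-offs (80%, 50%, and 30% predicted) come from the GOLD grading system rather than from this text, and the function names are hypothetical. Under these cut-offs, the FEV1 ≥ 50% predicted criterion used in this study corresponds to GOLD grades 1 and 2.

```python
def has_airflow_limitation(fev1_l: float, fvc_l: float) -> bool:
    """Fixed-ratio criterion: post-bronchodilator FEV1/FVC < 0.7."""
    if fvc_l <= 0:
        raise ValueError("FVC must be positive")
    return fev1_l / fvc_l < 0.7


def gold_grade(fev1_pct_predicted: float) -> int:
    """GOLD spirometric grade (1-4) for a patient who meets FEV1/FVC < 0.7."""
    if fev1_pct_predicted >= 80:
        return 1  # mild
    if fev1_pct_predicted >= 50:
        return 2  # moderate
    if fev1_pct_predicted >= 30:
        return 3  # severe
    return 4  # very severe


def is_mild_to_moderate(fev1_l: float, fvc_l: float, fev1_pct_predicted: float) -> bool:
    """Airflow limitation with FEV1 >= 50% predicted (GOLD grade 1 or 2)."""
    return has_airflow_limitation(fev1_l, fvc_l) and gold_grade(fev1_pct_predicted) <= 2


# Hypothetical patient: FEV1 2.0 L, FVC 3.5 L (ratio ~0.57), FEV1 62% predicted.
print(is_mild_to_moderate(2.0, 3.5, 62.0))  # True
```

The ratio test decides whether airflow limitation is present at all; the grade then locates the patient on the severity scale, which is why the two checks are kept separate.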
Based on its natural history, early COPD is more likely to progress, and to be diagnosed as late COPD of mild-moderate or severe grade during middle age or later, than to remain stable. Because patients with early COPD rarely seek medical attention for preclinical symptoms, it is difficult to prevent its progression, for example, through smoking cessation. Colak et al. reported that patients with early COPD had significantly higher hospitalization and mortality rates over a 14-year follow-up, highlighting the importance of managing early COPD [10].
The category of patients with mild-moderate airflow limitation may include patients who have progressed from early to late COPD, those with mild-moderate obstruction, and patients diagnosed with COPD because of accelerated lung function decline. Some patients in this group progress to severe airflow limitation, which carries a poor prognosis. Therefore, a better understanding of the heterogeneity of patients with mild-moderate airflow limitation may have significant clinical value: this condition is easier to diagnose than early COPD, and its diagnosis may provide an opportunity to alter the course of the disease before it develops into severe COPD.
The k-means clustering algorithm partitions data into a prespecified number of groups by assigning each observation to the group with the nearest mean (centroid); we applied it using the k-means function in the R environment. The advantage of k-means clustering is that it groups observations by similarity, giving researchers a qualitative and quantitative understanding of large amounts of N-dimensional data [11]. Previous studies have also used clustering for data-driven patient grouping in medical research [12,13].
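For readers unfamiliar with the algorithm, the following Python sketch implements the Lloyd iteration behind k-means: assign each patient to the nearest centroid, recompute the centroids, and repeat until the assignments stop changing. It is not the authors' R code (R's `kmeans` defaults to the Hartigan-Wong variant). The z-score standardization, the random initialization, and the toy (age, FEV1 % predicted) data are illustrative assumptions; categorical inputs such as smoking status would also need a numeric encoding, which the article does not describe.

```python
import math
import random


def standardize(rows):
    """Z-score each column so variables on different scales (e.g., age in
    years vs. FEV1 % predicted) contribute comparably to Euclidean distance."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    sds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        sds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(n_cols)] for r in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps
    until assignments stop changing. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy data: (age, FEV1 % predicted) for six patients.
    toy = [[45, 90], [47, 88], [46, 92], [72, 55], [75, 52], [73, 58]]
    labels, _ = kmeans(standardize(toy), k=2)
    print(labels)  # two groups of three; label numbering depends on the seed
```

Standardizing first matters because k-means uses Euclidean distance: without it, whichever variable has the largest numeric range would dominate the grouping.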
In this study, we used k-means clustering to subgroup patients with mild to moderate COPD, distinguished four phenotypic groups ("chronic bronchitis", "emphysema", "young smokers", and "near normal"), and evaluated their prognosis. The data were drawn from the Korea COPD Subgroup Study (KOCOSS) cohort.

Data Collection
The KOCOSS cohort is a multicenter observational cohort study that enrolled COPD patients from 45 Korean tertiary and university hospitals from December 2011 to October 2014. The inclusion criteria were a COPD diagnosis by a pulmonologist, age ≥ 40 years, COPD symptoms (e.g., cough, sputum, and dyspnea), and a post-bronchodilator FEV1/FVC ratio of < 0.7 [3,14]. The medical history taken at the first hospital visit included the frequency and severity of exacerbations in the previous 12 months, smoking status, treatment (including previously prescribed COPD medications), and comorbidities. All data were recorded on case-report forms (CRFs) completed by physicians or trained nurses, and patients were scheduled for evaluation at 6-month intervals after the initial examination.
The main exclusion criteria were asthma; bronchiectasis or tuberculosis-related lung damage presenting as obstructive lung disease; inability to complete the pulmonary function test; myocardial infarction or a cerebrovascular event in the previous three months; pregnancy; rheumatoid disease; malignancy; inflammatory bowel disease; and steroid use for conditions other than COPD exacerbation within eight weeks before enrollment [14].
Pulmonary function, disease severity, and exercise capacity were assessed using spirometry and the six-minute walk test (6MWT), following standard procedures [15,16].
Smoking status was classified as never (fewer than 100 cigarettes in one's lifetime and not currently smoking), former (at least 100 cigarettes in one's lifetime but none in the previous month), or current (at least 100 cigarettes in one's lifetime and smoking within the previous month) [17].
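The three smoking categories form a simple decision rule, sketched below in Python. The function name and inputs are hypothetical, and treating "currently smoking" as "smoked within the previous month" is our assumption. A person with fewer than 100 lifetime cigarettes who smokes now fits none of the three definitions as written, so the sketch flags that case instead of guessing.

```python
def smoking_status(lifetime_cigarettes: int, smoked_in_past_month: bool) -> str:
    """Classify smoking status with the 100-cigarette lifetime threshold and
    the one-month recency window. Assumes "currently smoking" means
    "smoked within the previous month"."""
    if lifetime_cigarettes < 100:
        # Never-smokers must also not smoke currently; people under the
        # threshold who smoke now fall outside all three definitions.
        return "never" if not smoked_in_past_month else "unclassified"
    if smoked_in_past_month:
        return "current"
    return "former"


# Hypothetical examples
print(smoking_status(0, False))     # never
print(smoking_status(7300, True))   # current
print(smoking_status(7300, False))  # former
```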
COPD exacerbations were defined as acute respiratory symptoms requiring antibiotics or systemic steroids (moderate exacerbations) or severe respiratory events requiring hospitalization (severe exacerbations) [18].
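Read as a decision rule, the exacerbation definition can be sketched as follows. Labeling antibiotic- or steroid-treated events without hospitalization as "moderate" and hospitalized events as "severe" follows the usage elsewhere in this article; the event record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RespiratoryEvent:
    """Hypothetical record of one acute respiratory event."""
    antibiotics: bool
    systemic_steroids: bool
    hospitalized: bool


def exacerbation_severity(event: RespiratoryEvent) -> Optional[str]:
    """Return "severe", "moderate", or None if the event does not qualify."""
    if event.hospitalized:
        return "severe"
    if event.antibiotics or event.systemic_steroids:
        return "moderate"
    return None


def count_exacerbations(events):
    """Tally qualifying events by severity for one patient."""
    counts = {"moderate": 0, "severe": 0}
    for event in events:
        severity = exacerbation_severity(event)
        if severity is not None:
            counts[severity] += 1
    return counts
```

Checking hospitalization first ensures that a hospitalized event treated with steroids is counted once, as severe, rather than in both categories.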

Study Population
Of the 2181 subjects in the KOCOSS cohort, 93 with missing baseline pulmonary function test data, 588 with insufficient follow-up data, and 532 with a post-bronchodilator FEV1 < 50% predicted were excluded from the analyses.
Age, body mass index (BMI), smoking status, smoking amount (pack-years), post-bronchodilator FEV1 (% predicted), and COPD assessment test (CAT) score were selected as input variables for cluster analysis (Figure 1). Forty-four subjects missing at least one of these six variables were excluded. The cluster analysis of the remaining 924 patients identified four groups: Cluster 1 (n = 224), Cluster 2 (n = 235), Cluster 3 (n = 248), and Cluster 4 (n = 217), which were named the chronic bronchitis, emphysema, young smokers, and near-normal groups, respectively (Figure 2).
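The participant flow can be checked with simple arithmetic. The sketch below assumes, as the text implies, that each exclusion step removed a distinct set of subjects. The running totals are 2088, 1500, 968, and 924, and the final count matches the sum of the four cluster sizes.

```python
# Counts taken from the Study Population paragraph.
ENROLLED = 2181
EXCLUSIONS = [
    ("missing baseline pulmonary function test data", 93),
    ("insufficient follow-up data", 588),
    ("post-bronchodilator FEV1 < 50% predicted", 532),
    ("missing at least one of the six clustering variables", 44),
]
CLUSTER_SIZES = {
    "chronic bronchitis": 224,
    "emphysema": 235,
    "young smokers": 248,
    "near normal": 217,
}


def participant_flow(enrolled, exclusions):
    """Return the running total left after each sequential exclusion step
    (assumes the excluded groups do not overlap)."""
    totals = []
    remaining = enrolled
    for _reason, n in exclusions:
        remaining -= n
        totals.append(remaining)
    return totals


totals = participant_flow(ENROLLED, EXCLUSIONS)
print(totals)  # [2088, 1500, 968, 924]
assert totals[-1] == sum(CLUSTER_SIZES.values())
```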

Statistical Analysis
A k-means clustering analysis was conducted on the subset of KOCOSS subjects with complete data on the six variables. The optimal number of clusters was determined using normalized mutual information and the "NbClust" R package [19]. After clustering, clinical characteristics were compared across the groups. Continuous variables are presented as mean ± standard deviation, and categorical variables as proportions (%). Continuous variables were compared using one-way ANOVA or the Kruskal-Wallis test, and categorical variables using Pearson's chi-square test or Fisher's exact test. Risk factors for acute exacerbation, including COPD subtypes, were analyzed using multiple logistic regression and reported as odds ratios (ORs) with 95% confidence intervals (CIs). Variables with p < 0.1 in univariate analysis were entered into the multivariable analysis. To address multicollinearity among correlated independent variables, we excluded variables with a variance inflation factor (VIF) > 10. A two-tailed p < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS version 24.0 (SPSS Inc., Chicago, IL, USA).
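The article does not state which partitions were compared with normalized mutual information (NMI); one common use is to check how consistently different runs or different values of k partition the same patients. The Python sketch below computes NMI with arithmetic-mean normalization, 2·I(A;B)/(H(A) + H(B)), which is one of several normalizations in use. It is an illustration, not the authors' procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) = sum over label pairs of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        mi += (n_xy / n) * math.log(n * n_xy / (count_a[x] * count_b[y]))
    return mi


def normalized_mutual_information(a, b):
    """NMI with arithmetic-mean normalization. Returns 1.0 for identical
    partitions (up to relabeling) and 0.0 for independent ones."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both labelings put everything in a single cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Same partition with swapped label names -> NMI of 1
print(normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because NMI depends only on which patients are grouped together, not on the cluster numbers themselves, it is unaffected by the arbitrary label numbering that k-means produces.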
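The VIF screening step can also be sketched. For predictor j, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing that predictor on all the others; equivalently, VIF_j is the j-th diagonal element of the inverse correlation matrix, which the Python sketch below uses. The article does not say whether high-VIF variables were removed all at once or one at a time; the sketch assumes the common iterative approach of dropping the single worst predictor and recomputing, and all names and data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def drop_high_vif(names, columns, threshold=10.0):
    """Iteratively drop the predictor with the largest VIF until all VIFs
    are <= threshold. Returns the names of the retained predictors."""
    names, columns = list(names), list(columns)
    while len(columns) > 1:
        vifs = variance_inflation_factors(columns)
        worst = max(range(len(vifs)), key=vifs.__getitem__)
        if vifs[worst] <= threshold:
            break
        del names[worst], columns[worst]
    return names
```

Dropping one variable at a time avoids discarding both members of a correlated pair, since removing either one usually brings the other's VIF back below the threshold.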

Baseline Characteristics
Clustering analysis was used to group the 924 subjects included in this study into the putative chronic bronchitis (n = 224), emphysema (n = 235), young smokers (n = 248), and near-normal (n = 217) groups.
Table 1 shows the baseline characteristics of the subjects by cluster group. The proportion of males was high in all groups. Of the four groups, the chronic bronchitis group had the highest mean BMI, whereas the emphysema group had the highest mean age, lowest BMI, and highest smoking pack-years (all p < 0.001). The young smokers group had the lowest mean age and the highest number of current smokers. The near-normal group had the highest proportion of never-smokers and the lowest number of current smokers (all p < 0.001). Pulmonary function analysis revealed that the near-normal group had the best lung function, with a mean FEV1 above 80% predicted. The chronic bronchitis and emphysema groups had the lowest post-bronchodilator FEV1 (% predicted), and the chronic bronchitis group had the lowest post-bronchodilator FVC (% predicted) (p < 0.001), whereas the emphysema group had the lowest diffusing capacity of the lung for carbon monoxide (DLCO) (p < 0.001).
The median CAT score and the St. George's Respiratory Questionnaire (SGRQ) score were highest in the emphysema group (all p < 0.001). In the 6MWT, the emphysema group had the highest dyspnea score (p < 0.001) and the shortest walking distance (p < 0.001).
Although not all subjects had chest computed tomography (CT) results, emphysema was most frequent in the emphysema group (p = 0.017), whereas bronchiectasis was most common in the chronic bronchitis group (p = 0.002, Supplemental Table S1).

Comorbidities and Medical History
Among the four groups, the prevalence of diabetes mellitus, hypertension, and asthma was highest in the chronic bronchitis group (all p < 0.001, Table 2). Regardless of cluster, the most commonly prescribed medications were long-acting muscarinic antagonists, followed by a combination of inhaled corticosteroids (ICSs) and long-acting beta-2 agonists (LABAs). Methylxanthine use was highest in the emphysema group (p < 0.001, Table 3).

Occurrence of Acute Exacerbation and Mortality
The number of exacerbations during the year before the first visit was significantly higher in the emphysema group (p < 0.001). For severe exacerbations, there was no significant difference between the groups. Mortality was highest in the emphysema group (p < 0.05, Table 4). Among the four groups, the emphysema group was associated with a greater risk of acute exacerbation than the near-normal group (OR, 1.93; 95% CI, 1.29–2.88; p < 0.001) (Figure 3). A multiple logistic regression analysis was performed including variables not used for clustering that had p < 0.1 in univariate analysis (Supplementary Table S2). The chronic bronchitis group (OR, 2.887; 95% CI, 1.065–8.192; p = 0.040), functional residual capacity (FRC) % predicted (OR, 1.023; 95% CI, 1.007–1.040; p = 0.007), fibrinogen (OR, 1.004; 95% CI, 1.001–1.008; p = 0.015), and gastroesophageal reflux disease (GERD) (OR, 2.646; 95% CI, 1.142–6.181; p = 0.023) were independent predictors of exacerbation in patients with mild to moderate airflow limitation (Table 5).
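Because each OR is reported with its 95% CI, a quick consistency check is possible. If a CI is a symmetric Wald interval on the log-odds scale, exp(β ± 1.96·SE), the geometric mean of its bounds should approximately recover the OR. The Python sketch below back-calculates β and SE from the reported bounds. The Wald assumption is ours (the article does not state how its CIs were computed), so small gaps between the reported and recovered ORs may simply reflect rounding or a different interval method.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_ci(beta, se, z=Z_95):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_log_scale(lower, upper, z=Z_95):
    """Back-calculate the coefficient and standard error implied by a
    reported CI, assuming a symmetric Wald interval on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return beta, se


# (reported OR, lower bound, upper bound) as given in the Results
reported = {
    "emphysema vs near normal": (1.93, 1.29, 2.88),
    "chronic bronchitis": (2.887, 1.065, 8.192),
    "FRC % predicted": (1.023, 1.007, 1.040),
    "fibrinogen": (1.004, 1.001, 1.008),
    "GERD": (2.646, 1.142, 6.181),
}

for name, (odds_ratio, lo, hi) in reported.items():
    beta, se = implied_log_scale(lo, hi)
    print(f"{name}: reported OR {odds_ratio}, "
          f"CI-implied OR {math.exp(beta):.3f}, SE(log OR) {se:.3f}")
```

Working on the log scale is what makes the interval symmetric; on the OR scale the same interval is skewed, which is why the upper bounds above sit much farther from the OR than the lower bounds.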

Discussion
This study subtyped patients with mild to moderate airflow limitation through cluster analysis using variables closely related to the prognosis of COPD. The emphysema group was characterized by the oldest age, the lowest BMI, the heaviest smoking history (over 50 pack-years), the second-lowest FEV1 % predicted, and the highest CAT score. In this group, emphysema was observed in almost half of the cases for which a chest CT was available. Predictably, the emphysema group had more moderate exacerbations and higher mortality than the other groups. In particular, compared with the near-normal group, which had the highest proportion of never-smokers and the best lung function, only the emphysema group had a significantly higher risk of exacerbation. However, emphysema-group membership was not an independent predictor of exacerbation in the multiple logistic regression analysis.
In the multiple logistic regression model, the chronic bronchitis group appeared to be independently susceptible to exacerbation. This group had the lowest FEV1, the highest BMI, and frequent bronchiectasis on chest CT, features consistent with chronic bronchitis, which is commonly described as a clinical phenotype of COPD. In addition, the chronic bronchitis group had a high prevalence of diabetes mellitus, hypertension, and asthma, conditions well known to be associated with a high BMI. These comorbidities likely promoted inflammation in this group, making these patients more susceptible to acute exacerbations. In previous studies by Kim et al., patients with chronic bronchitis were younger, had lower FEV1, were more often male and current smokers, and had a higher exacerbation rate than those without chronic bronchitis [20,21]. Our findings similarly suggest that chronic bronchitis features predict COPD exacerbations even in patients with mild to moderate airflow limitation, although the emphysema group, unlike the chronic bronchitis group, was the oldest, had the lowest BMI, and had the lowest quality of life. Furthermore, FRC [22], fibrinogen [23,24], and GERD [25][26][27], previously reported predictors of COPD exacerbation, were also independent predictors in our study, supporting the validity of our cluster analysis results.
Acute exacerbation is clinically important because it is not merely an acute worsening of respiratory symptoms; it also leads to more frequent non-respiratory adverse events [28], recurrent exacerbations requiring hospitalization, higher healthcare costs [29], and ultimately COPD-related mortality [30]. Previous studies have reported the clinical importance of acute exacerbation in mild COPD [25]. Dransfield et al. reported that exacerbations were associated with greater lung function decline in subjects with COPD, especially those with milder disease (GOLD stage 1 or 2), than in those with GOLD stage 3 or 4. The additional FEV1 decline associated with exacerbations was 23 mL/yr in GOLD 1, 10 mL/yr in GOLD 2, and 8 mL/yr in GOLD 3, with no additional decline in GOLD 4. This implies that mild COPD can progress to more advanced lung disease after acute exacerbations [31].
Based on this study, we hypothesize that a specific COPD cluster may be susceptible to acute exacerbation, leading to pulmonary function decline, reduced quality of life, and increased mortality. These results may be useful for the early detection of high-risk subtypes susceptible to exacerbation and for targeting drug therapy to reduce the risk of acute exacerbation and mortality. For example, if medical resources are limited and only a subset of COPD patients with mild to moderate airflow limitation can be prioritized, priority should go to patients with chronic bronchitis features and a low FEV1, or with emphysematous features, a low FEV1, and a poor quality of life. For these patients, pharmacologic interventions such as mucus clearance or inhaler therapy, together with smoking cessation education, should be pursued more actively.
A strength of this study is that we identified which clinical phenotypes are prone to exacerbation in patients with mild to moderate airflow limitation. Previous literature has suggested COPD types that are susceptible to exacerbation independent of disease severity [25], but our study divided the groups more systematically through a clustering technique based on variables that can be easily obtained clinically or by interview. A second strength is the use of a large, representative COPD cohort. The KOCOSS cohort includes a relatively large number of patients with mild to moderate airflow limitation (mean post-bronchodilator FEV1 55.8% predicted) [14] compared with other large cohorts such as the Genetic Epidemiology of COPD cohort (COPDGene, mean FEV1 48.3% predicted) [32] or the ECLIPSE cohort (mean FEV1 48.9% predicted) [5]. In addition, numerous factors affecting acute exacerbation were comprehensively analyzed.
Our study has some limitations. First, because this study was conducted in Korea with a relatively small sample of 924 participants, the findings may be difficult to generalize to other ethnic groups and countries. Moreover, because this study focused on patients with mild to moderate airflow limitation, the findings may not generalize across GOLD stages. Second, our evidence is based on a cross-sectional analysis of data collected at enrollment in KOCOSS, so longitudinal data are limited. Third, methylxanthine is not indicated for COPD; however, because it is commonly used in Korea, data on its use were included in the analysis, which may limit generalizability. Future well-designed studies should include larger and more diverse samples to improve the external validity of the findings.

Conclusions
COPD is heterogeneous even among patients with mild to moderate airflow obstruction, and the "chronic bronchitis" phenotype showed a high risk of exacerbation. For subtypes at high risk of COPD exacerbation, active intervention is needed to prevent exacerbation.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/jcm12206643/s1, Table S1: Chest computed tomography (CT) findings; Table S2: Univariate logistic regression analysis of risk factors associated with exacerbation of COPD.
Institutional Review Board Statement: This study was approved by the institutional review board of Ewha Woman's University Hospital (approval number: ECT 2012-04-006-001).
Informed Consent Statement: All study participants provided written informed consent before recruitment into the cohort.

Figure 2. Overview of study design.

Figure 3. Odds ratios of moderate acute exacerbation among cluster groups. ** p < 0.001.

Table 1. Comparison of baseline characteristics according to cluster analysis groups.

Table 2. Comparison of comorbidities according to cluster analysis groups.
Data are presented as number (%) or mean ± SD. GERD, gastroesophageal reflux disease.

Table 3. Comparison of COPD drug prescription status according to cluster analysis groups.

Table 4. Comparison of exacerbation and mortality according to cluster analysis groups.
Data are presented as number (%) or mean ± SD.

Table 5. Multiple logistic regression analysis of risk factors for exacerbation of COPD in patients with mild to moderate airflow obstruction.