Health-Screening-Based Chronic Obstructive Pulmonary Disease and Its Effect on Cardiovascular Disease Risk

Chronic obstructive pulmonary disease (COPD) is a major cause of death worldwide, and various studies have been conducted on its early diagnosis. We developed and validated a scoring system for predicting COPD. Participants who underwent a health screening between 2017 and 2020 were extracted from the Korea National Health and Nutrition Examination Survey (KNHANES) database. Individuals with COPD were defined as those aged 40 years or older with a prebronchodilator ratio of forced expiratory volume in 1 s to forced vital capacity (FEV1/FVC) below 0.7. A logistic regression model was fitted, and the C-index was used for variable selection. Receiver operating characteristic (ROC) curves with area under the curve (AUC) values were generated for evaluation. Age, sex, waist circumference and diastolic blood pressure were used to predict COPD and to develop a COPD score based on a multivariable model. A simplified model for COPD was validated with an AUC value of 0.780 from the ROC curves. In addition, we evaluated the association of the derived score with cardiovascular disease (CVD). The COPD score performed well in predicting COPD and was also informative for stratifying CVD risk. Future studies comparing the diagnostic accuracy of the derived score with standard diagnostic tests are needed.


Introduction
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death in the world, and its global burden is expected to increase further [1]. Several studies suggest that comorbid conditions such as cardiovascular disease (CVD) and poor health-related quality of life contribute to the worsening of respiratory symptoms in COPD patients [2]. Extrapulmonary effects, such as skeletal muscle dysfunction and osteoporosis, are also reported to be frequent in COPD patients with comorbid diseases such as chronic infections and CVD [3]. Preventing COPD exacerbation is important because exacerbation accelerates the progression of other closely related diseases; its prevention can improve health-related quality of life, prolong life and reduce health care costs [4,5,6].
As noted in the 2006 update of the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines, COPD is defined as a preventable and treatable disease [7]. However, it is difficult to determine whether an individual has COPD because of low hospital visit rates, lack of knowledge about the symptoms and the disease, and the complexity of the COPD diagnosis method [8]. According to a recent nationwide survey conducted in Taiwan, the incidence of COPD is estimated to be about 6 percent, but fewer than half of those who may have COPD have undergone spirometry, a noninvasive test used to diagnose COPD [9]. Regarding COPD diagnostic and definition issues, GOLD recommended that COPD management and treatment should be based not only on spirometric findings but also on scores, such as the Chronic Respiratory Questionnaire [10,11]. Another well-known scoring system is the COPD Assessment Test (CAT), a scoring method using eight preliminary symptoms associated with COPD [12]. Although it is a sophisticated multidimensional scoring system, it requires that subjects be aware of the preliminary symptoms of COPD used in scoring, which are difficult to assess without assistance. To address these issues, several previous prediction models for COPD were based on a combination of information from medical history, clinical characteristics and laboratory biomarkers [13]. These models also have limited clinical applicability, because the prediction method is unclear or not validated, or because the variables used for prediction are too diverse.
We conducted this study to develop a simple COPD score that is directly applicable to members of the general population who undergo health screenings, and we validated it in an independent cohort. In addition, the association of the developed score with CVD, including coronary heart disease (CHD) and stroke, was evaluated using the Korea National Health and Nutrition Examination Survey (KNHANES) database.

Study Population
KNHANES, conducted by the Korea Disease Control and Prevention Agency (KCDC), is a nationally representative cross-sectional surveillance system that provides data for the evaluation of nutritional status, health policy effectiveness and trends in the prevalence of health risk factors and major chronic diseases [14]. The sampling population of each survey year consists of approximately 10,000 people. The database consists of three components: a health interview, a health examination and a nutrition survey. The health interview is conducted during an interviewer's home visit. The health examinations and questionnaires cover socioeconomic status, health behavior, quality of life, medical use, biochemical profile using fasting serum and urine, dental health, visual acuity, hearing, bone density measurements and X-ray test results. The nutrition survey collects detailed information about food intake and eating habits. This study analyzed participants who underwent a health screening between 2017 and 2020 using the KNHANES database. Figure 1 depicts the flow diagram for the inclusion of the study population after exclusion of participants aged below 20 and those with missing information. We selected 7037 individuals from 2017-2018 for the training set and 3674 individuals from 2019-2020 for the validation set. This study was conducted in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines (Supplementary Table S1) [15]. The Institutional Review Board of CHA Bundang Hospital approved this study (No. 2022 04 041). Informed consent was waived because the database was provided for research purposes in an anonymized form under strict confidentiality guidelines.

Definition of Variables
Individuals with COPD were defined as those aged 40 years or older with a prebronchodilator ratio of forced expiratory volume in 1 s to forced vital capacity (FEV1/FVC) below 0.7 [16]. Drinkers were defined as participants who drank at least once a week. Physical activity was defined as at least 150 min of moderate-intensity aerobic physical activity or 75 min of vigorous-intensity aerobic physical activity per week according to the modified Global Physical Activity Questionnaire [17]. CVD was defined as doctor-diagnosed CHD or stroke.
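The variable definitions above translate directly into simple rules. The following is a minimal sketch, not the authors' code: the function and parameter names are hypothetical, and the age cut-off is passed in explicitly rather than assumed.

```python
def has_copd(age_years, fev1_l, fvc_l, min_age, ratio_cutoff=0.7):
    """Return True when the spirometric COPD definition is met.

    fev1_l and fvc_l are prebronchodilator volumes in litres
    (hypothetical argument names); min_age is the age cut-off.
    """
    if fvc_l <= 0:
        raise ValueError("FVC must be positive")
    return age_years >= min_age and (fev1_l / fvc_l) < ratio_cutoff


def meets_activity_definition(moderate_min_per_week, vigorous_min_per_week):
    """At least 150 min moderate OR 75 min vigorous aerobic activity per week."""
    return moderate_min_per_week >= 150 or vigorous_min_per_week >= 75


def has_cvd(doctor_diagnosed_chd, doctor_diagnosed_stroke):
    """CVD is doctor-diagnosed CHD or stroke."""
    return bool(doctor_diagnosed_chd or doctor_diagnosed_stroke)
```

For instance, a participant with an FEV1 of 2.0 L and an FVC of 3.2 L has a ratio of 0.625 and would meet the spirometric criterion, provided the age condition also holds.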

Statistical Analysis
All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). Continuous variables and categorical variables were presented as means (standard deviation (SD)) and numbers (%), respectively. The t-test was used for continuous variables, and the chi-squared test was used for categorical variables. Univariable and multivariable analyses were performed using the logistic regression model, which yielded odds ratios (ORs), 95% confidence intervals (CIs) and the concordance index (C-index). We used the purposeful selection approach described by Zhang et al. [18]. Variable selection from the univariable analyses for the multivariable regression model was based on a significance level of p < 0.001. If more than one variable with a significance level of p < 0.001 was considered related, only the variable with the higher C-index was included in the derivation after using the partial likelihood ratio test. For example, multivariable model 1 with waist circumference (WC) and body mass index (BMI) and multivariable model 2 with only WC did not differ significantly in their fit to the data, so we chose model 2 on the principle of parsimony. After the univariable analyses, variables that required a hospital visit to be obtained were excluded for simplification and generalization. The following variables were selected for multivariable analyses: age (continuous; years), sex (categorical; men and women), WC (continuous; cm) and diastolic blood pressure (DBP; continuous; mmHg). The derived COPD score was validated in independent participants who underwent a health screening between 2019 and 2020. Receiver operating characteristic (ROC) curves with area under the curve (AUC) values were generated using R version 4.1 (R Foundation for Statistical Computing, Vienna, Austria) to evaluate sensitivity (Sens), specificity (Spec), positive predictive value (PV+) and negative predictive value (PV−).
In addition, the validation cohort was stratified according to the quartiles of the derived COPD score to determine the score-dependent ORs. Moreover, unadjusted ORs were calculated using logistic regression to confirm whether the derived score was informative in the stratification of individuals at higher risk of CVD, CHD and stroke.
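To make the C-index used for variable selection concrete, the sketch below computes it by direct pair counting. For a binary outcome the C-index equals the AUC of the ROC curve. This is an illustrative re-implementation, not the SAS or R procedure used in the study, and the helper `pick_higher_c_index` is a hypothetical name for the step of keeping, among related variables, the one with the higher C-index.

```python
def c_index(scores, labels):
    """Concordance index for a binary outcome (equal to the ROC AUC).

    Fraction of (case, non-case) pairs in which the case has the
    higher score; tied pairs count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def pick_higher_c_index(candidates, labels):
    """Among related variables, return the name with the higher C-index.

    candidates maps a variable name to its list of values. Direction is
    ignored by taking max(c, 1 - c), an assumption made here so that
    inversely associated variables are compared fairly.
    """
    def strength(values):
        c = c_index(values, labels)
        return max(c, 1.0 - c)

    return max(candidates, key=lambda name: strength(candidates[name]))
```

Pair counting is quadratic in the sample size, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large cohorts.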
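The quartile stratification and unadjusted ORs described above can be sketched as follows. With a single categorical predictor, the OR from logistic regression for one group versus a reference group equals the cross-product ratio of the corresponding 2x2 table, and the Wald 95% CI can be computed from the cell counts. The function names are hypothetical, and the quartile cut points use the default method of Python's `statistics.quantiles`, which may differ slightly from the SAS or R defaults.

```python
import math
import statistics


def quartile_groups(scores):
    """Assign each score to quartile group 1 (lowest) to 4 (highest)."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)
    groups = []
    for s in scores:
        if s <= q1:
            groups.append(1)
        elif s <= q2:
            groups.append(2)
        elif s <= q3:
            groups.append(3)
        else:
            groups.append(4)
    return groups


def odds_ratio(group_cases, group_noncases, ref_cases, ref_noncases, z=1.96):
    """Unadjusted OR versus a reference group, with a Wald confidence interval.

    All four cell counts must be positive for the interval to exist.
    """
    cells = (group_cases, group_noncases, ref_cases, ref_noncases)
    if min(cells) <= 0:
        raise ValueError("all cell counts must be positive")
    estimate = (group_cases * ref_noncases) / (group_noncases * ref_cases)
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper
```

In use, the lowest quartile would typically serve as the reference group, and `odds_ratio` would be called once for each of the three higher quartiles.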

Results
The numbers of participants with and without COPD at baseline were 949 and 6088, respectively (Table 1). Weight, triglyceride (TG), aspartate aminotransferase (AST), urinary glucose, urinary pH, smoking status and physical activity were not significantly different between the COPD and non-COPD groups. Compared with non-COPD individuals, those with COPD were older, were more often men and had higher systolic blood pressure (SBP), fasting serum glucose (FSG), blood urea nitrogen (BUN) and creatinine levels, but lower proportions of smoking and physical activity. The descriptive characteristics of men and women are shown in Supplementary Table S2.
Nineteen variables covering demographic characteristics, measurements, habits, blood pressure and tests for diabetes mellitus, dyslipidemia, liver function, kidney function and urine were evaluated in the univariable analyses (Table 2). The results indicate that these factors were generally significantly associated with COPD, except for weight (p = 0.508), TG (p = 0.911) and AST (p = 0.176). For covariate selection, the tests for diabetes mellitus, dyslipidemia, liver function, kidney function and urine were excluded in the simplification process, since these variables require a hospital visit. Subsequently, age, sex, BMI, WC, smoking, SBP and DBP remained potential candidates for the development of the COPD score. Between BMI and WC, WC had a higher C-index. As for SBP and DBP, DBP had a higher C-index. Finally, age, sex, WC and DBP were selected as components of the COPD score. In addition, the univariable analyses of variables among men and women are shown in Supplementary Table S3 and Supplementary Table S4, respectively.
According to the intercept and estimate values, the COPD score can be calculated using separate equations for men and for women. We then validated the derived score in the independent validation cohort, which revealed satisfactory accuracy (Figure 2).
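A score built from the intercept and estimates of a logistic regression model has the standard form of a linear predictor passed through the logistic function. The sketch below shows that general form only. It is an assumption about the structure of the score, the function name is hypothetical, and no coefficient values are supplied because the fitted estimates are not reproduced in this text.

```python
import math


def copd_probability(intercept, coefficients, values):
    """Predicted probability from a logistic model.

    coefficients and values are dicts keyed by predictor name
    (for example "age", "wc", "dbp"); the caller supplies the
    fitted intercept and estimates for the relevant sex.
    """
    missing = set(coefficients) - set(values)
    if missing:
        raise KeyError(f"missing predictor values: {sorted(missing)}")
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    # Logistic transform maps the linear predictor to the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

Because the men's and women's equations are separate, a caller would keep one intercept and one coefficient dictionary per sex and choose between them before calling the function.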
In Figure 2A, the AUC for the non-CVD validation cohort with no smoking covariate was 0.784, obtaining a Sens of 67.7%, a Spec of 76.5%, a PV+ of 6.2% and a PV− of 69.1%. In Figure 2B, the AUC for the non-CVD validation cohort with smoking covariate was 0.798, obtaining a Sens of 64.3%, a Spec of 81.9%, a PV+ of 6.4% and a PV− of 64.5%. In Figure 2C, the AUC for the validation cohort including CVD with no smoking covariate was 0.782, obtaining a Sens of 67.6%, a Spec of 76.2%, a PV+ of 6.3% and a PV− of 68.9%. In Figure 2D, the AUC for the validation cohort including CVD with smoking covariate was 0.795, obtaining a Sens of 70.2%, a Spec of 75.9%, a PV+ of 5.9% and a PV− of 68.4%.
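The four accuracy measures reported above all derive from a 2x2 table of predicted versus observed status at a chosen score threshold. The following minimal sketch shows the arithmetic. The function name and the rule that a score at or above the threshold counts as a positive prediction are assumptions, since the cut-off used in the study is not stated here.

```python
def diagnostic_metrics(scores, labels, threshold):
    """Sens, Spec, PV+ and PV- when score >= threshold predicts COPD.

    labels use 1 for COPD and 0 for non-COPD. A metric whose
    denominator is zero is returned as None.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sens": ratio(tp, tp + fn),       # true positives among all cases
        "spec": ratio(tn, tn + fp),       # true negatives among all non-cases
        "pv_pos": ratio(tp, tp + fp),     # cases among positive predictions
        "pv_neg": ratio(tn, tn + fn),     # non-cases among negative predictions
    }
```

Sens and Spec depend only on the threshold and the score distribution within each group, whereas PV+ and PV− also depend on how common COPD is in the cohort being evaluated.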

Discussion
We conducted this study to develop a self-diagnosis tool for COPD by simplifying the multivariable logistic model so that it excludes variables requiring hospital visits. The KNHANES COPD score consists of age, sex, WC and DBP, all of which are self-evaluable. The prediction score was developed with a C-index of 0.802 and validated with an AUC of 0.784 for the validation cohort including CVD. In addition, we developed sex-stratified multivariable logistic models because COPD incidence and smoking rates differ between men and women. The univariable analyses confirmed that smoking was significantly associated with COPD in men, so models with and without smoking are presented. The developed COPD score increased in step with the OR for COPD. Moreover, the COPD score yielded significant estimates of CVD risk. The developed COPD prediction score was thus informative for CVD, and COPD prediction was possible using only the results of a general examination report. This suggests that the score has the potential to reflect overall health, including CVD and CHD.
COPD is a disease characterized by airflow restriction due to airway resistance induced by the destruction of alveolar attachments as a result of the destruction of the pulmonary system and emphysema [19]. These pathological changes are caused by chronic inflammation of the periphery of the lungs; obstruction of the small airways occurs first, before emphysema develops, and gradually increases as the disease progresses [20,21]. COPD severity is classified into five stages based on FEV1-specific cut-points according to the GOLD guidelines, and as the stage increases, the inflammatory response is amplified through the innate immune inflammatory response, airway remodeling and the adaptive immune response [10,20]. When harmful inhalants, such as cigarette smoke, enter the airways, the first reaction is innate immunity: recognition by TLR4 or TLR2 and activation of NF-κB damage lung epithelial cells, which then produce and secrete many inflammatory mediators [22]. These mediators activate alveolar macrophages and neutrophils, which release proteases that cause lung damage along with reactive oxygen species [23]. In addition, damage to epithelial cells, vascular endothelial cells and extracellular matrix, which die by necrosis or programmed cell death as a result of lung damage, generates many autoantigens, which the adaptive immune system recognizes as foreign antigens, triggering an immune response [24]. Chronic immune-inflammatory reactions in these repeatedly damaged lung tissues lead to tissue repair and airway remodeling, which ultimately lead to airflow limitation and changes such as pulmonary fibrosis [25]. Airflow restriction manifests as decreased FEV1/FVC and increased airway resistance and lung compliance [26].
COPD can increase the risk of other comorbidities. More severe airway obstruction has been reported to show a stronger correlation with comorbidity [2], and inflammation in the lungs is thought to spill over into systemic inflammation [27]. In particular, a typical consequence of airway obstruction is impaired gas exchange, such as ventilation-perfusion mismatch, which causes hypoxemia, and in patients with emphysema, increased pulmonary vascular resistance and decreased alveolar area are also commonly observed [28]. Increased pulmonary vascular resistance in patients with COPD has been reported to play an important role in the development of pulmonary hypertension, one of the CVD risk factors [29], which increases the production and secretion of endothelin, a vasoconstrictor [30], and causes the smoothness of the channel [31]. Similarly, COPD is known to share risk factors with coronary artery disease, including older age, smoking and lack of exercise [32]. Specifically, COPD is known to contribute to arteriovenous wall hardening through decreased nitric oxide production by vascular endothelial cells, which is considered a risk factor for systemic hypertension and CVD [33].
The strength of this study is that it is the first to develop and verify a COPD scoring system using only basic variables that do not require hospital visits. Previous studies have taken various approaches, using multivariable models or machine learning models to predict exacerbation outcomes. Of the 25 studies using a regression model or Cox regression model, most had lower overall performance than the model in this study, and the higher-performing models used predictors such as IgG titer and GOLD stage, which are difficult for individuals to assess on their own [13,34]. Clinical-data-based machine learning prediction models showed superior performance, with AUCs above 0.80, but they used 100-300 clinical features for prediction [35]. A machine learning model using data containing self-reports similar to those in this study also showed good performance, but it used the CAT score, which is not immediately available to individuals [36]. Unlike previous studies, including systematic reviews related to COPD prediction using machine learning and statistical models, this study developed a practical self-diagnosis prediction model by simplifying the variables used for prediction and performing external verification. In addition, the model showed good predictive performance for the diagnosis of COPD and the estimation of CVD risk. Taken together, the derived COPD score may help address public health issues regarding socioeconomic costs and support non-face-to-face diagnosis of COPD [37].
This study has several limitations that need to be considered. First, since this is a cross-sectional, questionnaire-based study, it is difficult to identify causal relationships. Second, the study population consisted of the Korean population only, so our model's generalizability to other health care systems in other regions remains to be evaluated. Third, the gold standard for detecting COPD is based on post-bronchodilator spirometry, whereas this study used prebronchodilator values; in addition, other diseases related to airflow limitation, such as bronchiectasis and tuberculosis-destroyed lung, are likely to be included because FEV1/FVC < 0.7 was used without chest radiography [7]. Finally, this model does not include biomarkers, such as blood eosinophils and fibrinogen, which are known to predict deterioration or hospitalization due to COPD and should be the focus of future research.

Conclusions
In conclusion, the KNHANES COPD score, composed of age, sex, WC and DBP, satisfactorily predicted COPD. The developed score may be supportive in the stratification of individuals at high risk of COPD who require further screening for pulmonary diseases. Future studies comparing the diagnostic accuracy of the derived score with standard diagnostic tests are necessary to validate its accuracy and cost-effectiveness.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm11113181/s1, Table S1: TRIPOD Checklist: Prediction Model Development and Validation, Table S2: Descriptive characteristics of the participants by sex, Table S3: Univariable analyses of variables involved in the health examination for COPD in male participants, Table S4: Univariable analyses of variables involved in the health examination for COPD in female participants, Table S5: Multivariable model for prediction of COPD with further addition of smoking as a component, Table S6: Multivariable model for prediction of COPD in male participants, Table S7: Multivariable model for prediction of COPD in female participants.
Informed Consent Statement: Informed consent was waived because anonymously managed data were used at every step.
Data Availability Statement: All data used in the present study were obtained from KNHANES, conducted by KCDC, which is open to members of the public and is accessible at https://knhanes.kdca.go.kr/knhanes/sub03/sub03_01.do (accessed on 10 April 2022).