1. Introduction
Hypertension is a major disease burden worldwide and an important risk factor for cardiovascular disease (CVD), chronic kidney disease (CKD) and death [1,2,3,4,5,6,7], and prehypertension is also a risk factor that can lead to CVD, stroke and CKD [8,9,10,11]. Hypertension is strongly associated with poor diet, low physical activity and excessive alcohol consumption [7,12]. The treatment of hypertension reduces the risk of stroke, coronary artery disease, and congestive heart failure [1].
Methods for managing hypertension include antihypertensive therapy and exercise therapy. Antihypertensive therapy reduces mortality from stroke, myocardial infarction, CVD and heart failure [13,14,15,16,17], and exercise therapy is effective for lowering systolic blood pressure (SBP), diastolic blood pressure (DBP), and the levels of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C) and triglyceride (TG), and for raising the level of high-density lipoprotein cholesterol (HDL-C) [18,19,20,21,22].
Association studies of hypertension have mainly used obesity indices. For example, waist circumference (WC) is a risk factor for hypertension in African and Caribbean populations and in Brazilian and Filipina women [23,24,25,26]. Body mass index (BMI) is a risk factor for hypertension in China, the Philippines, the United States (US) and India, and in Australian women [27,28,29]. The waist-to-hip ratio (WHR) is associated with hypertension in Chinese women in Hong Kong and in Australian men [29,30], and the waist-to-height ratio (WHTR) is the best indicator of hypertension in Chinese men in Hong Kong [30], in a Taiwanese population [31] and in a Korean population [32]. WC, BMI, and WHR are also associated with prehypertension in a Taiwanese population [33].
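The obesity indices compared above are simple functions of anthropometric measurements: BMI is weight (kg) divided by height (m) squared, WHTR is WC divided by height, and WHR is WC divided by hip circumference. The sketch below computes them under the assumption of metric units (kg, cm); the function name `obesity_indices` and the example participant are hypothetical.

```python
from typing import Dict, Optional


def obesity_indices(weight_kg: float, height_cm: float, waist_cm: float,
                    hip_cm: Optional[float] = None) -> Dict[str, float]:
    """Compute common obesity indices from raw anthropometric measurements."""
    height_m = height_cm / 100.0
    indices = {
        # BMI: weight (kg) divided by height (m) squared
        "BMI": weight_kg / height_m ** 2,
        # WHTR: waist circumference divided by height (same units, so unitless)
        "WHTR": waist_cm / height_cm,
    }
    if hip_cm is not None:
        # WHR needs hip circumference; without it the ratio cannot be formed
        indices["WHR"] = waist_cm / hip_cm
    return indices


# Hypothetical participant: 70 kg, 175 cm tall, 87.5 cm waist, hip not measured
print(obesity_indices(weight_kg=70.0, height_cm=175.0, waist_cm=87.5))
```

Making the hip measurement optional mirrors a practical constraint: when hip circumference is not recorded, WHR simply cannot be derived, while BMI and WHTR still can.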
Several published studies have investigated blood parameters in relation to hypertension. For example, compared with normotensive individuals, hypertensive patients show higher levels of fasting plasma glucose, serum high-sensitivity C-reactive protein (hs-CRP), TG, TC, LDL-C and uric acid (UA), higher white blood cell (WBC) and red blood cell (RBC) counts, higher hemoglobin (HGB), hematocrit (HCT) and mean corpuscular hemoglobin, and lower serum HDL-C, mean corpuscular volume and RBC distribution width [34,35]. A higher glycated hemoglobin (HbA1c) level is correlated with a higher prevalence of hypertension [36], and inadequate hypertension treatment elevates serum creatinine levels [37]. Markers of inflammation (CRP, WBC, amyloid A, and homocysteine) are elevated in men and women with prehypertension [38].
In previous studies of spirometry and hypertension, forced vital capacity (FVC) was identified as a negative predictor of hypertension, and lower FVC values were found to be a risk factor for future hypertension [39,40]. In studies of populations in Beijing and Guangzhou, FVC and forced expiratory volume in 1 s (FEV1) were inversely associated with SBP and DBP in women in both cities and in men in Beijing; however, in a follow-up conducted 2 or 4 years later, an association between low lung function and the incidence of hypertension was found only in Guangzhou women [41]. In Swedish men aged 55–68 years, BP increased with decreasing FVC, and a lower FEV1 was correlated with higher SBP and DBP [42].
Previous studies have reported associations of anthropometric indices, blood parameters, and spirometric indices with hypertension or prehypertension, but no study has yet examined the relationships of all three groups of indices with both prehypertension and hypertension. The aims of this study are to identify risk factors for hypertension and prehypertension and to develop machine-learning-based prediction models that could help prevent hypertension and reduce the risk of hypertension-related diseases (CVD, CKD and stroke). First, we identify statistically significant risk factors for hypertension and prehypertension among demographic indices, anthropometric indices, blood parameters, and spirometric indices. Second, we develop machine-learning models for predicting hypertension and prehypertension by combining two feature selection methods, correlation-based feature selection (CFS) and wrapper-based feature selection (WFS), with three classification algorithms, logistic regression (LR), naïve Bayes (NB), and decision tree (DT). Finally, we identify the best hypertension and prehypertension prediction models by comparing the performance of the developed models. To the best of our knowledge, this is the first study to demonstrate the associations of prehypertension and hypertension with obesity indices, blood parameters, and spirometric indices in a Korean population. Its findings provide basic information for the treatment and prevention of prehypertension and hypertension.
3. Results
The normal BP group included 1068 (13.0%) men and 1967 (24.0%) women, the prehypertension group included 983 (12.0%) men and 1019 (12.4%) women, and the hypertension group included 1586 (19.3%) men and 1589 (19.3%) women. In the statistical analyses, p-values, odds ratios (ORs), and 95% confidence intervals (CIs) for each feature were obtained using binary LR. Table 2 and Table 3 show the significance of the differences in the studied variables between the normotension group and the prehypertension or hypertension group after adjustment for age in men and women.
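For a binary LR coefficient β with standard error SE, the reported quantities are OR = exp(β) and 95% CI = exp(β ± 1.96·SE). The sketch below assumes Wald-type intervals (the interval type is not stated in this section) and shows the conversion in both directions, which can serve as a consistency check on a reported OR and its CI. Function names are hypothetical, and the age adjustment is assumed to be already reflected in β.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Convert an LR coefficient and its standard error into OR and Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(odds_ratio: float, lo: float, hi: float,
                       z: float = Z_95) -> Tuple[float, float]:
    """Recover the log-odds coefficient and its SE from a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    # On the log scale a Wald CI is symmetric, so its width equals 2*z*SE
    se = (math.log(hi) - math.log(lo)) / (2 * z)
    return beta, se


# BMI in men, prehypertension vs normotension (values from Section 3.1)
beta, se = coef_from_reported(1.429, 1.304, 1.567)
print(round(beta, 3), round(se, 3))
print(odds_ratio_ci(beta, se))
```

Because reported ORs and CI bounds are rounded to three decimals, the round trip reproduces them only approximately.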
3.1. Statistical Analysis of Prehypertension
In men, BMI (OR = 1.429, 95% CI = 1.304–1.567) was identified as the best indicator of prehypertension. Weight (WT; OR = 1.365, 95% CI = 1.242–1.500) and WHTR (OR = 1.335, 95% CI = 1.219–1.462) were also significantly associated with prehypertension. Among the blood parameters, HGB (OR = 1.323, 95% CI = 1.204–1.453) was associated with prehypertension in men, whereas none of the spirometric indices was significant. In women, BMI (OR = 1.428, 95% CI = 1.322–1.542) was the best indicator, and WHTR (OR = 1.425, 95% CI = 1.313–1.546) and WC (OR = 1.386, 95% CI = 1.282–1.498) were also significantly associated with prehypertension. Among the blood parameters, glucose (GLU; OR = 1.290, 95% CI = 1.180–1.410) was associated with prehypertension in women, and among the spirometric indices, FVCP (OR = 0.814, 95% CI = 0.755–0.878) was inversely associated with prehypertension in women.
3.2. Statistical Analysis of Hypertension
In men, BMI (OR = 1.993, 95% CI = 1.818–2.186) was found to be the best indicator of hypertension, and WHTR (OR = 1.903, 95% CI = 1.735–2.087) and WC (OR = 1.790, 95% CI = 1.638–1.956) were also significantly associated with hypertension. Among the blood parameters, TG (OR = 1.434, 95% CI = 1.304–1.577) exhibited a significant association with hypertension in men, and among the spirometric indices, FVCP (OR = 0.791, 95% CI = 0.728–0.859) showed a significant association with hypertension in men. In women, WHTR (OR = 2.071, 95% CI = 1.884–2.276) was identified as the best indicator of hypertension, and BMI (OR = 2.034, 95% CI = 1.861–2.222) and WC (OR = 1.927, 95% CI = 1.764–2.105) were also significantly associated with hypertension. Among the blood parameters, GLU (OR = 1.676, 95% CI = 1.508–1.861) exhibited a significant association with hypertension in women. Among the spirometric indices, the FVCP (OR = 0.682, 95% CI = 0.629–0.739) was found to be significantly associated with hypertension in women.
3.3. Performance Evaluation of the Prehypertension Prediction Models Combined with Feature Selection
We developed prediction models for prehypertension and hypertension by combining feature selection methods with classification algorithms. Features were selected using the CFS and WFS methods, and prediction models were built by applying the LR, NB, and DT algorithms to the selected features. The performance of each prediction model was evaluated using the area under the receiver operating characteristic curve (AUC).
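CFS scores a candidate feature subset without training a classifier, using Hall's merit heuristic Merit(S) = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the number of features, r_cf is the mean feature–class correlation and r_ff is the mean feature–feature intercorrelation. Subsets whose features track the outcome but not each other score highest. The sketch below is a minimal illustration under two assumptions: absolute Pearson correlation is the correlation measure, and a plain greedy forward search drives the selection. The study's actual correlation measure and search strategy are not described in this section, and all names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[str], data: Dict[str, Sequence[float]],
              target: Sequence[float]) -> float:
    """Hall's merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(data[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(data[a], data[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(data: Dict[str, Sequence[float]], target: Sequence[float]) -> List[str]:
    """Greedy forward search: add the feature that most raises merit; stop when none does."""
    selected: List[str] = []
    best = 0.0
    remaining = sorted(data)  # sorted for deterministic tie-breaking
    while remaining:
        scores = {f: cfs_merit(selected + [f], data, target) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best:
            break
        selected.append(f_best)
        best = scores[f_best]
        remaining.remove(f_best)
    return selected


# Hypothetical toy data: 'strong' tracks the outcome, 'weak' is a noisier,
# redundant copy of it, and 'noise' is unrelated to the outcome.
y = [0, 0, 0, 1, 1, 1]
toy = {"strong": [1, 2, 3, 4, 5, 6], "weak": [1, 2, 4, 3, 5, 6], "noise": [3, 1, 2, 3, 1, 2]}
print(cfs_forward(toy, y))  # ['strong']
```

Here 'weak' is rejected even though it correlates with the outcome, because its high correlation with 'strong' lowers the subset's merit. This redundancy penalty is the defining feature of CFS.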
Among the prehypertension prediction models for men, the WFS-LR model with AGE, BMI, GLU, TC, HDL-C, TG, aspartate aminotransferase (AST), HGB and blood urea nitrogen (BUN) showed the best predictive power (AUC = 0.635), with a sensitivity of 0.52 and a 1-specificity of 0.338, whereas the CFS-DT model showed the lowest predictive power (AUC = 0.559). For women, the WFS-LR model with AGE, WHTR, BMI, GLU, TC, TG, WBC, RBC, FVCP and peak expiratory flow (PEF) showed the best predictive power (AUC = 0.700), with a sensitivity of 0.308 and a 1-specificity of 0.11, whereas the CFS-DT model exhibited the lowest predictive power (AUC = 0.622).
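Sensitivity and 1-specificity describe a single operating point, obtained by thresholding the model's predicted probability, whereas the AUC summarizes all thresholds. As a result, a model's AUC and its sensitivity at one threshold need not move together. The sketch below computes both rates from a confusion matrix; the 0.5 threshold and the function name are assumptions, since the cut-off used in the study is not given in this section.

```python
from typing import Sequence, Tuple


def sensitivity_fpr(y_true: Sequence[int], y_score: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, 1 - specificity) when scores >= threshold are called positive."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # true-positive rate among cases
    fpr = fp / (fp + tn)          # 1 - specificity: false-positive rate among controls
    return sensitivity, fpr


# Hypothetical labels (1 = case) and predicted probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05]
print(sensitivity_fpr(y_true, y_score, threshold=0.5))  # (0.5, 0.25)
```

Raising the threshold trades sensitivity for a lower 1-specificity. Both rates assume that the sample contains at least one case and one control.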
Figure 2 compares the predictive performance of the prehypertension prediction models. For men, the AUCs of LR, NB, and DT were 0.610, 0.602 and 0.559, respectively, with CFS and 0.635, 0.626 and 0.580, respectively, with WFS. For women, the corresponding AUCs were 0.698, 0.691 and 0.622 with CFS and 0.700, 0.699 and 0.646 with WFS.
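The AUCs compared above have a direct probabilistic reading: the AUC equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation). An AUC of 0.5 corresponds to chance. The sketch below is a minimal, quadratic-time illustration with a hypothetical function name; for large samples a rank-based computation would be preferable.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as P(score of a random case > score of a random control), ties = 1/2."""
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in pos:          # compare every case ...
        for n in neg:      # ... with every control
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05]
print(auc(y_true, y_score))  # 0.875
```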
3.4. Performance Evaluation of the Hypertension Prediction Models Combined with Feature Selection
Among the hypertension prediction models for men, the WFS-LR model with WHTR, BMI, GLU, TC, HDL-C, TG, AST, BUN, creatinine (CRT), WBC, FVC, the FEV1 to FVC ratio (FEV1FVC), forced expiratory volume in 6 s (FEV6) and PEF showed the best predictive power (AUC = 0.777), with a sensitivity of 0.813 and a 1-specificity of 0.401, whereas the CFS-DT model showed the lowest predictive power (AUC = 0.666). For women, the WFS-LR model with AGE, WC, BMI, GLU, TC, HDL-C, TG, alanine aminotransferase (ALT), CRT, WBC, RBC, FVC, FEV6 and forced expiratory flow 25–75% (FEF 25–75) showed the best predictive power (AUC = 0.845), with a sensitivity of 0.724 and a 1-specificity of 0.191, whereas the CFS-DT model exhibited the lowest predictive power (AUC = 0.761).
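Unlike CFS, WFS evaluates each candidate feature subset by training the classifier itself (for the WFS-LR models, LR) and scoring its predictions. The sketch below is a minimal greedy forward wrapper search. The study's actual search strategy, stopping rule and validation scheme are not given in this section. `evaluate` is a hypothetical callback that, in practice, would fit the classifier on the subset and return a validation score such as cross-validated AUC; here it is replaced by a toy additive score so that the example stays self-contained.

```python
from typing import Callable, List, Sequence


def wrapper_forward_select(features: Sequence[str],
                           evaluate: Callable[[List[str]], float],
                           min_gain: float = 0.0) -> List[str]:
    """Greedy forward wrapper search driven by a classifier-based score.

    `evaluate(subset)` is expected to train the classifier on `subset`
    and return a validation score (higher is better).
    """
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset
        score, f_best = max((evaluate(selected + [f]), f) for f in remaining)
        if score - best_score <= min_gain:
            break  # no candidate improves the score enough
        selected.append(f_best)
        best_score = score
        remaining.remove(f_best)
    return selected


# Hypothetical stand-in for "fit LR on these features, return validation AUC":
# each feature adds a fixed amount to a baseline AUC of 0.5.
TOY_GAIN = {"BMI": 0.10, "GLU": 0.05, "TG": 0.02, "noise": -0.01}


def toy_evaluate(subset: List[str]) -> float:
    return 0.5 + sum(TOY_GAIN[f] for f in subset)


print(wrapper_forward_select(["noise", "TG", "GLU", "BMI"], toy_evaluate))  # ['BMI', 'GLU', 'TG']
```

Because every step refits the classifier, wrapper selection is far more expensive than CFS. In exchange, the selected subset is tuned to the specific algorithm being evaluated.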
Figure 3 compares the predictive performance of the hypertension prediction models. For men, the AUCs of LR, NB, and DT were 0.749, 0.732 and 0.666, respectively, with CFS and 0.777, 0.748 and 0.698, respectively, with WFS. For women, the corresponding AUCs were 0.843, 0.819 and 0.761 with CFS and 0.845, 0.833 and 0.796 with WFS.
The features and performance results of the prehypertension and hypertension prediction models are summarized in Table 4. In both men and women, the WFS-LR model achieved the highest AUC for prehypertension (0.635 in men and 0.700 in women) and for hypertension (0.777 in men and 0.845 in women). Among the classification methods, LR outperformed NB and DT. The hypertension prediction models performed better than the prehypertension prediction models, and performance was higher in women than in men.
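NB treats features as conditionally independent given the class, whereas several candidate features here are closely related to one another (for example BMI, WC and WHTR). One possible, untested contributor to LR's advantage is that LR does not make this independence assumption. The sketch below is a minimal NB classifier that assumes Gaussian class-conditional densities. The NB variant used in the study is not stated in this section, and the class name and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class SimpleGaussianNB:
    """Naive Bayes with one Gaussian per (class, feature); features assumed independent given class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "SimpleGaussianNB":
        self.classes = sorted(set(y))
        self.prior: Dict[int, float] = {}
        self.mean: Dict[int, List[float]] = {}
        self.var: Dict[int, List[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            cols = list(zip(*rows))
            self.mean[c] = [sum(col) / len(col) for col in cols]
            # Small variance floor avoids division by zero for constant features
            self.var[c] = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                           for col, m in zip(cols, self.mean[c])]
        return self

    def predict_proba(self, x: Sequence[float]) -> Dict[int, float]:
        log_post: Dict[int, float] = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            # Independence assumption: per-feature log-likelihoods simply add up
            for v, m, s2 in zip(x, self.mean[c], self.var[c]):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            log_post[c] = lp
        # Normalise in log space for numerical stability
        top = max(log_post.values())
        z = sum(math.exp(lp - top) for lp in log_post.values())
        return {c: math.exp(lp - top) / z for c, lp in log_post.items()}


# Hypothetical single-feature data (e.g. BMI) with 0 = normotension, 1 = hypertension
nb = SimpleGaussianNB().fit([[20.0], [21.0], [22.0], [28.0], [29.0], [30.0]], [0, 0, 0, 1, 1, 1])
print(nb.predict_proba([29.5]))
```

If two features carry nearly the same information, NB counts that information twice in the summed log-likelihood, which can push its probabilities toward overconfident extremes.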
4. Discussion
In this study, anthropometric indices, blood parameters, and spirometric indices were examined to identify risk factors for prehypertension and hypertension. The features for the prehypertension and hypertension prediction models were selected using the CFS and WFS methods. Prediction models were then developed using the LR, NB, and DT classification algorithms.
In a previous study, Ko and colleagues analyzed the associations of BMI, WHR, WC, and WHTR with hypertension in a Chinese population in Hong Kong and found that WHTR was the strongest indicator in men (OR = 1.18, 95% CI = 1.14–1.23), whereas WHR was the strongest indicator in women (OR = 1.26, 95% CI = 1.18–1.35) [30]. Lee and colleagues demonstrated in middle-aged Korean adults that, among obesity indices, WC, WHR, and WHTR were more predictive of hypertension than BMI and that WHTR was the best obesity-related predictor of hypertension regardless of gender, ethnicity and age [hazard ratio (HR) = 1.49, 95% CI = 1.35–1.65 in men and HR = 1.48, 95% CI = 1.33–1.64 in women] [32]. Chang and colleagues found that BMI in men (OR = 2.07, 95% CI = 1.44–2.99) and abdominal obesity in women (OR = 2.04, 95% CI = 1.54–2.71) were associated with an increased risk of prehypertension [45]. Grievink and colleagues evaluated BMI, WC, and WHR as predictors of hypertension in a Caribbean population and identified WC (OR = 1.7, 95% CI = 1.4–2.0) as the best independent predictor [24]. Tsai and colleagues reported that WHR, BMI, and WC were associated with prehypertension, particularly high BMI in men (OR = 1.106, 95% CI = 1.051) and high WC in women (OR = 1.031, 95% CI = 1.012–1.051) [33]. In this study, BMI was the best predictor of prehypertension in both men (OR = 1.429, 95% CI = 1.304–1.567) and women (OR = 1.428, 95% CI = 1.322–1.542). The best predictors of hypertension were BMI in men (OR = 1.993, 95% CI = 1.818–2.186) and WHTR in women (OR = 2.071, 95% CI = 1.884–2.276). Our findings are consistent with those of previous studies [27,28,29] and indicate that BMI is the best indicator of hypertension in men and of prehypertension in men and women.
Several studies of blood parameters and hypertension have been conducted. Cirillo and colleagues reported that hematocrit was positively correlated with SBP and DBP in men and women [34]. In addition, Emamian and colleagues performed a multivariate LR analysis of demographic, biochemical, and hematological parameters and found that hematocrit (OR = 1.02, 95% CI = 1.003–1.04) was an independent predictor of hypertension [35]. Daniel and colleagues demonstrated that high HbA1c levels were associated with an increased rate of hypertension and that the risk of CVD increased with each 1% increase in HbA1c level (OR = 1.39, 95% CI = 1.06–1.83) [36]. Christina and colleagues showed that men and women with prehypertension had 31% higher CRP, 32% higher tumor necrosis factor-α, 9% higher amyloid A, 6% higher homocysteine, and 10% higher WBC levels [38]. In this study, the blood parameter most strongly associated with prehypertension was HGB (OR = 1.323, 95% CI = 1.204–1.453) in men and GLU (OR = 1.290, 95% CI = 1.180–1.410) in women; HCT (OR = 1.262, 95% CI = 1.151–1.383) in men and TG (OR = 1.259, 95% CI = 1.162–1.365) in women were also strongly associated with prehypertension. The blood parameter most strongly associated with hypertension was TG (OR = 1.434, 95% CI = 1.304–1.577) in men and GLU (OR = 1.676, 95% CI = 1.508–1.861) in women; GLU (OR = 1.363, 95% CI = 1.247–1.489) in men and HbA1c (OR = 1.539, 95% CI = 1.393–1.700) in women were also strongly associated with hypertension. Our findings are consistent with those of a previous study [36] and indicate that HbA1c is significantly associated with hypertension in women.
In a study of spirometry and hypertension, Sarah and colleagues demonstrated that FVC was significantly associated with hypertension as a negative predictor [39]; in follow-up analyses, lower FVC also predicted the future development of hypertension, with an OR of approximately 0.7 in an LR analysis [39]. Jacobs and colleagues performed a hazard analysis and found that the risk of hypertension increased more than 2-fold with decreasing FVC (HR ranging from 1 to 2.21) and that a low FVC might result in cardiovascular morbidity and mortality [40]. In this study, FVCP was the spirometric index most strongly associated with prehypertension and hypertension. Lower FVCP was associated with prehypertension in women (OR = 0.814, 95% CI = 0.755–0.878), hypertension in men (OR = 0.791, 95% CI = 0.728–0.859) and hypertension in women (OR = 0.682, 95% CI = 0.629–0.739). FVC was also significantly associated with hypertension, although slightly less strongly than FVCP. Our findings are consistent with those of previous studies [39,40] and indicate that FVC indices, particularly FVCP, are significantly associated with prehypertension in women and with hypertension in men and women.
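The FVCP ORs above are below 1, meaning that the odds of the outcome fall as FVCP rises. The arithmetic below converts such an OR into a percent change in odds per one-unit increase and into the equivalent OR per one-unit decrease (its reciprocal). The unit in which these ORs are expressed (raw or standardized) is not stated in this section. Because hypertension is common in this sample, the ORs should not be read as relative risks. Function names are hypothetical.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds of the outcome per one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def or_per_unit_decrease(odds_ratio: float) -> float:
    """OR for a one-unit decrease in the predictor (the reciprocal of the reported OR)."""
    return 1.0 / odds_ratio


# FVCP and hypertension in women, from Section 3.2 (OR = 0.682)
print(round(percent_change_in_odds(0.682), 1))  # -31.8
print(round(or_per_unit_decrease(0.682), 3))    # 1.466
```

Read this way, each one-unit decrease in FVCP corresponds to roughly 1.47 times higher odds of hypertension in women, on whatever unit scale the reported OR uses.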
Prior to the present study, several researchers proposed hypertension prediction models based on data mining techniques [50,51,52,53]. For instance, Tayefi and colleagues proposed a DT-based hypertension prediction model for an Iranian population; their model suggested that demographic characteristics and selected biochemical markers (such as age, BMI, fasting blood GLU, TG, UA, hs-CRP, TC and LDL-C) have higher predictive power than other biochemical markers [50]. Ture and colleagues compared the performance of DTs, statistical algorithms, and neural networks using features such as age, sex, family history, smoking habits, lipoprotein, TG and UA and found that the neural network algorithm had the best predictive power for hypertension [51]. In this study, a comparison of DT, NB, and LR identified LR as the best classification algorithm, and the model combining demographic indices, blood parameters and spirometric indices showed the best predictive power. Among the prehypertension and hypertension prediction models, the WFS-LR models were the best for both men and women. Unlike previous studies [50,51], which used demographic characteristics, BMI, and blood parameters, our hypertension prediction model combined obesity indices, blood parameters, and spirometric indices.
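A DT classifier grows by choosing, at each node, the feature threshold that best separates the classes. The sketch below finds the best single threshold on one feature using CART-style Gini impurity. The split criterion is an assumption, since the DT algorithm and criterion used in this study and in [50] are not given in this section. Function names and toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x: Sequence[float], y: Sequence[int]) -> Tuple[Optional[float], float]:
    """Best threshold t on one feature (left branch: x <= t) and the weighted Gini it leaves.

    Returns (None, gini(y)) when no split reduces impurity.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_t: Optional[float] = None
    best_imp = gini(y)
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbouring values
        left: List[int] = [lab for _, lab in pairs[:i]]
        right: List[int] = [lab for _, lab in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Hypothetical BMI values with normotension (0) / hypertension (1) labels
print(best_split([22.0, 24.0, 26.0, 30.0, 31.0, 33.0], [0, 0, 0, 1, 1, 1]))  # (28.0, 0.0)
```

A full tree applies this search recursively to every feature at every node. Each split therefore thresholds a single feature rather than weighting all features jointly, as LR does.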
This study has several limitations. First, because we used data from a cross-sectional survey, cause-and-effect relationships cannot be established. Second, although obesity indices were the most significant risk factors for hypertension, hip circumference was not measured, so WHR could not be calculated and compared with the other indices. Third, we did not have information on diseases secondary to hypertension (diabetes, dyslipidemia, and hyperlipidemia), so these were not considered in this study. Finally, the prediction models were built using only anthropometric indices, blood parameters, and spirometric indices; lifestyle indicators such as smoking, drinking, and physical activity were not included.