Identification of Metabolic Syndrome Based on Anthropometric, Blood and Spirometric Risk Factors Using Machine Learning

Metabolic syndrome (MS) is an aggregation of coexisting conditions that can indicate an individual’s high risk of major diseases, including cardiovascular disease, stroke, cancer, and type 2 diabetes. We conducted a cross-sectional survey to evaluate potential risk factor indicators by identifying relationships between MS and anthropometric and spirometric factors along with blood parameters among Korean adults. A total of 13,978 subjects were enrolled from the Korea National Health and Nutrition Examination Survey. Statistical analysis was performed using a complex sampling design to represent the entire Korean population. We conducted binary logistic regression analysis to evaluate and compare potential associations of all included factors. We constructed prediction models based on Naïve Bayes and logistic regression algorithms and evaluated their performance using the area under the curve (AUC) and calibration curves. Among all factors, triglyceride exhibited a strong association with MS in both men (odds ratio (OR) = 2.711, 95% confidence interval (CI) [2.328–3.158]) and women (OR = 3.515 [3.042–4.062]). Regarding anthropometric factors, the waist-to-height ratio demonstrated a strong association in men (OR = 1.511 [1.311–1.742]), whereas waist circumference was the strongest indicator in women (OR = 2.847 [2.447–3.313]). Forced expiratory volume in 6 s and forced expiratory flow 25–75% were strongly associated with MS in both men (OR = 0.822 [0.749–0.903]) and women (OR = 1.150 [1.060–1.246]). The wrapper-based logistic regression prediction model showed the highest predictive power in both men and women (AUC = 0.868 and 0.932, respectively). Our findings revealed that several factors were associated with MS and suggested the potential of employing machine learning models to support the diagnosis of MS.


Introduction
Metabolic syndrome (MS) is a collection of at least three of the five risk factors that increase the risk of health problems (e.g., cardiovascular disease [CVD], stroke, cancer, and type 2 diabetes [T2D]) [1][2][3][4]. In the presence of MS, the risk of CVD more than doubles, and the risk of T2D increases more than tenfold [5]. Thus, MS is a major cause of death and a high-risk disease in many people. Fortunately, identifying the conditions and associated risk factors of comorbid severe illnesses is easy and inexpensive. Further, these routine checkups are more likely to keep the perturbed physiology in check as compared to the efforts necessary for overcoming a severe disease [6]. The incidence of MS is increasing worldwide. In the United States, it increases by 1-2% every year [7]. The incidence is also steadily increasing in Asia-Pacific countries, such as China, Korea, and Taiwan [8]. MS is related to several factors, and lung disease, in particular, is related to various diseases that can cause MS [9][10][11].
Numerous studies have supported the association between anthropometric factors and MS [12][13][14][15][16][17]. For instance, forearm circumference and bioelectric-impedance-measured visceral fat are associated with MS, whereas waist circumference (WC) is not [12]. Some studies have shown several anthropometric factors to be predictors of components of MS; however, no single index was consistently the strongest predictor [13]. Waist-to-height ratio (WHtR) is the most commonly associated risk factor for MS in Japan, whereas WC and body mass index (BMI) had the strongest associations with MS among other ethnicities [14,15]. Furthermore, WHtR was more strongly associated with MS than WC and BMI in the identification of metabolic risk factors [16].
The associations of blood parameters and spirometric factors with MS have also been examined [18][19][20][21][22][23][24]. Dietary patterns were associated with glucose (GLU) intolerance and MS [18]. Additionally, abnormal white blood cell (WBC) count was a vascular risk factor for MS [19]. Further, the incidence of MS was higher in subjects with chronic obstructive lung disease than in those with normal lung function, as determined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD II-IV) guidelines. WC and blood pressure (BP) were also associated with the disease [20]. Low pulmonary function was associated with restrictive lung disease and MS risk factors [22]. Some studies have investigated the association between chronic obstructive pulmonary disease (COPD) and MS in men and women [23]. Lin et al. [24] studied the association between restrictive lung impairment and an increased risk of MS.
In medicine and genomics, numerous studies are conducted using machine learning models [25][26][27][28][29][30][31][32][33][34][35][36][37]. However, research that compares and analyzes anthropometric factors, blood parameters, urinary parameters, and spirometric factor anomalies for diagnosis of MS is lacking. Such evaluation is critical because, particularly in Asian countries, MS has become a concern due to rapid and constant changes in diet and lifestyle. Moreover, the study of MS has progressed through the use of machine learning [38].
We analyzed the association between various indicators and MS. Further, we aimed to compare and evaluate data sets for machine learning models to provide an algorithm for predicting MS. Measurements of variables are presented with their respective p-values and odds ratios (ORs), calculated using binary logistic regression analysis to identify the factors associated with MS. To evaluate the machine learning models, we built Naïve Bayes (NB) and logistic regression predictive models and estimated their predictive power. The results of this study provide basic knowledge on the association between MS and anthropometric and spirometric factors that can be used to predict and facilitate the prevention and management of MS. Moreover, the application of machine learning in medicine is expected to support the diagnosis of MS.

Subjects and Dataset
We obtained the data for this study from the Korea National Health and Nutrition Examination Survey (KNHANES V and VI) from 2010 to 2015. MS was defined by the presence of three or more of the five conditions listed in Table 1 [42]. We conducted an experimental study in three stages. First, we performed data integration based on the measured data (spirometric factors) from 2010 to 2015. Second, we extracted data according to the participants' demographic and clinical characteristics, including age, sex, alcohol consumption, smoking, income, anthropometric factors, blood parameters, and spirometric factors. MS was defined relative to five conditions, and we performed data translation and clarification (addressing missing, uncalculated, and unconverted values).
Finally, we performed data standardization for comparison and analysis. In the statistical analysis, we implemented three methods to analyze the associations of various indices with MS. A t-test was conducted to evaluate significant differences between men and women. We conducted a binary logistic regression analysis to identify significant associations with the measured variables (anthropometric and spirometric factors and blood parameters) and determined differences between normal subjects and those with MS. We used the area under the curve (AUC) from the receiver operating characteristic (ROC) curve to assess whether significant improvement in the identification of MS was achieved based on anthropometric factors, blood parameters, and spirometric factors. Since no method exists to calculate the AUC in a complex sampling analysis, we evaluated performance using the general (non-complex-sampling) AUC analysis. Finally, we constructed prediction models using machine learning algorithms combined with feature selection and assessed their predictive power by sensitivity, 1-specificity, precision, and AUC [43].
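The single-factor AUC described above can be illustrated with a rank-based estimate. The following is a minimal sketch in Python, not the SPSS procedure used in the study; the function name `auc_rank` and the toy values are hypothetical and are not KNHANES data. It treats the AUC as the probability that a randomly chosen subject with MS has a higher marker value than a randomly chosen normal subject, which also shows why an inversely associated marker yields an AUC below 0.5.

```python
def auc_rank(scores, labels):
    """AUC as the share of (MS, normal) pairs in which the MS subject
    has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one MS and one normal subject")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = MS, 0 = normal.
ms = [1, 1, 0, 1, 0, 0]
fev6 = [2.9, 3.5, 3.8, 3.0, 3.6, 3.3]  # lower values in MS subjects

auc_fev6 = auc_rank(fev6, ms)                    # below 0.5: inverse marker
auc_flipped = auc_rank([-v for v in fev6], ms)   # same marker, sign reversed
print(auc_fev6, auc_flipped)
```

Reversing the sign of a marker maps an AUC of a to 1 - a, so an AUC below 0.5 indicates discrimination in the opposite direction rather than no discrimination.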
Figure 1 shows the design of this study for preprocessing and statistical analysis. We focused on participants aged ≥ 40 years in this study because the prevalence of MS increases rapidly in this age group [44]. This yielded a sample size of 26,499 subjects. We excluded subjects with the following measurements: demographics (n = 3409), anthropometric factors (n = 85), blood parameters (n = 4930), uric test results (n = 394), and spirometric factors (n = 3703). Thus, the study comprised 13,978 subjects, and the final dataset consisted of 9893 normal subjects (4489 men and 5404 women) and 4085 subjects with MS (1749 men and 2340 women). Figure 2 presents a flow chart showing details of the sample selection procedure.
Table 2 shows the demographic and clinical characteristics of each group and detailed descriptions of all experimental factors in the t-test. * p < 0.05 and † p < 0.0001 indicate significant differences between men and women using the two-stage sample t-test. All data are presented as means ± standard deviations (±SD), except for the subjects, which are presented as the number of participants (%).
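The sample flow described above can be checked with simple arithmetic. This short Python sketch uses only the counts quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text for participants aged >= 40 years.
eligible = 26_499
exclusions = {
    "demographics": 3409,
    "anthropometric factors": 85,
    "blood parameters": 4930,
    "uric test results": 394,
    "spirometric factors": 3703,
}

# Subjects remaining after all exclusions.
remaining = eligible - sum(exclusions.values())

# Group sizes quoted in the text.
normal_subjects = 4489 + 5404  # men + women without MS
ms_subjects = 4085             # subjects with MS

print(remaining, normal_subjects + ms_subjects)
```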

Measurements
All factors were measured according to established guidelines, as described previously [40,41]. Height was measured to the nearest 1 mm using a Seca 225 portable stadiometer (Seca, Hamburg, Germany). Weight was measured to the nearest 0.1 kg using an electronic scale (GL-6000-20; Caskorea, Seoul, Korea). WC (between the sternum and the hips) was measured to the nearest 1 mm using a Seca 200. BMI was calculated as weight (kg) divided by height squared (m²). The WHtR (a new indicator in this study) was calculated as WC/height. Blood parameters were measured using a Hitachi Automatic Analyzer 7600-210 (Hitachi, Tokyo, Japan) with Pureauto SCHO-N (Sekisui, Tokyo, Japan) and S TG-N, S AST, S ALT, S GLU, and Cholestest N HDL, LDL (Sekisui, Tokyo, Japan). Glycated hemoglobin was measured using an HLC-723G7 (Tosoh, Tokyo, Japan). White and red blood cell counts were measured using an XE-2100D (Sysmex, Tokyo, Japan). Spirometric factors were measured by pulmonary function test using the Vmax series 2130 (SensorMedics, Yorba Linda, CA, USA) and included forced vital capacity (FVC), forced expiratory volume in 1 s, the ratio of forced expiratory volume in 1 s to forced vital capacity, forced expiratory volume in 6 s (FEV6), forced expiratory flow 25-75% (FEF25-75), and peak expiratory flow.
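The two derived anthropometric indices can be computed as in the sketch below. It is illustrative only: the function names are hypothetical, and it assumes height and WC are recorded in centimetres and weight in kilograms, which the text does not state explicitly.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def whtr(wc_cm, height_cm):
    """Waist-to-height ratio: WC divided by height, both in the same unit."""
    return wc_cm / height_cm


# Hypothetical subject: 70 kg, 170 cm tall, waist circumference 85 cm.
example_bmi = bmi(70.0, 170.0)
example_whtr = whtr(85.0, 170.0)
print(round(example_bmi, 1), example_whtr)
```

Because WHtR is a ratio of two lengths, it is unit-free as long as WC and height share the same unit.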

Statistical Analysis
We used a complex sampling design to examine associations between MS and related factors. Because the survey uses stratified two-stage sampling, the primary sampling unit (PSU), cluster, and weight values were taken into consideration. Treating the data as a simple random sample could bias the estimated factors, the variances of the means, and the prevalence rates. Thus, we performed a complex sampling data analysis that utilized weighting, as described previously [40,41,45].
Statistical analyses were performed using SPSS 22 software for Windows (SPSS, Inc., Armonk, NY, USA). Binary logistic regression is used for prediction and classification in machine learning and data mining, but it is primarily used to analyze and examine the associations between diseases and various factors in medicine, public health, and epidemiology studies. It can provide p-values, OR values, and confidence intervals (CIs) for association analyses. Thus, this algorithm was suitable for both the crude and adjusted analyses of our data and for our objectives. In the adjusted analyses, we used age, BMI, frequency of alcohol consumption, smoking, income, recognized stress rate, and education level as the adjustment factors. We analyzed and evaluated the association between normal subjects and those with MS. We used the Waikato Environment for Knowledge Analysis (WEKA) data mining tool to construct the prediction models and evaluate their prediction performance. We used 10-fold cross validation to efficiently distribute the training and test data sets and to evaluate the prediction models [46].
In the overall crude analysis, TG presented the strongest association with MS in men, whereas GLU exhibited the strongest association in women. WHtR was associated with MS in both men and women, whereas WC was associated with MS in the adjusted analysis. FEV6 was also associated with MS in both men and women. In the adjusted analysis, FEV6 and FEF25-75 were associated with MS. These results suggested sex-based differences in blood parameters; however, other risk factors showed similar results in both sexes.
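The effect of sampling weights on a point estimate can be seen in a small example. This is a hedged sketch with hypothetical weights, not the KNHANES weighting scheme or the SPSS complex samples procedure; it covers only the weighted point estimate and does not reproduce the design-based variance estimation, which needs the strata and PSU information.

```python
def weighted_prevalence(has_ms, weights):
    """Share of the weighted population with MS.

    has_ms  : iterable of 0/1 indicators, one per sampled subject
    weights : sampling weight of each subject (people represented)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for y, w in zip(has_ms, weights) if y) / total


# Hypothetical sample of four subjects with unequal sampling weights.
has_ms = [1, 0, 0, 1]
weights = [100.0, 300.0, 400.0, 200.0]

unweighted = sum(has_ms) / len(has_ms)
weighted = weighted_prevalence(has_ms, weights)
print(unweighted, weighted)
```

In this toy sample the unweighted prevalence is one half, whereas the weighted estimate is lower because the subjects with MS represent fewer people in the population.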
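The link between a logistic regression coefficient and the reported OR with its CI can be sketched as follows. This assumes a Wald-type interval, which is symmetric on the log-odds scale; the text does not state how SPSS computed the intervals under the complex sampling design, so that is an assumption, and the helper names are hypothetical. The example backs out the standard error from the interval reported in the abstract for triglyceride in men and rebuilds the interval from it.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Back out the standard error of log(OR) from a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Triglyceride in men, as reported in the abstract: OR = 2.711 [2.328-3.158].
beta_tg = math.log(2.711)
se_tg = se_from_ci(2.328, 3.158)
or_tg, lo_tg, hi_tg = odds_ratio_ci(beta_tg, se_tg)
print(round(or_tg, 3), round(lo_tg, 3), round(hi_tg, 3))
```

An OR above 1 corresponds to a positive coefficient and an OR below 1 to a negative one, which is how the positive and negative associations in the text should be read.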
From the AUC analysis, Figures 3 and 4 show the predictive power of anthropometric and spirometric factors and of blood and urinary parameters for MS in Korean men and women. Among all factors, TG (AUC = 0.787 [0.775-0.799]) showed strong predictive power in men, whereas WHtR (AUC = 0.813 [0.802-0.823]) demonstrated the highest AUC value in women. GLU and HBA1C exhibited strong predictive powers among the blood parameters in men (AUC = 0.776 [0.764-0.789] and AUC = 0.712 [0.697-0.726], respectively) and women (AUC = 0.810 [0.799-0.821] and AUC = 0.776 [0.765-0.788], respectively). Among the spirometric factors, FEV6 and FVC showed the strongest negative predictive power for men (AUC = 0.413 [0.398-0.429] and AUC = 0.423 [0.407-0.438], respectively) and women (AUC = 0.460 [0.447-0.474] and AUC = 0.465 [0.452-0.479], respectively).
Each prediction model was constructed with the NB or logistic regression algorithm on a subset of features chosen by the wrapper or filter feature selection method, and the performance of the resulting MS prediction models was evaluated for both men and women. Table 5 shows detailed results of feature selection by the wrapper and filter methods. The model built by Wrapper NB in men included 13 features: age, WT, WC, SBP, GLU, HDL, TG, ALT, WBC, USG, UCREA, FVC, and FEV6. Our results revealed a higher predictive power in women than in men for NB and logistic regression predictions. The performance of the models was compared and evaluated based on sensitivity, 1-specificity, F-measure, AUC, area under the precision-recall curve (AUPRC), and root mean square error (RMSE) [47]. As shown in Table 6, for men, the sensitivity, 1-specificity, and F-measure of the prediction model (Wrapper LR [Logistic Regression]) were 0.926, 0.449, and 0.882 in the normal group and 0.551, 0.074, and 0.633 in the MS group, respectively. Among women, the sensitivity, 1-specificity, and F-measure were 0.930, 0.261, and 0.911 in the normal group and 0.739, 0.070, and 0.778 in the MS group, respectively.
Figure 6 shows the calibration curve for each prediction model. The x-axis of the calibration plot shows the predicted class, and the y-axis shows the true class. In men, Wrapper_LR suggests a better calibration plot than Wrapper_NB. In women, Wrapper_LR suggests a better calibration plot than [...]
[...] Medeniyet University Goztepe Training and Research Hospital in Turkey. The purpose of their study was to evaluate associations of WC, hip circumference, WHtR, waist-to-hip ratio (WHR), mid-upper arm circumference, forearm circumference, calf circumference, and body composition with MS. The ORs of visceral fat, hip circumference, forearm circumference, and WHR were 2.19 [95% CI, 1.30-3.71], 1.89 [1.07-3.35], 2.47 [1.24-4.95], and 2.11 [1.26-3.53], respectively. WC was not related to MS. However, forearm circumference and bioelectric-impedance-measured visceral fat were associated with the disease.
Mooney et al. [13] determined that other anthropometric factors, such as WC, WHtR, percent body fat, fat mass index (FMI), and fat-free mass index (FFMI), were consistently better predictors as MS-associated factors than BMI. They obtained their data from 12,294 adults who took part in annual physical examinations provided by EHE International, Inc., New York, NY, USA. They showed that each anthropometric factor was related to metabolic risk factors using Pearson correlation analyses, linear regression analyses, and ROC curves, and no single index exhibited the strongest prediction consistently. BMI was identified as the strongest predictor of BP.
Shen et al. [15] evaluated whether WC correlated more strongly with MS components than percent fat and other related anthropometric factors, such as BMI, in 1010 healthy white and African-American men and women. Their results demonstrated that WC was most strongly associated with MS, followed by BMI.
Williams et al. [18] demonstrated the association between GLU and MS. Their study enrolled 802 subjects aged 40-65 years who were randomly selected from a population-based sampling frame. Their study identified four dietary patterns using principal component analysis. These dietary patterns were related to other lifestyle factors, such as the socioeconomic group, smoking, alcohol intake, and physical activity. In component 1, there was a negative association with diabetes as one of the causes of MS. According to the results, dietary patterns were associated with GLU intolerance and MS.
Lao et al. [19] investigated the association between white blood cell (WBC) count and MS in older Chinese patients. The analyzed dataset (obtained from a medical checkup record) consisted of 3020 men and 7256 women aged 50-85 years. Vascular risk factors (e.g., WC, BMI, TG, TC, LDL, C-reactive protein, SBP, and DBP) were associated with WBCs in both men and women. The risk of MS increased significantly with higher WBC counts (OR = 1.86 [1.43-2.42]). There was a strong association between WBC count and vascular risk factors of MS.
Funakoshi et al. [20] investigated the association of airflow obstruction with MS in Japanese men. Their dataset consisted of 7189 subjects aged 45-88 years from spirometric lung function tests at medical checkups. The airflow obstruction was divided into two parts (GOLD I and GOLD II-IV) according to GOLD guidelines. The incidence rate of MS was higher in subjects with GOLD II-IV than in those with normal lung function (OR = 1.33 [1.01-1.76]). Additionally, the MS component WC was associated with MS (OR = 1.76 [1.24-2.50]) and BP (OR = 1.37 [1.08-1.74]).
Paek et al. [22] evaluated the association between impaired lung function and metabolic risk factors, enrolling 4001 subjects aged > 18 years in 2001 from the KNHANES dataset. Using multiple linear regression, they analyzed the association of low pulmonary function with MS. They also examined the associations of restrictive lung disease and obstructive lung disease with MS using multiple logistic regression adjusting for WHtR, sex, age, smoking, physical activity, alcohol consumption, and socioeconomic status. WC, SBP, and TG were associated with FVC. They showed that the association of low pulmonary function with MS risk factors and restrictive lung disease (OR = 1.40 [1.01-1.98]) was also related to MS.
Park et al. [23] investigated the association between chronic obstructive pulmonary disease (COPD) and MS. Their dataset comprised 1215 subjects aged > 40 years from the KNHANES in 2001. The prevalence of MS was significantly high among COPD patients in both men and women (33.0% and 48.5% higher, respectively). In men, the risk of COPD (OR = 2.03 [1.08-3.80]) was associated with MS and abdominal obesity (OR = 1.95 [0.93-4.11]).
Lin et al. [24] studied the relationship between impaired lung function and MS in adults. Their study assessed 46,514 patients (21,669 men and 24,845 women) aged > 20 years, recruited from four nationwide MJ Health Screening Centers in Taiwan. The investigators examined associations between lung function test results and MS using multivariate logistic regression. They demonstrated the association between restrictive lung impairment and an increased risk of MS (p < 0.01, OR = 1.221), adjusting for age, sex, BMI, smoking, alcohol consumption, and physical activity.
This study showed results similar to those of previous studies [12][13][14][15][16][17][18][19][20][21][22][23][24] regarding the association of MS with anthropometric, blood, urinary, and spirometric factors. Specifically, TG exhibited the strongest association among all parameters in both men and women. With regard to anthropometric factors, WHtR showed the strongest association in both men and women. Concerning blood parameters, GLU showed a strong positive association. With respect to urinary parameters, UPH showed a negative association with MS. Regarding spirometric factors, FEV6 and FVC exhibited negative associations with MS.
Choe et al. [38] constructed and analyzed five MS prediction models: MLP (multilayer perceptron), NB, RF (random forest classification), CT (decision tree), and SVM (support vector machine). In terms of AUC, the NB model showed the highest predictive power at 0.690. Kopitar et al. [47] evaluated and validated T2DM (type 2 diabetes mellitus) prediction models, including lm (linear regression model), Glmnet (regularized generalized linear model), RF, XGBoost (extreme gradient boosting), and LightGBM (light gradient boosting machine), by RMSE, AUC, AUPRC, R², and calibration plot. RF showed the best performance according to RMSE. In our study, according to the performance analysis and calibration plots, the Wrapper_LR model showed the best performance in both men (AUC = 0.868) and women (AUC = 0.932).
Overall, our current findings show that anthropometric, blood, urinary, and spirometric factors might be involved in the induction of MS. Specifically, WC, WHtR, and TG were positively associated with MS, whereas UPH, FEV6, and FVC were negatively associated with MS. The wrapper-based logistic regression model showed a high predictive power for both men and women.
Collectively, all the studies, including ours, identified different potent indicators of MS, demonstrating that no single factor could serve as an effective marker for identifying MS. This confirms the concept that co-occurring conditions contribute to development of the disease. Thus, a subgroup of individuals with shared pathophysiology, and hence a common strong indicator, might be at high risk of a specific comorbid disease. Therefore, subgroup stratification of MS might be necessary for partitioning the risk factor clusters involved, which should provide insight into personalizing pharmacological and lifestyle modifications as treatment approaches for managing the condition.
This study had several limitations. First, it was difficult to determine cause-and-effect associations due to the cross-sectional design of the study. Second, our results were limited to Korean adults because of the KNHANES dataset employed. Nonetheless, in addition to identifying specific indicators of MS, which confirmed the findings of previous reports utilizing machine learning, our results indicated sex-based differences in MS risk factors.

Conclusions
MS is closely related to well-known major diseases, such as CVD, stroke, cancer, and T2D. Thus, it is increasingly attracting attention as an objective health indicator worldwide.
The data for this study came from the Korea National Health and Nutrition Examination Survey (KNHANES V and VI) from 2010 to 2015. The KNHANES is a cross-sectional survey conducted initially by the Korea Centers for Disease Control and Prevention [39-41]. Datasets from the survey were approved by the Korea Ministry of Health and Welfare (2010-02CON-21-C, 2011-02CON-06-C, 2012-01EXP-01-2C, and 2013-07CON-03-4C). The National Health and Nutrition Examination Survey was conducted without deliberation by the Research Ethics Review Committee, as it corresponds to research conducted by the state for public welfare, according to Article 2 (1) of the Bioethics Law and Article 2 (2) 1 of the Enforcement Regulations of the same law. The research was conducted in accordance with the principles of the Helsinki Declaration update of 2008. This study was approved by the Institutional Review Board of the Korea Research Institute of Standards and Science, and included approval for the access and analysis of open-source data from the KNHANES V and VI with a waiver for the documentation of informed consent (IRB No. KRISS-IRB-2019-14).

Figure 1. Design of the metabolic syndrome prediction study for preprocessing and statistical analysis. WHtR, waist-to-height ratio; OR, odds ratio; CI, confidence interval.

Figure 2. Sample selection procedure for association analysis.

Figure 3. AUC analysis of metabolic syndrome in men.

Figure 4. AUC analysis of metabolic syndrome in women.

Figure 5. AUC of metabolic syndrome prediction models in men and women.

Table 1. Conditions of metabolic syndrome.

Table 2. Primary characteristics of all factors analyzed for metabolic syndrome.

Table 4. Associations of anthropometric factors, blood parameters, urinary parameters, and spirometric factors with metabolic syndrome in women. Adjusted for age, body mass index (BMI), alcohol consumption, smoking, income, recognized stress rate, and education level. The results are from crude and adjusted analyses using binary logistic regression.

Table 5. Feature selection using the wrapper and filter method for each model.

Table 6. Predictive power analysis of the four models in men and women. This table was created using data transformed by standardization. The results of detailed classification performance were grouped by class (normal and metabolic groups) using a confusion matrix. NB: Naïve Bayes, LR: logistic regression, metabolic: metabolic syndrome.
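The per-class figures derived from a confusion matrix can be illustrated with the sketch below. The counts are hypothetical and are not taken from Table 6; the function name is also an assumption. The sketch shows why, in a two-class problem, the 1-specificity of one class equals one minus the sensitivity of the other, which matches the pattern in the values reported in the text for men (0.926 and 0.074, 0.551 and 0.449).

```python
def per_class_metrics(tp, fn, fp, tn):
    """Sensitivity, 1-specificity, precision, and F-measure for one class
    treated as the positive class."""
    sensitivity = tp / (tp + fn)
    one_minus_specificity = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "one_minus_specificity": one_minus_specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical confusion matrix (not study data).
ms_as_ms, ms_as_normal = 55, 45          # true MS subjects
normal_as_normal, normal_as_ms = 180, 20  # true normal subjects

# MS treated as the positive class.
ms_group = per_class_metrics(
    tp=ms_as_ms, fn=ms_as_normal, fp=normal_as_ms, tn=normal_as_normal
)
# Normal treated as the positive class: the roles of the cells swap.
normal_group = per_class_metrics(
    tp=normal_as_normal, fn=normal_as_ms, fp=ms_as_normal, tn=ms_as_ms
)
print(ms_group)
print(normal_group)
```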