Network Analysis of Demographics, Dietary Intake, and Comorbidity Interactions

The aim of this study was to elucidate the complex interrelationships among dietary intake, demographics, and the risk of comorbidities. We applied a Gaussian graphical model to calculate the dietary scores of the participants. The network structure of dietary intake, demographics, and comorbidities was estimated in a mixed graphical model. The centrality indices of the nodes (strength (S), closeness (C), and betweenness (B)) were measured to identify the central node. Multinomial logistic regression was used to examine the association between the factors and comorbidities. Among 7423 participants, the strongest pairwise interactions were found between sex and smoking (1.56), sex and employment (0.66), sex and marital status (0.58), marital status and income (0.65), and age and employment (0.58). Among the factors in the network, sex played a central role (S = 4.63, C = 0.014, B = 41), followed by age (S = 2.81, C = 0.013, B = 18), smoking (S = 2.72, C = 0.013, B = 0), and employment (S = 2.17, C = 0.014, B = 22). While the odds of hypertension and diabetes were significantly higher among females than males, the opposite pattern was observed for high cholesterol and moderate chronic kidney disease. Among these factors, dietary intake was not a strongly interacting factor in the network, whereas age was consistently associated with the comorbidities of hypertension, high cholesterol, diabetes, and chronic kidney disease.


Introduction
Several studies have reported significant associations between comorbidities and demographic and dietary variables. For example, a recent comprehensive meta-analysis of 93 individual studies found that an unhealthy diet was associated with a higher body mass index (BMI), whereas a healthy diet was associated with a higher education level, greater physical activity, and reduced smoking behaviors [1]. Additional pooled individual-level data of American adults showed that obesity was significantly associated with a higher risk of cardiovascular disease (CVD) [2]. Moreover, other demographic factors, such as smoking tobacco and alcohol consumption, have been shown to result in several negative health consequences [3-5]. Regarding disease development, smoking is not only a risk factor itself but may also be associated with many socioeconomic factors, including income, educational level, and employment status [6,7]. Furthermore, accumulating evidence has indicated that dietary intake contributes to the risk of chronic diseases [8-10]. Julibert et al. recently reported a higher prevalence of metabolic syndrome, which is associated with CVD and type 2 diabetes mellitus [11], among participants consuming higher amounts of total fat and lower amounts of carbohydrates and fiber in their diets [8]. Another network meta-analysis of 3595 participants showed a beneficial effect of nut intake on low-density lipoprotein cholesterol and triglycerides [12], supporting the inverse association between nut intake and CVD risk [13].
The association between exposures and outcomes, such as the aforementioned associations, can be measured statistically using linear, logistic, Cox, and Poisson regression techniques [14]. Although variable selection approaches, such as stepwise, backward, and forward selection, may be sufficient to address many interdependencies among predictors in epidemiological data, it is difficult to demonstrate the accuracy of the obtained results [15]. These conventional approaches also have limitations in explaining complex relationships, such as biological pathways in systems epidemiology [16]. Additionally, due to the development of technologies, large-scale "omic" data sets of genome, transcriptome, proteome, metabolome, and microbiome data may include numerous variables [16,17]. In this case, a network analysis based on graph theory, in which nodes (vertices) indicate factors and edges (links) represent relationships among the factors, such as correlation coefficients, can provide insights into the interactions among all the variables and show how a single variable is impacted by multiple factors [15,18]. In nutritional epidemiology, Iqbal et al. applied a Gaussian graphical model (GGM) to derive networks of dietary patterns in a German population, in which the partial correlations between two food groups, conditional on the remaining food groups, were estimated [19]. Solmi et al. recently applied a mixed graphical model (MGM) to investigate the interrelationship of various factors in a cohort of elderly adults at risk of osteoarthritis [20]. In contrast to the GGM, the MGM is able to identify a network structure of regularized interactions among both categorical and continuous variables. However, both dietary score and comorbidities in Solmi et al.'s study were measured using fixed scales, such as the Mediterranean dietary adherence score and the Charlson comorbidity index, which might not generalize to different populations [20].
Given that the complex associations among demographics, dietary behaviors, and comorbidities may provide valuable insight into disease development, we conducted this study to describe the interactions among these factors using an MGM. Additionally, we investigated the associations among demographics, dietary intake, and the risk of chronic diseases.

Data and Participants
This study used baseline survey data from the Cancer Screening Examination Cohort at the National Cancer Center (NCC), South Korea, from 16 October 2007 to 24 May 2019 (n = 16,188). Further details of this process were previously described [21]. Data from a structured questionnaire, clinical tests, physical functions, and blood tests were extracted for this study [21]. In addition, data on dietary behaviors were collected via a validated food frequency questionnaire (FFQ) with 106 food items [22,23].
The participants who did not complete the questionnaire that assessed general characteristics or the FFQ, and those who reported unrealistic data for energy consumption (<500 or >4000 kcal) were excluded (n = 5378). Of the 10,810 remaining participants, 7423 participants were ultimately included in the final network analysis after removing those with missing values for demographic and comorbidity information (Figure 1). The study protocol was approved by the Institutional Review Board of the NCC (number NCCNCS-07-077).
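The participant flow described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name `plausible_energy` and the variable names are hypothetical, and the only values used are the counts and energy limits stated in the text.

```python
def plausible_energy(kcal, low=500, high=4000):
    """Return True when a reported daily energy intake is within the accepted range.

    The text excludes intakes <500 or >4000 kcal, so both limits are treated as inclusive.
    """
    return low <= kcal <= high


# Counts reported in the text.
baseline = 16188   # baseline survey participants
excluded = 5378    # incomplete questionnaire/FFQ or implausible energy intake
final = 7423       # participants in the final network analysis

remaining = baseline - excluded            # participants left after the first exclusion step
dropped_for_missing = remaining - final    # removed for missing demographic/comorbidity data
share_dropped = dropped_for_missing / remaining
```

Under these numbers, `share_dropped` is about 0.31, which is the share of the 10,810 remaining participants removed for missing values.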

Variable Measurements
The demographic variables included age (years), sex (male and female), marital status (married, cohabitation, and others), education (<high school, high school graduate, and ≥college), monthly income (<2 million, 2-4 million, and ≥4 million KRW), smoking status (never, past, and current), alcohol consumption (never, past, and current), and regular exercise (yes and no). Although the World Health Organization recommends the standard BMI levels for underweight, normal, overweight, and obese individuals (i.e., <18.5, 18.5-24.9, 25-29.9, and ≥30 kg/m², respectively), the BMI cutoffs for the Korean population were identified as 23 kg/m² and 25 kg/m², considering the higher body fat percentages in Asians than in non-Asians, and the increased risks of any comorbidities, including diabetes, hypertension, and dyslipidemia [24,25]. Additionally, the participants in the standard underweight and obesity groups accounted for relatively small proportions of the total study population (approximately 2% each), with 175 (2.4%), 3216 (43.3%), 1978 (26.6%), 1870 (25.2%), and 184 (2.5%) participants with BMIs of <18.5, 18.5-22.9, 23-24.9, 25-29.9, and ≥30 kg/m², respectively. Therefore, we selected the following cutoffs for the BMI for the final analysis: <23, 23-24.9, and ≥25 kg/m². The dietary intake (g/day) of 16 food groups was calculated using the Computer-Aided Nutritional Analysis Program (CAN-Pro) 4.0 (Computer-Aided Nutritional Analysis Program, The Korean Nutrition Society, Seoul, Korea). Then, these intakes were log-transformed, and their pairwise correlations were estimated in the GGM [19]. Weights according to food groups were obtained as the eigenvector centrality of the GGM-estimated network to compute the dietary intake score [26]. The dietary score was calculated as the sum of the amount of each food group consumed (g/day) multiplied by its respective weight. The dietary scores were then categorized into tertiles, with higher scores representing higher GGM-weighted food consumption; the tertiles were interpreted as light, normal, and heavy eating behaviors.
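The scoring steps described above (centrality weights, weighted sum, tertiles) and the BMI grouping can be illustrated with a small sketch. This is an illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, the centrality uses absolute edge weights and is scaled so that the largest value is 1, and ties in the tertile split are broken by sort order.

```python
def eigenvector_centrality(adj, iters=1000, tol=1e-12):
    """Leading eigenvector of |adj| by power iteration, scaled so the maximum is 1.

    Assumption: absolute edge weights are used. Iterating on (|A| + I) keeps the
    same leading eigenvector while avoiding oscillation on bipartite-like graphs.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(abs(adj[i][j]) * v[j] for j in range(n)) for i in range(n)]
        top = max(w)
        w = [x / top for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def dietary_score(intakes, weights):
    """Weighted sum of food-group intakes (g/day) using the centrality weights."""
    return sum(g * w for g, w in zip(intakes, weights))


def tertile_labels(scores):
    """Assign 0 (lowest), 1, or 2 (highest third) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = min(3 * rank // len(scores), 2)
    return labels


def bmi_group(bmi):
    """BMI categories used in the final analysis: <23, 23-24.9, and >=25 kg/m^2."""
    if bmi < 23:
        return "<23"
    if bmi < 25:
        return "23-24.9"
    return ">=25"


# Toy network of three food groups (illustrative partial correlations, not study data).
toy_adj = [[0.0, 0.5, 0.0],
           [0.5, 0.0, 0.3],
           [0.0, 0.3, 0.0]]
toy_weights = eigenvector_centrality(toy_adj)
toy_score = dietary_score([100.0, 50.0, 20.0], toy_weights)
```

In the toy network the middle food group receives the largest weight because it is connected to both of the others, so its intake contributes most per gram to the score.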

Network Analysis
The network analysis of the demographics, dietary intake, and comorbidity factors was performed using an MGM, in which nodes reflect both categorical and continuous variables, and edges reflect their pairwise interactions [31]. In particular, we applied the "mgm" package, which was developed by Haslbeck and Waldorp, to obtain the network estimation of time-varying k-order MGMs [31]. Lasso regularization with extended Bayesian information criterion (EBIC) model selection, which is considered to be more conservative and to have slightly higher precision than the cross-validation procedure, was applied, with the EBIC hyperparameter set at 0.5, to estimate the network structures [32-34]. The parameter of the interaction between two continuous variables indicates their partial correlation after controlling for the remaining variables [31]. In the case of a continuous variable and a categorical variable, the parameter of the interaction indicates the relationship between the continuous variable and the probability of observing category 1 of the categorical variable [31]. The parameter between two categorical variables corresponds to the interaction between two corresponding indicator variables [31]. When combining the coefficients estimated by the nodewise regression procedure into one edge parameter, all the estimates are required to be nonzero [31].
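Two pieces of this procedure can be sketched in a few lines: the EBIC used to select the lasso penalty and the rule that combines nodewise estimates into one edge. The sketch below is an assumption-laden illustration, not the internals of the "mgm" package: the function names are hypothetical, the EBIC is written in the commonly used form for a single nodewise regression, and taking the mean absolute value as the combined edge weight is an assumed aggregation choice.

```python
import math


def ebic(log_likelihood, n_nonzero, n_obs, n_vars, gamma=0.5):
    """Extended BIC for one regression (assumed form).

    EBIC = -2*LL + k*log(n) + 2*gamma*k*log(p), where k is the number of nonzero
    parameters, n the sample size, and p the number of candidate predictors.
    gamma = 0 reduces to the ordinary BIC; larger gamma favors sparser models.
    """
    return (-2.0 * log_likelihood
            + n_nonzero * math.log(n_obs)
            + 2.0 * gamma * n_nonzero * math.log(n_vars))


def combine_and_rule(estimates):
    """Combine nodewise estimates for one pair of variables into a single edge weight.

    The edge is kept only if every estimate is nonzero, as stated in the text;
    the mean absolute value as the combined weight is an assumption.
    """
    if any(e == 0 for e in estimates):
        return 0.0
    return sum(abs(e) for e in estimates) / len(estimates)


# One regression dropped the predictor, so no edge is drawn.
edge_absent = combine_and_rule([0.4, 0.0])
# Both regressions retained the predictor, so the edge is kept.
edge_present = combine_and_rule([0.4, -0.6])
```

Requiring all estimates to be nonzero makes the resulting network sparser than a rule that accepts an edge when any single estimate is nonzero.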
Regarding the importance of the nodes, we assessed the centrality indices, including strength (S; how well a node is directly connected to the other nodes), closeness (C; how well a node is indirectly connected to the other nodes), and betweenness (B; how important a node is in the mediation between two other nodes) [35,36]. The network accuracy was assessed by bootstrapping 80% of the original sample with replacement [36,37].
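The three centrality indices can be made concrete on a toy weighted network. The following sketch uses conventions that are assumptions here, not a statement of the software used in the study: the distance along an edge is the inverse of its absolute weight, closeness is the inverse of the summed shortest-path distances, and betweenness counts the node pairs whose shortest path passes through a node, assuming shortest paths are unique. All function names are hypothetical.

```python
import math


def shortest_distances(weights):
    """All-pairs shortest distances (Floyd-Warshall) with edge length 1/|weight|."""
    n = len(weights)
    d = [[0.0 if i == j
          else (1.0 / abs(weights[i][j]) if weights[i][j] != 0 else math.inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def strength(weights):
    """Sum of absolute edge weights attached to each node."""
    return [sum(abs(w) for j, w in enumerate(row) if j != i)
            for i, row in enumerate(weights)]


def closeness(weights):
    """Inverse of the summed distances to all other nodes (0 if any node is unreachable)."""
    out = []
    for i, row in enumerate(shortest_distances(weights)):
        total = sum(x for j, x in enumerate(row) if j != i)
        out.append(0.0 if math.isinf(total) or total == 0 else 1.0 / total)
    return out


def betweenness(weights, tol=1e-12):
    """Number of node pairs whose shortest path runs through each node."""
    d = shortest_distances(weights)
    n = len(d)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.isinf(d[i][j]):
                continue
            for k in range(n):
                if k not in (i, j) and abs(d[i][k] + d[k][j] - d[i][j]) <= tol:
                    counts[k] += 1
    return counts


# Toy chain a - b - c with edge weights 0.5 and 0.25 (illustrative values only).
chain = [[0.0, 0.5, 0.0],
         [0.5, 0.0, 0.25],
         [0.0, 0.25, 0.0]]
chain_strength = strength(chain)
chain_closeness = closeness(chain)
chain_betweenness = betweenness(chain)
```

Under this definition, a single isolated node makes every summed distance infinite, so closeness carries no information for any node in such a network.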
In the sensitivity analysis, we additionally constructed an MGM-identified network, including 16 food groups instead of the dietary score. We also considered the interactions among the remaining variables after excluding nonmodifiable factors, such as age and sex.

Association Analysis
The statistical differences among the comorbidity statuses according to the demographic factors and dietary intakes were assessed using an ANOVA test and t-test (for continuous variables) or a Chi-square test (for the categorical variables). Multiple and multinomial logistic regressions were used to explore the associations of demographics and dietary intake with comorbidities.
All statistical analyses were performed in R version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).

Dietary Score Measurements
The GGM identified the dietary intake network (Figure 2), and the adjacency matrix of the regularized partial correlation is shown in Table S1. Sugars and sweets were observed to exhibit the strongest partial correlation with oils and fats (0.67), followed by seasonings with vegetables (0.42) and potatoes with starches (0.33). The negative partial correlations were weak, ranging from −0.10 to −0.04. The personalized dietary scores were calculated based on the eigenvector centrality of these edge weights and the raw dietary intake amount (g/day) of the node sizes. As a result, the dietary score of the whole study population was 592.4 ± 308.6.

Characteristics of Study Participants
The characteristics of the study participants are presented in Tables 1-4. Age, employment status, smoking status, alcohol consumption, and BMI significantly differed among the BP, total cholesterol, fasting glucose, and GFR comorbidity marker groups (p < 0.05). The other demographic factors and the intake of 16 food groups did not differ significantly for at least one comorbidity condition. Additionally, the GGM-identified dietary scores were classified into low, medium, and high tertiles, and variations among the BP and GFR groups were observed (p ≤ 0.01).

Network Structure
The MGM-identified network structure of the pairwise interactions among the dietary score, demographics, and comorbidity markers is shown in Figure 3, and the weighted adjacency matrix is presented in Table S2. In general, all the factors were pairwise related to each other, except for BMI, which was independent of the network of interactions. Regarding dietary intake, an interaction was found only with regular exercise (0.05).
Marital status was observed to interact with income (0.65), sex (0.58), and employment (0.23) and was slightly related to alcohol consumption (0.06). There was an interaction between educational level and income (0.42), age (0.31), sex (0.19), and employment (0.10). Educational level was also found to be slightly related to total cholesterol (0.03) and BP (0.02).
Regarding comorbidities, the interaction weights between chronic disease markers and demographic factors ranged from 0.02 to 0.32, and approximately half of the interaction weights were lower than 0.10.

Network Inference
The centrality indices are presented in Figure 4 and Table

Network Stability
The accuracy of the network inference was investigated by nonparametric bootstrapping. The variability in the edge weights is shown in Figure S1; many of the bootstrapped confidence intervals (CIs) were narrow, suggesting high accuracy of the estimated edge weights. Additionally, the edges between sex and smoking and between sex and marital status were suggested as the two strongest edges because their bootstrapped CIs did not overlap with the bootstrapped CIs of any other edges. Figure S2 shows the stability of the centrality indices in subsets of the data. Although the node strength seemed to be stable and the node betweenness tended to gradually decrease, the correlation stability (CS) coefficients were still high, with values of 0.75 and 0.60, respectively. In this case, node closeness was not appropriate since BMI was independent of the other nodes. The significant differences in the edge weights and node strengths are presented in Figures S3 and S4. Generally, approximately half of the edge weights did not differ from one another (Figure S3), while most of the node strengths significantly differed from one another (Figure S4).
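The idea behind a bootstrapped CI for an edge weight can be sketched as follows. This is a simplified illustration under explicit assumptions, not the procedure run in the study: a Pearson correlation stands in for an MGM edge weight, each resample draws 80% of the sample size with replacement as described in the methods, and a percentile interval is reported. The function names, the number of resamples, and the seed are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(xs, ys, n_boot=1000, frac=0.8, alpha=0.05, seed=0):
    """Percentile CI of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(xs)
    m = max(2, int(round(frac * n)))   # resample size: a fraction of the original sample
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(m)]
        stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Illustrative data only: a perfectly linear relationship gives a degenerate, narrow interval.
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 for x in xs]
ci_low, ci_high = bootstrap_ci(xs, ys)
```

Two edges are then judged to differ in strength when their intervals do not overlap, which is the comparison described above for the strongest edges.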

Discussion
Given that the complex correlations among dietary intake, demographics, and chronic diseases might contribute to the progression of other health conditions, we first described the pairwise interactions among age, sex, marital status, educational level, employment status, monthly income, smoking status, alcohol consumption, physical activity, BMI, dietary score, and chronic diseases related to BP, total cholesterol, fasting glucose, and GFR. The strongest pairwise interactions were found between sex and smoking, sex and employment, sex and marital status, marital status and income, and age and employment. Among the factors in the network, sex had a central role, followed by age, smoking, and employment. Second, we found that aging was a consistent risk factor associated with hypertension, high cholesterol, diabetes, and chronic kidney disease. While the odds of hypertension and diabetes were significantly higher among females than males, the inverse association was observed in terms of high cholesterol and moderate chronic kidney disease.
Regarding the interaction between sex and smoking, Peters et al. recently investigated the sex dissimilarity in tobacco consumption in the UK Biobank study, which included approximately 500,000 participants. It was shown that differences in smoking habits have decreased over time in the Western population [38]. Several meta-analyses also reported the considerable risk of chronic diseases and cancer associated with smoking regardless of sex [39,40]. In a pooled meta-analysis that involved approximately one million Asian participants, although tobacco smoking among males accounted for approximately 60% of lung cancer mortality, the findings among females still varied according to country and region [41]. Nevertheless, the prevalence of a daily smoking habit was 25.0% among males, compared with 5.4% among females worldwide [42]. Similarly, the smoking prevalence was much higher in males (46.6% in 2005 and 42.3% in 2014) than in females (4.6% in 2005 and 5.1% in 2014) among Korean adults, which may support the strong interaction between sex and smoking status in our study [43,44].
In addition, the interaction between sex and employment also reflected an imbalance in employment rates between males (approximately 70%) and females (approximately 50%) during the 2007-2018 period in South Korea [45]. Although Korean women are highly skilled and educated, many of them are encouraged to discontinue permanent employment due to social expectations [46]. Furthermore, a high monthly income not only interacted with marital status but was also significantly associated with a decreased risk of marriage dissolution [47]. In East Asian countries, such as Korea, Japan, China, and Taiwan, this finding could be explained by the high financial cost of raising children as the main reason for delayed marriage [48].
Regarding factors associated with chronic diseases, age was determined to play an important role in disease pathology [49]. In the body, senescent cells are induced in normal aging, age-related disease, and therapeutic intervention contexts [49,50]. In aging people, senescence, which is normally limited to some organs and tissues, becomes dysfunctional, which results in a much higher accumulation of senescent cells than in normally aging individuals and can cause age-related chronic diseases [49]. Moreover, senescent cells were reported to contribute to metabolic syndromes regulated by AMPK, GSK3, and mTOR-signaling kinases [51].
To the best of our knowledge, this study was the first in which the interactions among dietary intake, demographics, and comorbidity markers in the Korean population were investigated. In general, the stationarity assumption is defined as the stability of the mean, variance, and autocorrelation of the data over time. Although all the information was recorded at the initial time of enrollment in the study, the time-varying model of MGM can relax the assumption of stationary data such that the parameters are allowed to vary at the same time point, and thus, the results may represent the long-term relationship among the variables [52]. Several dietary score scales have been developed to assess a healthy diet [53]; however, such methods may not be appropriate due to natural differences in the dietary behaviors of different populations. In this study, we applied the data-driven approach of the GGM to generate an overall dietary score that represents the eating behavior of light eating, normal eating, and heavy eating for each individual based on the amount of food consumed and the eigenvector centrality of the identified network as weights. Giving weights to each node of a food group in the calculation of the dietary score may better estimate the eating behavior than simply adding or subtracting the food group consumption when combining food groups, as in previous studies [54]. Furthermore, chronic comorbidities were identified based on not only self-reports but also clinical test results, which increased the accuracy of disease status identification.
Despite several strengths, this study has some limitations. Although the study recruited a large number of participants from a health screening program over a period of ten years, the cross-sectional study design may not have allowed for a full investigation of the causal relationship between the demographic and dietary factors and metabolic markers. Additionally, selection bias could have occurred due to the hospital-based setting; therefore, the results may not represent the entire population. When using structured questionnaires as tools to obtain information, recall of habit information can differ between males and females and between those with and without diseases because of different levels of health compliance. However, the validated and reproducible FFQ was administered by well-trained staff, which could have minimized the risk of collecting inaccurate information [21,23]. Furthermore, approximately 30% of the participants were excluded from the final analysis due to missing values that could not be accommodated. Finally, since the dietary score represents the overall dietary habits, it is difficult to interpret the role of specific food groups in the development of comorbidities.

Conclusions
In conclusion, this study investigated the comprehensive interaction network among dietary intake, demographics, and comorbidities in Korean adults. Among the factors studied, age, sex, smoking, and employment were found to play central roles in the multidimensional network, and aging was consistently associated with the risk of comorbidities. Further prospective population-based studies should be conducted to confirm these findings.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/nu13103563/s1. Supplementary Figure S1: Bootstrapped confidence intervals for the edge weight accuracy, Figure S2: Stability of the centrality indices of the mixed graphical model-identified network, Figure S3: Bootstrapped difference test between nonzero edge weights, Figure S4: Bootstrapped difference test between node strength, Table S1: Adjacency matrix of pairwise correlation of dietary intake of 16 food groups, Table S2: Adjacency matrix of pairwise interaction of demographics, dietary intake, and comorbidity markers, Table S3: Centrality indices and prediction estimates of nodes in the network of demographics, dietary intake, and comorbidity markers, Table S4: Adjacency matrix of pairwise interaction of demographics, food groups, and comorbidity markers, Table S5: Centrality indices and prediction estimates of nodes in the network of demographics, dietary intake, and comorbidity markers, Table S6: Adjacency matrix of pairwise interaction of demographics, dietary intake, and comorbidity marker, Table S7: Centrality indices and prediction estimates of nodes in the network of modifiable demographics, dietary intake, and comorbidity markers.

Informed Consent Statement:
Informed consent was obtained from all participants involved in the study.

Data Availability Statement:
The data are available from the corresponding author upon reasonable request.