The Structure of Relationships between the Human Exposome and Cardiometabolic Health: The Million Veteran Program

The exposome represents the array of dietary, lifestyle, and demographic factors to which an individual is exposed. Individual components of the exposome, or groups of components, are recognized as influencing many aspects of human physiology, including cardiometabolic health. However, the influence of the whole exposome on health outcomes is poorly understood and may differ substantially from the sum of its individual components. As such, studies of the complete exposome are more biologically representative than fragmented models based on subsets of factors. This study aimed to model the system of relationships underlying the way in which the diet, lifestyle, and demographic components of the overall exposome shapes the cardiometabolic risk profile. The current study included 36,496 US Veterans enrolled in the VA Million Veteran Program (MVP) who had complete assessments of their diet, lifestyle, demography, and markers of cardiometabolic health, including serum lipids, blood pressure, and glycemic control. The cohort was randomly divided into training and validation datasets. In the training dataset, we conducted two separate exploratory factor analyses (EFA) to identify common factors among exposures (diet, demographics, and physical activity) and laboratory measures (lipids, blood pressure, and glycemic control), respectively. In the validation dataset, we used multiple normal regression to examine the combined effects of exposure factors on the clinical factors representing cardiometabolic health. The mean ± SD age of participants was 62.4 ± 13.4 years for both the training and validation datasets. The EFA revealed 19 Exposure Common Factors and 5 Physiology Common Factors that explained the observed (measured) data. Multivariate regression in the validation dataset revealed the structure of associations between the Exposure Common Factors and the Physiology Common Factors. For example, we found that the factor for fruit consumption was inversely associated with the factor summarizing total cholesterol and low-density lipoprotein cholesterol (LDLC, p = 0.008), and the latent construct describing light levels of physical activity was inversely associated with the blood pressure latent construct (p < 0.0001). We also found that a factor summarizing that participants who frequently consume whole milk are less likely to frequently consume skim milk, was positively associated with the latent constructs representing total cholesterol and LDLC as well as systolic and diastolic blood pressure (p = 0.0006 and <0.0001, respectively). Multiple multivariable-adjusted regression analyses of exposome factors allowed us to model the influence of the exposome as a whole. In this metadata-rich, prospective cohort of US Veterans, there was evidence of structural relationships between diet, lifestyle, and demographic exposures and subsequent markers of cardiometabolic health. This methodology could be applied to answer a variety of research questions about human health exposures that utilize electronic health record data and can accommodate continuous, ordinal, and binary data derived from questionnaires. Further work to explore the potential utility of including genetic risk scores and time-varying covariates is warranted.


Background
Cardiovascular disease (CVD) is the leading cause of adult mortality globally [1] Strategies aimed at reducing CVD rates involve modulation of markers of CVD risk. In particular, elevated circulating cholesterol and triglyceride levels are associated with higher CVD risk and mortality and are principal targets for risk reduction [2,3]. Further, elevated blood pressure is one of the leading non-communicable disease risk factors [4], and elevated glycated hemoglobin is also a predictor of cardiovascular disease [5][6][7][8]. As such, strategies aimed at improving lipid profile, blood pressure, and glycemic control are urgently needed.
The array of external factors an individual is exposed to, referred to as the exposome, represents a complex network of interrelationships within and between different components comprising diet, lifestyle, and demographics. Despite the well-documented ability of individual exposome components, or groups of exposome components, to influence many aspects of human physiology [9][10][11], surprisingly little is known about how the complex array of exposures as a whole shape the cardiometabolic risk profile. Reductionist approaches, such as single-exposure models, are unable to account for the complex interactions between the many potential exposome components and their effect on physiology [12]. The absence of models that integrate the many different components undermines the capacity of current studies of exposome components to draw robust generalizations. This project therefore aims to model the system of relationships underlying the way in which the exposome, as a whole, shapes the cardiometabolic risk profile. To achieve this aim, we utilized a truly unique dataset generated from an exposome assessment, as well as longitudinally assessed markers of cardiometabolic health in the Million Veteran Program.

Study Population
Between January 2011 and November 2019, approximately 800,000 Veterans enrolled in the Million Veteran Program (MVP) [13]. The current prospective study draws from the approximately 350,000 Veterans that had enrolled in the MVP between January 2011 and 2016. Of the 297,937 participants that had exposure data, 182,363 participants were excluded if they were using antilipemic, antihypertensive, and/or hypoglycemic medications during either the exposure-assessment or outcome-assessment periods (Supplementary  Table S1). A further 79,078 participants were excluded as they had incomplete exposure and/or physiology data. Consequently, the final analysis included 36,496 MVP participants. Consent was obtained in accordance with all VA policies and under the authority of the VA Central IRB [13].

Exposome Assessment
Variables representing the exposome were observed (measured) using various questionnaires. At enrollment, participants completed a baseline questionnaire in which they reported their date of birth, height, smoking status, gender, ethnicity, and race. Participants also completed a lifestyle questionnaire in which they reported body weight, frequency of light, moderate, and vigorous physical activities both during and outside of work hours, smoking status, and number of hours per week spent in sedentary activities (watching television, DVDs, or videos, using a computer, or playing video games). Body mass index (BMI) was calculated as weight (kg)/height (m) [2].
A food frequency questionnaire (FFQ) was used to assess habitual dietary intake over the year preceding lifestyle questionnaire administration. Participants were asked to describe the average consumption frequency of 67 different foods. "For each food listed, please mark the column indicating how often, on average, you have used the amount specified during the past year". Pre-specified answers were as follows: "Never or less than once a month; 1 to 3 per month; once a week; 2 to 4 per week; 5 to 6 per week; once (1) a day; 2 to 3 per day; 4 to 5 per day; or 6+ per day".
For this study, we included 83 measured (observed) variables to represent the exposome. This is a large number of individual variables, many of which are not independent from one another due to patterns of behaviors amongst participants. In order to make sense of the numerous individual measured (observed) exposome variables, it was imperative that we reduce the dimensionality of the exposome variables to a smaller set of common factors representing groups of covarying measured (observed) variables. The methods to achieve this dimension reduction are detailed in the statistical analysis section.

Assessment of Physiological Markers of Cardiometabolic Health
A panel of clinicians reviewed lab measurements across all VA locations and adjudicated discrepancies to ensure lab measure consistency. Electronic medical records, adjudicated by two clinicians, were used to ascertain the markers of cardiometabolic health, which included systolic and diastolic blood pressures, as well as the concentration of total cholesterol, low-density lipoprotein cholesterol (LDLC), high density lipoprotein cholesterol (HDLC), triglycerides, HbA1c, and glucose.
This study included the physiological variables that were assessed during the 1-4 years post lifestyle questionnaire administration. For each individual marker of cardiometabolic health, analyses utilized the mean measurement of all of the assessments taken during this period, as well as the maximum recorded measurement during this time period. For each physiology parameter, measurements were excluded if they were taken within 4 days of a previous measurement [14], if they had negative values, or if they had values that were three interquartile ranges from the 25th and 75th percentiles. Data from casually obtained specimens were used in the analysis without regard to fasting status.

Assessment of Medication Usage
Electronic medical records were used to ascertain antilipemic agent usage during the 1 year preceding exposure assessment and during the outcome assessment period. Two clinicians adjudicated the list of antilipemic agents and included specific medications and doses from the following Generic Adjudication Classes: alirocumab; atorvastatin; atromid; bezafibrate; bezalip retard; cholestyramine; choloxin; clofibrate; colesevelam; colestipol; dextrothyroxine; evolocumab; ezetimibe; ezetimibe/simvastatin; fenofibrate; fenofibric acid; fish oil; gemfibrozil; icosapent ethyl; lomitapide; mevacor; mipomersen; niacinamide; omega-3; omega-3 acid; probucol; rosuvastatin; and statin. The same method of adjudication was used to ascertain usage of antihypertensive medications and combinations, as well as use of hypoglycemic agents.

Statistical Analysis
Before analysis, participants were randomly divided into one of two groups: a training dataset containing 66.6% of participants (n = 24,411) or a validation dataset containing the remaining 33.3% of participants (n = 12,085). See Supplementary Figure S1 for details.
The advent of multidimensional cohorts that both assess the exposome and are linked to electronic health records has resulted in large and complex datasets that comprise normally and non-normally distributed data that can be continuous, ordinal, categorical, and binary. There are many techniques available for reducing data dimensionality, which can be broadly categorized into supervised analyses (such as decision trees) and unsupervised analyses. Of the unsupervised analytic approaches, methods such as cluster analysis were not implemented as they would have reduced the number of observations (participants) by grouping them into a smaller set of clusters. Instead, we were aiming to achieve a reduction in the number of variables by grouping them into a smaller set of factors. We achieved this aim through implementation of common exploratory factor analysis. In fact, the use of common exploratory factor analysis in biomedical research is well-tested and effective [15,16], representing an established method whereby "hidden" relationships between the assumed latent variables and the initial observed (measured) variables can be uncovered [17]. To make sense of these data, this study aimed to (i) holistically examine the complex networks of interrelationships that define the exposome and clinical cardiometabolic risk profile; (ii) represent theoretical constructs that are unmeasurable or unmeasured; (iii) include parameter-specific measurement error; and (iv) integrate a number of techniques into one framework, accounting for the range of distributions, units, and relations within and between exposures and cardiometabolic health. Through applying tetrachoric and polychoric, common exploratory factor analysis followed by multivariable-adjusted regression analysis, our methods allowed us to observe the structure of relationships within and between the human exposome and subsequent markers of cardiometabolic health in this large, metadata-rich, prospective cohort of adult US Veterans.
The first stage of the analysis identified the latent constructs (common factors) that best described the shared covariance of the observed (measured) exposures and the physiological variables in the training dataset. These unobservable latent constructs are essentially hypothetical constructs that are used to represent groups of interconnected measured variables [18]. Exploratory factor analyses were used to evaluate the latent constructs and underlying structure because there were multiple hypotheses and extremely limited a priori knowledge of how observed (measured) variables might cluster, and because we aimed to develop a measurement model of latent variables, and not to merely identify a linear combination of variables, as is the case in principal component analysis.
For exposure variables, the exploratory factor analysis used tetrachoric and polychoric correlation coefficients between measures of exposure and oblique promax rotation and the varimax prerotation method to make exposure factors more parsimonious [19]. For physiology variables, Spearman's correlation coefficients were estimated and utilized a common exploratory factor analysis using the orthogonal parsimax rotation [20]. Factors were estimated with maximum likelihood methods [20,21]. As a sensitivity analysis, we implemented an alternative method for extracting factors: iterated principal factor analysis. We determined the number of factors to extract through parallel analysis, where each of the eigenvalues of the input correlation matrix was compared against an empirical distribution of eigenvalues. The empirical distribution of eigenvalues was obtained from 10,000 simulations of generated random correlation matrices. We retained all factors with corresponding eigenvalues that exceeded the one-sided critical value (α = 0.01) of the empirical eigenvalue distribution [20,22].
The eigenvalues and vectors were then used to compute the standardized (mean = 0, standard deviation = 1) latent constructs in the validation dataset, upon which multivariableadjusted regression analysis that simultaneously adjusted for all of the exposure latent constructs could be applied to identify the structure of relationships between exposure latent constructs and latent constructs representing cardiometabolic health. These inter-relationships were visualized using Cytoscape Version 3.7.2, with the following criteria dictating which associations were displayed: rotated factor pattern (standardized regression coefficients ≥ 0.5); uniqueness (display = all); inter-factor correlations (correlation coefficient ≥ 0.4); and multivariable-adjusted regression coefficients (significance under the Bonferroni criterion).
All analyses were conducted using SAS version 9.4, maintenance release #6.

Participant Characteristics
All of the exposome variables that were included in the models are detailed in Table 1 and Supplementary Table S2. Of the 36,496 MVP participants analyzed, 86% were men, 85% were Caucasians and 11% were African-Americans ( Table 1). The mean ± SD body mass index was 28 ± 5 kg/m 2 (Supplementary Table S2). Markers of cardiometabolic health are presented in Table 2.  Two-thirds of participants (n = 24,411) were randomly assigned to the training dataset and the remaining one-third (n = 12,085) were randomly assigned to the validation dataset. The mean ± SD age of the participants in the training and validation datasets was identical (62.4 ± 13.4 years).

Latent Constructs Describing the Exposure Variables in the Training Dataset
Tetrachoric and polychoric, common exploratory factor analysis in the training dataset revealed 19 common factors that explained shared exposure observed (measured) variable covariance. The common factors could be broadly categorized according to the measured (observed) variables they represented ( Figure 1 and Supplementary Figure S2). For example, the Common Exposure Factor E1 represented shared covariance in the intakes of many commonly consumed vegetables. Furthermore, Common Exposure Factor E17 had strong positive weighting for intake of whole milk but a strong negative weighting for intake of skim milk, representing the fact that, in this cohort, participants who frequently consumed whole milk were less likely to frequently consume skim milk.
Different types of physical activity were grouped together in three separate Common Exposure Factors. In particular, Common Exposure Factor E6 represented moderate and vigorous physical activity at home and during leisure time, Common Exposure Factor E7 represented moderate and vigorous physical activity at work, and Common Exposure Factor E10 represented light levels of physical activity at home, during leisure and at work.

Latent Constructs Describing the Physiological Variables in the Training Dataset
Common exploratory factor analysis in the training dataset revealed 5 common fac- Figure 1. Rotated factor pattern based on tetrachoric and polychoric, common exploratory factor analysis of measured exposure variables in the training dataset; limited to observed (measured) variables that had a standardized regression coefficient ≥ 0.5 for at least one latent construct. These latent constructs (common factors) are those that best described the shared covariance of the observed (measured) exposures in the training, dataset.Number of participants: 24,411. Standardized regression coefficient.

Latent Constructs Describing the Physiological Variables in the Training Dataset
Common exploratory factor analysis in the training dataset revealed 5 common factors that explained shared physiology variable covariance. These broadly represented (i) total cholesterol and LDLC; (ii) glycemic control; (iii) blood pressure; (iv) HDLC; and (v) triglycerides ( Figure 2). Common Physiology Factor P1 had positive loadings for all measures of total cholesterol and LDLC, and Common Physiology Factor P3 had high loadings for all of the measures of blood pressure. In fact, the final model applied similar loadings to mean and maximum values of the observed (measured) variables. As sensitivity analysis, we implemented an iterated principal factor analysis as the extraction method and observed similar factor loadings with the exception of mean and maximum glucose, which went from having a factor loading < 0.5 for Common Physiology Factor P2 in the primary analysis to having a factor loading > 0.5 (0.64 and 0.60, respectively) for Common Physiology Factor P2 in the sensitivity analysis.
Nutrients 2021, 13, x FOR PEER REVIEW 9 of 24 method and observed similar factor loadings with the exception of mean and maximum glucose, which went from having a factor loading < 0.5 for Common Physiology Factor P2 in the primary analysis to having a factor loading > 0.5 (0.64 and 0.60, respectively) for Common Physiology Factor P2 in the sensitivity analysis.

Relationships between Human Exposures and Physiology in the Validation Dataset
Identification of the 19 Common Exposure Factors was done without knowledge of the physiological variables. Likewise, the creation of the 5 Common Physiology Factors was independent of the exposure variables. In Figure 3a-e, we present the complex patterns underlying the structure of relationships between Common Exposure Factors and Common Physiology Factors that remain after taking into account the non-independence of the assessed exposome. Some Common Exposure Factors had no association with the Common Physiology Factors, whereas others showed a strong association, both inversely and positively. Specifically, even though the Common Exposure Factor describing intake of processed meat and fried potato (E2) was associated with the Common Physiology Factors describing total cholesterol and LDLC (P1), triglycerides (P5), blood pressure (P3), and glycemic control (P2), the Common Exposure Factor representing red meat intake from main and mixed dishes (E14) was not associated with any of the physiological common factors. Similarly, although the Common Exposure Factor describing intake of moderate and vigorous physical activity at home and during leisure (E6) was associated with the Common Physiology Factors describing total cholesterol and LDLC (P1), triglycerides (P5), and HDLC (P4), the Common Exposure Factor representing moderate and vigorous physical activity at work (E7) was not associated with any of the physiological common factors.
When considering individual physiology factors, the fruit latent construct (Common Exposure Factor E3), but not the vegetable latent constructs (Common Exposure Factors E1 and E19) was inversely associated (estimate: −0.03, P: 0.0077) with the latent construct summarizing total cholesterol and LDLC (Common Physiology Factor P1) (Figure 3a). Conversely, the latent construct with a positive weighting for intake of whole milk but a strong negative weighting for intake of skim milk (Common Exposure Factor E17) had a positive association with Common Physiology Factor P1, as well as with Common Physiology Factor P3, the latent construct summarizing measures of blood pressure (Figure 3d). The latent construct describing light levels of physical activity (Common Exposure Factor E10) was inversely associated with the blood pressure latent construct.   Figures 1 and 2, respectively. Criteria for displaying measured (observed) variables: rotated factor pattern: Standardized regression coefficient ≥ 0.5. Criteria for displaying association lines: uniqueness (display all); inter-factor correlations (correlation coefficient ≥ 0.4); and multivariable, adjusted regression coefficients (p value significant using the Bonferroni threshold). For multivariable-adjusted regression coefficients, the line thickness represents the value of the -log10(p value), range: 2.11, 9.05  Figures 1 and 2, respectively. Criteria for displaying measured (observed) variables: rotated factor pattern: Standardized regression coefficient ≥ 0.5. Criteria for displaying association lines: uniqueness (display all); inter-factor correlations (correlation coefficient ≥ 0.4); and multivariable, adjusted regression coefficients (p value significant using the Bonferroni threshold). For multivariableadjusted regression coefficients, the line thickness represents the value of the -log10(p value), range: 2.11, 9.05.

Discussion
In this prospective cohort study of U.S. male and female Veterans, we reported an association between the exposome and markers of cardiometabolic health. Specifically, using factors identified in a training dataset, multiple multivariable-adjusted regression analyses revealed significant positive and inverse associations between exposure latent constructs and latent constructs describing observed (measured) physiology variables when applied to a separate validation dataset containing different participants. This provided us with critical insights and observations that represent steps forward in enhancing our understanding of how the exposome, as a holistic entity, shapes human physiology.
We employed common exploratory factor analysis to reveal the structure of interrelationships between individual exposures and physiology variables in a way that substantially advances our understanding of how observed (measured) exposome variables relate to the cardiometabolic risk profile. For example, a Common Exposure Factor was created to reflect the close relationship in study participants between high levels of moderate and vigorous physical activity at home and high levels of moderate and vigorous physical activity during leisure time. This relationship was not strongly correlated with levels of moderate and vigorous physical activity at work, which was represented by a different Common Exposure Factor. This suggests that the amount of moderate to vigorous physical activity participants perform at work did not covary with the amount of moderate to vigorous physical activity performed during leisure and at home [23]. By unveiling "hidden" relationships between the latent constructs and the observed (measured) variables they represent that matched our understanding of biology and variable representation, the utility of common exploratory factor analysis for both questionnaire-derived assessments of exposome and electronic medical record-derived assessments of cardiometabolic health was highlighted. However, it is unclear what the causal implications for these relationships are.
Very few studies have attempted to determine the influence of the exposome as a whole on cardiometabolic risk (as determined through electronic medical records). Modelling the system of relationships underlying the way in which the exposome, as a whole, shaped the cardiometabolic risk profile was therefore an important aim of our investigation. In addition to revealing the structure of the exposome and the structure of physiology variables, this study also revealed the structure of relationships between the exposome and cardiometabolic risk profile through multiple regression of latent constructs. An example of this was the creation of a latent construct in the training dataset that represented the reciprocal relationship between consumption of whole and skim milk. In other words, any associations of whole milk with cardiometabolic disease risk could not be separated from the effects of skim milk and should not be interpreted in isolation. When applied to separate validation datasets, this milk-based latent construct was positively associated with the latent constructs representing total cholesterol and LDLC as well as systolic and diastolic blood pressure. This observation is supported by randomized controlled trials directly comparing non-fermented whole milk to non-fermented skim milk that suggest adverse effects of whole milk, compared to skim milk, on total cholesterol and LDLC [24,25]. Further, skim but not whole milk has been shown to exhibit antihypertensive properties [26,27]. It is not yet clear whether dairy fat intake increases cardiovascular disease risk [28]. Despite this, results of our exposome analysis support the 2006 American Heart Association to Diet and Lifestyle Recommendations and the 2015-2020 Dietary Guidelines for Americans, which both encourage adults to select milk products that are either fat-free or low in fat rather than whole milk products [29,30]. The finding that individual components of the exposome are both numerically and biologically intertwined highlights the urgent need to implement analytic techniques that holistically examine the complex networks of interrelationships within and between observed (measured) variables. This was achieved through the representation of unmeasurable or unmeasured theoretical constructs as well as parameter-specific measurement error in order to draw robust generalizations regarding the complex interactions between the many exposome components and human physiology.
The use of latent factors to describe interrelationships between individual exposome and physiology components was able to shed light on hypothesized relationships. For example, the factor describing fruit consumption was inversely associated with the factor describing concentrations of total cholesterol and LDLC. This is supported by (i) our previous findings from the National Heart, Lung, and Blood Institute Family Heart Study, which found that consumption of fruit and vegetables was inversely related to LDLC in both men and women [31] and (ii) results from other cohort studies and randomized controlled trials [32,33]. Although the benefits of fruit consumption on cholesterol concentrations are not conclusive, with some studies showing no benefit of fruit consumption [34], the high fiber content of fruit has been attributed to its cholesterol-lowering capacity [35,36]. In this study, peaches, oranges, and apples contributed to the fruit factor. Apples have been shown to increase the clearance of plasma cholesterol by enhancing the fecal excretion of bile acids and cholesterol [37], and the peel of peaches has been shown to lower total cholesterol and LDLC in rats fed a high-sucrose diet. Further, the polyphenols in apples have been shown to have beneficial effects on cholesterol metabolism [38][39][40][41], as too have the pectins of apples and oranges [42]. A randomized controlled trial testing the effect of the combination of peaches, oranges, and apples on serum lipid profile is needed in order to ascertain causality of this observed association. It is important to note that there were cases where hypothesized relationships were not observed. For example, despite vegetables being a rich source of dietary fiber and higher consumption of vegetables being associated with lower risk of all-cause mortality and cardiovascular mortality [43], the vegetable consumption factors in this study were not significantly associated with the factor describing total cholesterol and LDLC. The absence of confirmatory findings regarding vegetables in this study may be explained by the absence of data on intake of nutrients, such as fiber, which can summarize contributions from many different foods that are biologically important. Another reason may be that the measurement error may be lower in fruits as opposed to vegetables. However, another interpretation may be that, after controlling for other exposome components, vegetables are not associated with total cholesterol and LDLC concentrations in this cohort. Further studies using longitudinal data are needed to confirm these findings.
Although the methods implemented were well-tested and effective, it is important to note that diet was self-reported, health outcomes were captured through electronic medical records, there was a lack of data on medication adherence, and there were a limited number of women and non-whites in this U.S. Veteran cohort. Additionally, causality of observed relationships could not be established due to the observational nature of the study. Nevertheless, it is important to note that the exposome was measured at least one year prior to any of the physiologic variables being assessed, which, although not ruling it out, does reduce the likelihood and impact of reverse causation. An additional factor to consider when interpreting the results is the possibility of false-positive findings, which was reduced through implementation of the conservative Bonferroni correction [44]. Furthermore, although residual or unmeasured confounders cannot be ruled out, the common exploratory methods implemented in this study do aim to represent unmeasured and/or immeasurable variables through the creation of latent constructs. This analytic approach also enabled us to model the measurement error inherent when using selfreported exposome assessments as well as collations from electronic medical records, even when there are some exposome variables, such as environmental variables, that are not measured directly. By conducting the analysis in both a training and a validation dataset, we were able to demonstrate the utility of our analytic strategy for use in the increasingly prevalent type of cohort that has extensive questionnaire-based assessments of the exposome as well as markers of human physiology derived from electronic medical records. However, further replication in separate datasets is warranted.
It is becoming increasingly recognized that studies of the complete exposome are more biologically representative than fragmented models based on subsets of factors. There is no more clear example of this than the position of the Academy of Nutrition and Dietet-ics, which plainly states that the "total diet or overall pattern of food eaten is the most important focus of healthy eating" [45]. In recognition of this, as opposed to focusing on individual nutrient recommendations, the Dietary Guidelines for Americans highlight key elements of healthy eating patterns [30]. Some patterns, such as the Mediterranean Dietary Pattern, are based on a priori knowledge of how individual dietary components influence human health, and are often represented by a pattern score that reflects relative adherence to the dietary pattern, as is the case for the Healthy Eating Index [46,47]. Although hypothesis-driven, there is no general consensus in the scientific or clinical community as to what is the ideal dietary pattern for optimal health [48]. Importantly, these dietary patterns typically reflect only a select group of dietary components, and not necessarily the diet as a whole [46,47]. Given that the aim of the present paper was to identify the importance of the entire exposome in influencing cardiometabolic health, we chose to adopt a data-driven approach which allowed us to identify existing patterns in the population, and how individual foods, lifestyle factors, and demographic features related to each other at a population level. In this study, the measured exposome variables were represented by 19 common factors that were created independently from the physiological variables. We observed heterogeneous patterns of association of exposome constructs with each of the physiological constructs, which represented measured biomarkers that are important indicators of cardiometabolic health. As such, even though the exposome overall contributed to cardiometabolic health, each facet of the exposome had differential implications for different aspects of cardiometabolic health. It is evident that the overall complex exposome that individuals are exposed to needs to be studied more in order to more fully understand what is contributing to cardiometabolic risk profiles in communities of free-living individuals. To be more comprehensive, a logical extension to this current work would be to incorporate more environmental exposome variables, such as air pollution and access to green space, into the current models.
In conclusion, we observed a complex pattern of associations between the exposome and markers of cardiometabolic health. Given that we are increasingly recognizing the potential of the exposome as a whole to have far-reaching health implications beyond the effects associated with the sum of its parts, it is more important than ever that we make available analytic tools and approaches that are capable of dealing with this directly. It should be noted that the analytic strategy implemented in this paper could be applied to address a range of research questions that utilize data from questionnaire and electronic health record data, thus bringing us one step closer to understanding how the exposome, as a whole, impacts human health.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/nu13041364/s1, Table S1: Number of participants excluded based on use of antilipemic, antihypertensive and/or hypoglycemic medications during either the exposure-assessment or outcomeassessment periods, Table S2: Additional baseline characteristics of all Million Veterans Program participants included in this study, Figure S1: Schematic overview of the training and validation datasets, Figure S2: Rotated factor pattern based on tetrachoric and polychoric, common exploratory factor analysis of measured exposure variables in the training dataset; limited to observed (measured) variables that did not have a standardized regression coefficient ≥ 0.5 for at least one latent construct.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the VA Central IRB [13] (protocol code MVP001 approved in 2010).

Informed Consent Statement:
Written informed consent has been obtained from the participants in accordance with all VA policies and under the authority of the VA Central IRB [13].
Data Availability Statement: Data described in the article, code book, and analytic code will not be made available to other researchers for purposes of reproducing the results or replicating the procedure, in order to comply with current VA privacy regulations pursuant to the US Department of Veterans Administration policies on compliance with the confidentiality of US veterans' data.   • 2500 Overlook Terrace, Madison, WI 53705, USA