Sources of Variation in Food-Related Metabolites during Pregnancy

The extent to which variation in food-related metabolites is attributable to non-dietary factors remains unclear, which may explain inconsistent food-metabolite associations observed in population studies. This study examined the association between non-dietary factors and the serum concentrations of food-related biomarkers and quantified the amount of variability in metabolite concentrations explained by non-dietary factors. Pregnant women (n = 600) from two Canadian birth cohorts completed a validated semi-quantitative food frequency questionnaire, and serum metabolites were measured by multisegment injection-capillary electrophoresis-mass spectrometry. Hierarchical linear modelling and principal component partial R-square (PC-PR2) were used for data analysis. For proline betaine and DHA (mainly exogenous), citrus foods and fish/fish oil intake, respectively, explained the highest proportion of variability relative to non-dietary factors. For metabolites that can be produced endogenously, biotransformed by gut microbiota, and/or derived from multiple food sources, the unique contribution of dietary factors was similar (15:0, 17:0, hippuric acid, TMAO) or lower (14:0, tryptophan betaine, 3-methylhistidine, carnitine) compared to non-dietary factors (i.e., ethnicity, maternal age, gestational age, pre-pregnancy BMI, physical activity, and smoking). The results emphasize the importance of adjusting for non-dietary factors in future analyses to improve the accuracy and precision of the measures of food intake and their associations with health and disease.


Introduction
The accurate assessment of dietary intake remains a major challenge in human nutrition research due to the complex nature of food exposure and the reliance on self-reporting, which often leads to biased or unreliable measures of food intake. Most studies use self-reported dietary intake methods, such as food frequency questionnaires (FFQ), 24-h dietary recalls, and food records, which may be subject to recall, misclassification, and measurement biases [1]. To circumvent this problem, metabolomics (the global analysis of low molecular weight metabolites in biological samples) has been increasingly applied in large-scale epidemiological studies for the discovery and validation of food intake biomarkers [2].
Biomarkers can provide a more objective assessment of food exposures than self-reported dietary intake because they account for nutrient bioavailability and metabolism. An ideal biomarker of food intake is one that can be readily measured in a human biofluid (blood or urine) at the population level, is highly specific for one food item or food group, shows a dose- and time-dependent response, and is not extensively transformed by the microbiota and host tissue upon consumption. However, complex interpretative challenges exist since nutrients are derived from various food sources and can be intercorrelated with other metabolic processes [3]. Furthermore, the human metabolome exhibits variability due to intrinsic physiologic characteristics, such as age, sex, hormonal levels, and the gut microbiome, as well as due to extrinsic factors, such as habitual diet and lifestyle. In addition, many putative biomarkers of food intake do not originate exclusively from a single food or nutrient. For example, trimethylamine N-oxide (TMAO) is formed from a TMA-containing nutrient, such as choline, which is abundant in fish, beef, and eggs, but can also be produced from carnitine in red meat [2,4]. Moreover, many of the gut microbiome-dependent metabolites and other food-specific metabolites are metabolized in the liver at different rates, depending on hepatic enzyme activity [5], which may contribute to the greater variability observed in the range of metabolite concentrations measured in biological samples [6]. Consequently, it is important to identify potential non-dietary sources of food-related biomarkers and examine the extent to which these factors explain differences in metabolite concentration.
In most cases, food intake explains a relatively small proportion (R² < 10%) of the total variation in a given metabolite concentration, and other determinants are typically unknown, unmeasured, or, if measured, the extent of the measurement error is not clear [7]. Biomarkers derived from food intake and gut microbiota are influenced by non-dietary factors [8,9]; however, the extent to which these factors compromise the validity of the metabolite as a food intake biomarker may depend on the specificity of the biomarker (well-established, uncertain, or weak biomarker of the particular food) and on whether the biomarker is endogenously produced, biotransformed by gut microbiota, and/or derived from more than one food source. Understanding the sources of variation in biomarkers of food intake that are not attributable to changes in food intake is critical to advancing the field of food intake biomarkers. If the sources of the variation are not clearly understood, then using these biomarkers as markers of food/nutrient intake may simply exchange one source of measurement error (self-misreport) for others (changes in the biomarker unrelated to changes in food intake).
Carefully designed studies examining the association between non-dietary factors and biomarker concentrations are sparse and especially lacking in women during pregnancy. Observational studies, specifically birth cohort studies, are useful designs to learn about pregnancy exposures and birth outcomes [10]. Women experience a series of metabolic modifications during pregnancy, likely affected by pre-pregnancy and intrapartum factors, which in turn may affect maternal health and disease at the critical stages of fetal development [11,12]. Moreover, metabolite concentrations during gestation and pre-pregnancy, and pregnancy-related factors, such as gestational diabetes (GDM), also differ between ethnic groups (e.g., White Europeans and South Asians) [9]. The purpose of this study was to examine the associations of non-dietary factors, including demographics, lifestyle, and pregnancy-related factors, with serum metabolite concentrations using a panel of commonly identified biomarkers derived from food intake and/or gut microbiota in pregnant women of two ethnically diverse groups. The panel included proline betaine; five fatty acids (even-chain saturated fatty acid (SFA) myristic acid (14:0); odd-chain SFA pentadecanoic acid (15:0) and heptadecanoic acid (17:0); and omega-3 polyunsaturated fatty acids (ω-3 PUFA) docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA)); hippuric acid; TMAO; 3-methylhistidine; carnitine; and tryptophan betaine. A further aim was to determine the extent to which non-dietary factors explain the variability in the concentrations of the putative biomarkers of food intake.

Data Source and Participants
This study used data from two longitudinal Canadian birth cohorts of pregnant women: the Family Atherosclerosis Monitoring In earLY life (FAMILY) study and SouTh Asian biRth cohorT (START). The FAMILY study included White European women and the START cohort included women of South Asian ethnic background. Design and methodology of these two studies have been described in detail elsewhere [13,14]. Briefly, the FAMILY study was designed to understand the environmental, genetic, and biochemical factors important in the development of obesity and cardiovascular disease risk factors in childhood. A total of 857 families (901 newborns) were recruited between 2002 and 2009 in the Hamilton area, Ontario, Canada. Women were recruited between 24 and 36 weeks of gestation. The START study enrolled 1012 South Asian (people who originate from the Indian sub-continent: India, Pakistan, Sri Lanka, or Bangladesh) mother-child pairs between 2011 and 2015 from the Peel Region of Ontario to investigate the influence of diverse environmental exposures and genetics on early life adiposity, growth trajectory, and cardiometabolic risk. The ancestral origins of the woman, her partner, and the offspring's grandparents were all required to be classified as South Asian.
All enrolled participants provided full informed consent, and both studies obtained ethics approval from the McMaster Hamilton Integrated Research Ethics Board [START (HiREB #10-640) and FAMILY (HiREB #02-060)].
Clinical and demographic data were harmonized across the two cohorts. When questions were not identical between studies (e.g., physical activity level during pregnancy), comparable categories were constructed with the available data to satisfy the same definition. Within each cohort, 300 pregnant women were randomly selected for serum metabolomics analysis as previously described [15]. This selection was based on contrasting diet quality scores (DQS), where 100 mothers were randomly selected from each of the 3 DQS groups (>90th percentile ["high" diet quality], <10th percentile ["low" diet quality], and between 10th and 90th percentile ["intermediate" diet quality]). A total of 600 pregnant mothers were included in the current analysis (Supplementary Figure S1).
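The stratified selection described above can be illustrated with a short Python sketch. It is not the cohorts' actual sampling code: the function name `sample_by_dqs`, the random seed, and the handling of a stratum holding fewer than the requested number of women (all are taken) are assumptions made here for illustration.

```python
import numpy as np

def sample_by_dqs(dqs, n_per_group=100, seed=0):
    """Randomly pick participants from three diet quality score (DQS) strata.

    Strata follow the text: below the 10th percentile ("low"), between the
    10th and 90th percentiles ("intermediate"), above the 90th ("high").
    Returns a dict mapping stratum name to an array of selected row indices.
    """
    dqs = np.asarray(dqs, dtype=float)
    rng = np.random.default_rng(seed)  # seed is an illustrative assumption
    p10, p90 = np.percentile(dqs, [10, 90])
    strata = {
        "low": np.flatnonzero(dqs < p10),
        "intermediate": np.flatnonzero((dqs >= p10) & (dqs <= p90)),
        "high": np.flatnonzero(dqs > p90),
    }
    # Assumption: if a stratum has fewer than n_per_group members, take them all.
    return {
        name: rng.choice(idx, size=min(n_per_group, idx.size), replace=False)
        for name, idx in strata.items()
    }
```

Applied once per cohort, a call such as `sample_by_dqs(cohort_dqs)` would yield up to 300 women per cohort, matching the design described in the text.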

Maternal Serum Metabolome Analyses
A validated multiplexed separation platform based on multisegment injection-capillary electrophoresis-mass spectrometry (MSI-CE-MS) was used for targeted and nontargeted profiling of polar/ionic metabolites measured consistently in serum filtrate samples with stringent quality control (QC). A standardized method protocol was used for the identification and quantification of the maternal serum metabolome, as described in more detail elsewhere [15]. Briefly, a total of 66 and 67 polar ionic metabolites from serum filtrate samples satisfied selection criteria for their analysis in the FAMILY and START cohorts, respectively, and 53 of these were measured consistently across both cohorts. Serum metabolites were reported only if they satisfied two additional criteria: (1) metabolites that were detected in the majority of the individual samples (≥75%) in a cohort (i.e., frequency filter) and (2) acceptable technical precision based on the repeated analysis of QC samples (i.e., QC filter) to reduce false discoveries and data overfitting. Metabolites with nondetectable or missing values were replaced with half of the lowest detected value for the compound in each cohort. Moreover, a QC-based batch correction algorithm was applied for the robust correction of long-term monitoring of signal drift in MSI-CE-MS [16]. Among metabolites measured consistently in the two cohorts, six metabolites, including proline betaine, 3-methylhistidine, hippuric acid, TMAO, carnitine, and tryptophan betaine, were selected for our current analysis, as they were previously determined to be associated with the self-report of dietary intake [2,17]. Further, they offer a combination of evidence (good, fair, or poor) for candidate biomarkers of food intake that are produced exogenously, endogenously, biotransformed by gut microbiota, and/or derived from more than one food source [2,17,18]. 
The reference intervals for these serum metabolites in different birth cohorts from across Canada, their technical/biological variance, and intraclass correlation coefficients have been reported previously [15].
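The frequency filter and the replacement of nondetected values can be sketched in Python as follows. This is a minimal illustration applied to one cohort at a time, assuming nondetected or missing values are stored as NaN; the function name `filter_and_impute` is hypothetical, and the QC-based precision filter and batch correction described above are not reproduced.

```python
import numpy as np

def filter_and_impute(X, min_detect_frac=0.75):
    """Apply a detection-frequency filter, then half-minimum imputation.

    X is a samples-by-metabolites array for a single cohort, with NaN
    marking nondetected or missing values (a storage assumption).
    Returns the filtered, imputed array and the boolean column mask.
    """
    X = np.asarray(X, dtype=float)
    # Keep metabolites detected in at least 75% of samples.
    detected_frac = np.mean(~np.isnan(X), axis=0)
    keep = detected_frac >= min_detect_frac
    Xk = X[:, keep].copy()
    # Replace remaining gaps with half of the lowest detected value per metabolite.
    half_min = np.nanmin(Xk, axis=0) / 2.0
    rows, cols = np.where(np.isnan(Xk))
    Xk[rows, cols] = half_min[cols]
    return Xk, keep
```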
Non-esterified fatty acids (NEFAs) from serum ether extracts were analyzed in the FAMILY cohort using a validated protocol based on MSI-NACE-MS, which offers a multiplexed separation platform for rapid NEFA analysis on an Agilent 6230 TOF mass spectrometer with a coaxial sheath liquid ESI ionization source equipped with an Agilent G7100A capillary electrophoresis (CE) system (Agilent Technologies Inc., Santa Clara, CA, USA) [19]. Serum extracts were injected with alternating background electrolyte spacer plugs for a total of seven discrete samples analyzed within 30 min in a single run. Repeat QC samples introduced in a randomized position for each MSI-NACE-MS run were analyzed to confirm the technical precision of NEFA measurements (mean CV = 15%, n = 46). Serum non-esterified 14:0, 15:0, 17:0, DHA, and EPA were reliably quantified and reported as relative proportions, expressed as a molar percentage (mol%) of the 19 NEFAs quantified in total [20].

Assessment of Dietary Intake
Maternal dietary intake during pregnancy was collected at 24-28 weeks gestation. Semi-quantitative validated food-frequency questionnaires (157 items in FAMILY and 163 items in START) developed and validated as part of the Study of Health and Risk in Ethnic Groups (SHARE) were used [21-23]. Participants were asked to report the frequency (daily, weekly, monthly, yearly, or never) and amount in serving size of each food or food group on average in the past 12 months. For our study, food items were either used as separate items (chicken, canned fish, fried fish) or classified into main food groups: citrus food (citrus fruit and citrus juice), red meat, eggs (boiled and fried eggs), seafood, nuts and legumes, and fruits and vegetables. Nutrient intake was calculated using the ESHA Food Processor Nutrient Analysis Software (ESHA Research, version 6.11, 1996, Salem, OR, USA), derived from the 1991 Canadian Nutrient File and the US Department of Agriculture nutrient food composition databases. Fiber intake and total energy intake were also estimated using the FFQ [17,21]. Data were logarithm-transformed to correct for skewness prior to including them in the regression analysis, and nutrient intakes were adjusted for energy intake using the residual approach [24].
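For readers unfamiliar with the residual approach to energy adjustment, the following Python sketch shows the usual form of the calculation: the nutrient is regressed on total energy intake, and each residual is added back to the nutrient intake predicted at the mean energy intake. The function name `energy_adjust` is hypothetical, and whether the study applied the adjustment before or after log transformation is not stated in the text, so the sketch simply operates on whatever scale it is given.

```python
import numpy as np

def energy_adjust(nutrient, energy):
    """Energy-adjust a nutrient intake using the residual method.

    Fits nutrient = intercept + slope * energy by ordinary least squares,
    then returns residual + predicted nutrient at the mean energy intake,
    so adjusted values are uncorrelated with total energy intake.
    """
    nutrient = np.asarray(nutrient, dtype=float)
    energy = np.asarray(energy, dtype=float)
    slope, intercept = np.polyfit(energy, nutrient, 1)
    residuals = nutrient - (intercept + slope * energy)
    return residuals + (intercept + slope * energy.mean())
```

The adjusted values keep the original units and the original mean, which is why the constant (predicted intake at mean energy) is added back to the residuals.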

Non-Dietary Factors
Non-dietary factors included ethnicity (White European or South Asian), maternal age (years), gestational age (i.e., weeks of pregnancy), parity, pre-pregnancy body mass index (BMI, kg/m²), smoking history (current or former smoker and never smoker), physical activity (mainly sedentary, mild activity, moderate activity, and strenuous activity), social disadvantage index (SDI), and gestational diabetes (GDM). The SDI was derived using a previously validated index based on employment status, income, and marital status, with higher values indicating greater socioeconomic disadvantage [25]. In the START cohort, a case of GDM was defined based on the Born in Bradford (BiB) oral glucose tolerance test criteria, self-reported GDM, and insulin use in pregnancy, whereas the International Association of the Diabetes and Pregnancy Study Groups (IADPSG) criteria (75 g OGTT with fasting glucose ≥ 5.1 mmol/L, 1 h ≥ 10.0 mmol/L, 2 h ≥ 8.5 mmol/L) were used in the FAMILY cohort. We selected these factors based on their known and plausible associations with the selected metabolites and/or the fact that they are commonly adjusted for in population-based nutritional metabolomics studies.
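As an illustration of the IADPSG thresholds quoted above, the sketch below classifies a single 75 g OGTT result. It follows the IADPSG recommendation that meeting or exceeding any one of the three thresholds is sufficient; the function name is hypothetical, and the BiB-based definition used in START is not reproduced here.

```python
def meets_iadpsg_gdm(fasting, one_hour, two_hour):
    """Return True if a 75 g OGTT meets the IADPSG criteria for GDM.

    All glucose values are in mmol/L. One value at or above its
    threshold (fasting 5.1, 1 h 10.0, 2 h 8.5) is sufficient.
    """
    return fasting >= 5.1 or one_hour >= 10.0 or two_hour >= 8.5
```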

Statistical Analysis
Categorical variables were summarized using frequencies and percentages, and continuous variables were summarized using the mean and standard deviation (SD) or the median and interquartile range (IQR). Random-effects hierarchical linear models (HLM) were fit, whereby each natural logarithm-transformed food-related metabolite concentration was regressed on dietary and non-dietary factors after adjusting for other covariates, including total energy intake (kcal), total fiber intake (g/day), and the timing of FFQ collection relative to the blood draw (FFQ before blood, FFQ after blood, or both taken on the same day) [26].
The data had a nested (clustered) structure, where individuals within the same cohort represented a cluster because they were more similar to one another with regard to dietary and non-dietary factors. Therefore, we used hierarchical linear modeling (HLM) to accommodate the dependent nature of observations in clustered data. HLM allows nesting effects to be incorporated into the model, producing more accurate estimates and correcting for error structure violations (non-independent errors) to provide robust conclusions [27,28]. First, unconditional (intercept-only) HLM models were tested to determine whether serum metabolite concentrations were nested within cohort using an intraclass correlation (ICC) calculated from the covariance parameter estimates. The ICC is the proportion of total variation attributable to the level 2 (study-level) factor. It can be determined from an intercept-only model, and an ICC of 2% or greater suggests the presence of level 2 effects [26]. The results showed an ICC of 3.9% for proline betaine, 25.6% for 3-methylhistidine, 1.5% for carnitine, 0% for hippuric acid, 46.0% for tryptophan betaine, and 7.0% for TMAO. A sensitivity analysis using ordinary least squares (OLS) multivariable linear regression was conducted for carnitine and hippuric acid, whose ICCs were below 2% (Supplementary Table S1). Next, the association of each metabolite with dietary factors previously shown to be associated with it (e.g., citrus fruit and proline betaine) was examined by entering these factors as fixed level 1 predictors. Finally, in addition to the dietary factors, all non-dietary factors were also added as level 1 predictors. These HLM procedures produced the following three models:

Random Intercept (u0j) with Fixed Level 1 Factors (Dietary (γ10) and Non-dietary factors (γ20 . . . ))
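Written out in standard random-intercept notation, the three models described above (intercept-only, dietary factors added, dietary and non-dietary factors added) would take roughly the following form. This is a sketch rather than the authors' exact specification: only u0j, γ10, and γ20 appear in the text, while the symbols Y, D, N, r, τ00, and σ² are notation assumed here, with i indexing women and j indexing cohorts. The adjustment covariates (energy, fiber, and FFQ timing) are omitted for brevity.

```latex
\begin{align}
\text{Model 1:}\quad \ln Y_{ij} &= \gamma_{00} + u_{0j} + r_{ij} \\
\text{Model 2:}\quad \ln Y_{ij} &= \gamma_{00} + \gamma_{10} D_{ij} + u_{0j} + r_{ij} \\
\text{Model 3:}\quad \ln Y_{ij} &= \gamma_{00} + \gamma_{10} D_{ij} + \gamma_{20} N_{1ij} + \dots + u_{0j} + r_{ij} \\
\mathrm{ICC} &= \frac{\tau_{00}}{\tau_{00} + \sigma^{2}},
\qquad u_{0j} \sim N(0, \tau_{00}),\quad r_{ij} \sim N(0, \sigma^{2})
\end{align}
```

Under this notation, the ICC reported for each metabolite is computed from the Model 1 variance components: the between-cohort variance divided by the sum of the between-cohort and residual variances.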
Goodness-of-fit statistics, including the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the change in deviance, were used to evaluate model fit in terms of the clustering variable. Smaller values of these statistics indicate a better model fit [29]. The AIC and BIC consider error and model parsimony simultaneously. An OLS multivariable linear regression was conducted for NEFAs, as these data were only available in the FAMILY cohort. Regression estimates (b), 95% confidence intervals (95% CI), and p-values were reported, and statistical analysis was conducted using SAS software version 9.4.
Finally, principal component partial R-square (PC-PR2) analysis was used to quantify the sources of systematic variability in serum metabolite concentrations [30]. The PC-PR2 method combines features of principal component analysis (PCA) and the partial R-square statistic in multivariable linear regression and allows for some degree of intercorrelation between explanatory variables. The mathematical details of the PC-PR2 method are described elsewhere [30]. A data reduction component was not necessary because the analytic strategy was applied to a single metabolite at a time. The partial R² statistic was calculated for each explanatory variable; it quantifies the amount of variability in the metabolite concentration explained by that variable, conditional on all other covariates included in the model. The PC-PR2 method was conducted using the R software, version 1.2.5.
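The way AIC and BIC trade off fit against parsimony can be made concrete with their textbook definitions, sketched below in Python. The function names are hypothetical, and the sample size entering the BIC penalty is an assumption: statistical packages differ in whether they count observations or clusters, so values computed this way need not match SAS output exactly.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L), for k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L), for sample size n."""
    return k * math.log(n) - 2 * log_lik
```

Both criteria fall when the log likelihood rises and grow with the number of parameters, so a model with added covariates is preferred only if the gain in fit outweighs the penalty.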
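With a single outcome and no data reduction step, the quantity of interest is the ordinary partial R² from multivariable linear regression. The Python sketch below computes it by comparing the residual sum of squares of the full model with that of a model omitting the variable in question. It is an illustration of the statistic only, not the PC-PR2 implementation cited in the text; the function names are hypothetical, and categorical predictors spanning several dummy columns are not handled.

```python
import numpy as np

def _sse(X, y):
    """Residual sum of squares from an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

def partial_r2(X, y, j):
    """Partial R-square of column j of X, conditional on all other columns.

    Computed as (SSE_reduced - SSE_full) / SSE_reduced, where the reduced
    model drops column j and the full model keeps every column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sse_full = _sse(X, y)
    sse_reduced = _sse(np.delete(X, j, axis=1), y)
    return (sse_reduced - sse_full) / sse_reduced
```

Because each value is conditional on the remaining covariates, the partial R² values for a given metabolite need not sum to the model's overall R² when predictors are correlated.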

Association of Dietary and Non-Dietary Factors with Food-Related Metabolites
The descriptive characteristics of the participants overall and by ethnicity are shown in Table 1. Model fit statistics from the HLM examining the dietary and non-dietary factors associated with food-intake biomarkers are presented in Supplementary Table S2, and the regression estimates and 95% CI are presented in Table 2. For each metabolite outcome, the log likelihood, AIC, and BIC statistics decreased considerably after adding the non-dietary covariates, indicating better model fit (Supplementary Table S2). Thus, the regression estimates presented in Table 2 are based on Model 3. As expected, most of the dietary food sources were significantly associated with their respective metabolite concentrations, except for carnitine (p > 0.05) (Table 2). For exogenous metabolites specific to a single food source, higher citrus food intake was positively associated with proline betaine concentration (b: 0.27; 95% CI: 0.20, 0.34), and a higher intake of nuts and legumes was positively associated with tryptophan betaine concentration (b: 0.02; 95% CI: 0.00, 0.03). For metabolites that have both endogenous and exogenous sources and are obtained from multiple food sources, such as hippuric acid, a higher intake of fruits and vegetables was associated with a higher hippuric acid concentration (b: 0.22; 95% CI: 0.08, 0.36), but no such association was found with tea and coffee intake. Higher intakes of chicken (b: 0.02; 95% CI: 0.00, 0.04) and red meat (b: 0.03; 95% CI: 0.01, 0.06) were positively associated with 3-methylhistidine concentration, while seafood intake was positively associated with TMAO concentration (b: 0.08; 95% CI: 0.04, 0.12) (Table 2).
For non-dietary factors, maternal age, gestational age, and smoking history were associated with the serum concentration of some metabolites after adjusting for diet-related factors (Table 2). Higher maternal age was associated with a higher concentration of proline betaine (b: 0.04; 95% CI: 0.01, 0.07) and TMAO (b: 0.02; 95% CI: 0.00, 0.04), and higher gestational age was associated with a higher concentration of 3-methylhistidine (b: 0.01; 95% CI: 0.00, 0.02) and a lower concentration of carnitine (b: −0.01; 95% CI: −0.02, −0.01). Participants who reported ever having smoked cigarettes had a lower concentration of proline betaine (b: −0.60; 95% CI: −0.95, −0.25) and a higher concentration of carnitine (b: 0.06; 95% CI: 0.02, 0.10) compared to those who never smoked cigarettes (Table 2). Parity, GDM, pre-pregnancy BMI, physical activity, SDI, and the timing of FFQ administration (before or after the blood draw, relative to the same day as the blood draw) were not associated with any of the six metabolite concentrations. The results of the HLM models examining the association of dietary and non-dietary factors with food-related metabolites stratified by ethnicity (White European and South Asian) are presented in Supplementary Tables S3 and S4, respectively. The results in the two cohorts were generally similar to those reported for the overall sample.
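Because the metabolite concentrations were natural log-transformed, a coefficient for a categorical predictor can be read as an approximate percentage difference in the untransformed concentration. The Python sketch below shows the back-transformation, assuming that smoking history entered the model as a binary indicator with never-smokers as the reference group; that coding, and the function name, are assumptions made here for illustration.

```python
import math

def percent_difference(b):
    """Percent difference in the outcome implied by coefficient b
    when the outcome is modelled on the natural-log scale."""
    return (math.exp(b) - 1.0) * 100.0

# Coefficients for ever- versus never-smokers reported in the text.
print(round(percent_difference(-0.60), 1))  # proline betaine: -45.1
print(round(percent_difference(0.06), 1))   # carnitine: 6.2
```

On this reading, ever-smokers would have roughly 45% lower proline betaine and roughly 6% higher carnitine concentrations than never-smokers, holding the other covariates fixed.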

Results from PC-PR2 Analysis
PC-PR2 analysis was used to quantify the sources of systematic variability in serum metabolite concentrations. The results for the overall sample are displayed in Figures 1 and 2, and results stratified by cohort are displayed in Supplementary Figures S2-S7. For largely exogenous metabolites, such as proline betaine, hippuric acid, and tryptophan betaine, dietary food intake explained a greater proportion of variability in the metabolite than non-dietary factors. Citrus fruit intake explained the largest proportion of variation in proline betaine concentration, with a partial R² value of 10.8%, followed by smoking history (2.5%), maternal age (1.2%), and ethnicity/cohort (1.2%) (Figure 1A). Similarly, for hippuric acid, fruit and vegetable intake displayed the largest partial R² value of 2.0%, followed closely by energy intake (1.4%) (Figure 1B). For tryptophan betaine, intake of nuts and legumes, fiber intake, and overall energy intake each explained between 1.2% and 1.9% of the variability. Meanwhile, ethnicity had a substantial influence on tryptophan betaine levels, as the partial R² value of cohort was 10.2% (Figure 1C). When the model was stratified by cohort, nuts and legumes explained the most variability (3.6%) in the FAMILY cohort (primarily White European women), while fiber intake (4.2%), energy intake (2.2%), and GDM (1.5%) explained most of the variability in tryptophan betaine in the START cohort (exclusively South Asian women) (Supplementary Figure S4).
For endogenous (less food-specific) metabolites, the dietary factors explained the most variability for two of the metabolites (3-methylhistidine and TMAO), while non-dietary factors, such as gestational age (partial R² value: 5.7%) and smoking history (partial R² value: 1.9%), appeared to play a more prominent role in explaining the variability in carnitine (Figure 1D). This latter finding is consistent with the HLM results showing that no dietary factor was associated with carnitine concentration. Seafood intake explained the greatest proportion of variability in TMAO, with a partial R² value of around 3.0%, followed by maternal age (partial R² value: 1.2%) (Figure 1E). For 3-methylhistidine, red meat intake had the highest partial R² value of 1.2% (Figure 1F). There was evidence of differences by ethnicity/cohort, where red meat explained 5.8% of the variability in 3-methylhistidine in the START cohort but a negligible amount in the FAMILY cohort. Each of the remaining explanatory variables explained a negligible amount of total variation in the metabolite concentrations. Although there were some differences in findings between the two cohorts, overall, the results obtained from PC-PR2 are congruent with those obtained from the HLM analysis.

Discussion
Using data from two birth cohorts representing two ethnically diverse groups, the results showed that for exogenous biomarkers such as proline betaine and (largely) DHA, dietary factors explained a higher proportion of variability, whereas the contribution of non-dietary factors was relatively small. In contrast, for metabolites that can be produced endogenously, biotransformed by gut microbiota, and/or derived from more than one food source, the unique contribution of dietary factors was similar (15:0, 17:0, hippuric acid, and TMAO) or lower (14:0, tryptophan betaine, 3-methylhistidine, and carnitine) compared to non-dietary factors (ethnicity, maternal age, gestational age, pre-pregnancy BMI, physical activity, and smoking history). Further, there was an ethnicity effect for all metabolites, except carnitine and hippuric acid.
For the non-dietary factors, higher maternal age was positively associated and ever having smoked was inversely associated with proline betaine concentrations after adjusting for citrus foods. Evidence indicates that older women are more likely than younger women to make healthier choices, including increasing their consumption of fruits and vegetables from pre-pregnancy to pregnancy [31,32]. Many studies have also shown that smokers have lower concentrations of antioxidants and elevated concentrations of 8-isoprostane [33,34], which may be due to low consumption of antioxidants [35], reduced vitamin C absorption, or increased turnover of vitamin C caused by free radicals produced from smoking [36]. Proline betaine (stachydrine), a marker of citrus foods, which are rich in vitamin C (a potent water-soluble antioxidant), has been shown to inhibit cell proliferation and production of reactive oxygen species in in vitro and in vivo studies [37,38]. As expected, higher citrus food intake was associated with higher proline betaine concentration and explained the largest proportion of variation in proline betaine concentration relative to non-dietary factors. In kinetics studies, proline betaine is excreted rapidly and nearly completely in urine within 24 h [39], and therefore it is considered to be minimally metabolized in humans. Furthermore, proline betaine was previously validated in a large-scale observational study, where it was highly sensitive (86.3%) and specific (90.6%) for citrus fruit consumption [39], and it is thus considered a robust biomarker for citrus food intake.
Even-chain SFA (14:0) can be derived from both exogenous sources (via dietary intake) and endogenous synthesis (via de novo lipogenesis) [40,41], whereas odd-chain SFAs (15:0 and 17:0) mainly reflect dietary intake of full-fat dairy [42], though the possible contribution of endogenous sources cannot be ruled out [43,44]. As expected, both 15:0 and 17:0 were associated with full-fat dairy intake and 14:0 was not. Full-fat dairy intake did not, however, explain the largest share of variance in 15:0 or 17:0 levels. Rather, non-dietary factors, including higher gestational age and pre-pregnancy BMI, were associated with lower odd-chain SFA (15:0 and 17:0), and low physical activity level was associated with lower 17:0. In a previous longitudinal analysis, odd-chain SFA (sum of 15:0 and 17:0) progressively declined during pregnancy [45]. Although the exact mechanism for the gestational alterations in these SFAs remains unclear, it is possible that pregnancy-associated physiologic changes and the increase in adipose deposition throughout pregnancy are important factors contributing to the observed differences [46]. In several population-based studies, circulating odd-chain SFAs (15:0 and 17:0) were inversely associated with obesity and cardiometabolic diseases [47,48]. ω-3 PUFAs (DHA more than EPA) have been considered robust biomarkers of habitual fish/fish oil intake [2]. This association was demonstrated for DHA in the current study, where fish/fish oil intake explained the largest proportion of variation in DHA relative to non-dietary factors. Fish/fish oil daily servings explained about twice the amount of variation in ω-3 PUFAs compared with dietary fish intake alone, indicating that it is important to account for EPA and DHA sources from both diet and supplements.
For other metabolites, non-dietary factors were associated with metabolite concentrations; however, their overall contribution was minimal, except for carnitine, whose variability was mostly explained by gestational age. Carnitine mainly reflects the consumption of amino acid- and fatty acid-containing foods and, as a result, is considered a generic marker for foods of animal origin, but it may also be synthesized from the essential amino acids lysine and methionine [4,49]. A decline in carnitine across trimesters during pregnancy was previously reported [9,50]. A significant rise in acylcarnitine as pregnancy progresses may reflect enhanced fatty acid oxidation in later periods of gestation [50]. This pattern may suggest a greater uptake of carnitine in the fatty acid β-oxidation process, leading to a lower free carnitine substrate and resulting in a lower total body carnitine pool in pregnant women [51,52].
For all metabolites, except for proline betaine and two NEFAs (15:0 and DHA), the unique contribution of food sources was similar to or lower than non-dietary factors. This may reflect endogenous production, microbial synthesis, or the multiple food sources of some of these metabolites. Interindividual variability in hippuric acid [53,54], TMAO [55,56], and tryptophan betaine [57,58] may partly be due to differences in intestinal microbiota. However, the potential variation in these metabolites attributable to the gut microbiome could not be accounted for in our study. Further, variation in an endogenous metabolite concentration such as carnitine may reflect the general intake of foods of animal origin and/or physiological changes that take place during pregnancy, and is influenced by factors such as age and health status, and thus may not be a suitable biomarker of red meat at the population level [4,49].
Metabolite concentration may also vary widely across cultures and ethnic groups as the type of food, method of consumption, and food preparation techniques may vary [59]. In our multi-level analysis, there was an ethnicity effect for all metabolites, except carnitine and hippuric acid. Proline betaine concentration was shown to vary to some extent by cohort, likely attributable to differences in citrus food intake in the two cohorts (Table 1). Additionally, some of this variability may be attributed to differences in lifestyle factors between members of the two cohorts, such as smoking status. Regardless, citrus fruit consumption still explained the largest amount of variance in proline betaine in both cohorts, suggesting that non-dietary factors do not contribute substantially to proline betaine variation (Supplementary Figure S2). However, mixed results were shown by cohort for metabolites that are synthesized or modified by gut bacteria. Tryptophan betaine concentration was shown to vary considerably between the two cohorts, with higher tryptophan betaine associated with higher nuts and legumes intake in the FAMILY cohort, and with higher fiber intake and lower kilocalories in the START cohort. A possible explanation for this discrepancy may be that nuts and legumes is a heterogeneous food group so the type of nuts and preparation/cooking methods for legumes may play an important role [60]. Further, it is also likely that the association of nuts or legumes intake with tryptophan betaine may be confounded by fiber intake in the START cohort, as fiber intake is higher in this cohort, and tryptophan betaine has been identified in fiber-rich plant-based foods and linked to gut microbiota in fiber-enriched diets [57].
Hippuric acid was one of the metabolites that did not vary by ethnic cohort, but it was associated with greater fruit and vegetable intake only in the FAMILY cohort, despite greater intake in the START cohort. An explanation for this may be related to the metabolism of different dietary polyphenols [61]. Evidence suggests that differences in excretion of hippuric acid may reflect altered gut microbial metabolism [62]. Generally, the amount of variability in food consumption may also affect the robustness of the association. For example, the IQR for certain foods, such as chicken and red meat, was higher in FAMILY than in START, whereas variability for other foods, such as fruits and vegetables, tea, eggs, and nuts and legumes, was higher in START than in FAMILY. This may explain inconsistencies in the results for at least some serum metabolites, such as the association between red meat and TMAO in the START cohort.
In other comparisons, TMAO varied slightly by cohort, but this may be explained by a relatively lower consumption of meats, including red meat, canned and fried fish, and seafood, in the START cohort compared to the FAMILY cohort. Despite this, higher seafood intake was positively associated with TMAO concentration in both cohorts. Differences in TMAO production and excretion may partly be related to metabolic precursors, such as choline, betaine, and carnitine. TMAO concentration increases postprandially (within 15 min) after the consumption of fish [63], but it takes more time after consumption of meat [64], suggesting that free TMAO in seafood may be readily absorbed after fish consumption without much involvement of gut microbiota. Finally, although the association of 3-methylhistidine with chicken and red meat was significant in the overall sample, these associations were attenuated when analysis was stratified by cohort. This is likely because the intra-cohort variability was small or because intakes of these foods were highly correlated (as was the case in the START cohort).
Biomarkers with 'good' evidence are considered direct surrogates for food intake [65]. However, several factors in addition to food exposure can influence variation in food-related metabolite concentrations and thus require appropriate consideration during the statistical analysis of the data [66]. In line with previous research [7], our study found that, in most cases, dietary factors explained less than 10% of the total variation in metabolite concentration. While part of the unexplained variation is attributable to measurement error in self-reported intake, part can be related to non-dietary factors. Therefore, future studies should account for non-dietary factors and differences by ethnicity to control for some of the inter-individual variation in food-related metabolites.
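The idea of a "unique contribution" to explained variance can be illustrated with a simple nested-model comparison. The sketch below is only an illustration under stated assumptions: it uses ordinary least squares on simulated data and takes the difference in R-squared between a full model and a model omitting one block of predictors. It is not the PC-PR2 or hierarchical linear modelling procedure used in this study, and the function names (`r_squared`, `unique_contributions`), the effect sizes, and the simulated variables are all hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def unique_contributions(diet, nondiet, y):
    """Unique R^2 of each predictor block: full-model R^2 minus the
    R^2 of the model that leaves that block out."""
    full = r_squared(np.column_stack([diet, nondiet]), y)
    return {
        "full": full,
        "diet_unique": full - r_squared(nondiet, y),
        "nondiet_unique": full - r_squared(diet, y),
    }

# Simulated data (assumption): one dietary predictor with a small effect
# and three non-dietary predictors with larger effects.
rng = np.random.default_rng(0)
n = 600
diet = rng.normal(size=(n, 1))
nondiet = rng.normal(size=(n, 3))
y = 0.3 * diet[:, 0] + nondiet @ np.array([0.5, 0.4, 0.3]) + rng.normal(size=n)

result = unique_contributions(diet, nondiet, y)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this simulated setting the dietary block accounts for only a small share of the total variance while the non-dietary block accounts for considerably more, which mirrors the qualitative pattern described above for metabolites with multiple determinants. With correlated predictors, the unique contributions would not sum to the full-model R-squared, which is one reason dedicated methods are used in practice.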
Our study has several strengths, including a large sample size that allowed for stratification by ethnicity, the use of fasting serum samples, and the comparison of a diverse set of metabolites reflecting commonly consumed foods that have been previously reported in free-living population studies [2]. We adopted a novel methodological approach to address an unanswered question regarding the non-dietary sources of metabolite variation in the field of nutritional metabolomics and biomarkers of food intake. Our study also has some limitations. We included only pregnant women from white European and South Asian backgrounds, and thus the generalizability of our findings is limited to these populations. Dietary assessment was based on a self-reported FFQ and may be prone to some measurement error; however, FFQs are commonly used in nutritional epidemiology. The 12-month period of dietary assessment may not be indicative of recent food intake or of intake only during pregnancy, but since our aim was to identify sources of variability in metabolites of foods that reflect habitual dietary intake, a 12-month intake was more appropriate. Samples were collected at one point in pregnancy, and data on changes in dietary intake during pregnancy were not collected and were therefore not available for the analysis.

Conclusions
Overall, the results emphasize that serum metabolites that reflect specific foods are also influenced by non-dietary factors (ethnicity, maternal age, gestational age, pre-pregnancy BMI, physical activity, and smoking history), but to differing degrees. The results of this study provide insight into the external factors that impact serum metabolite concentrations and provide guidance on appropriate modeling when metabolomics is used in nutritional epidemiological studies to identify diet-disease associations. Identifying robust and generalizable food-related biomarkers in diverse populations remains a challenge, but appropriate adjustment for non-dietary factors is necessary for an unbiased assessment of metabolite concentration. Future work will explore the role of maternal nutrition and food exposures on health outcomes later in life, such as childhood obesity and metabolic syndrome.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/nu14122503/s1, Table S1: Results from ordinary least squares regression examining the association of dietary and non-dietary factors with food-based metabolites; Table S2: Results of model fitting analyses examining the association of dietary and non-dietary factors with food metabolites; Table S3: Results from ordinary least squares regression examining the association of dietary and non-dietary factors with food-based metabolites in FAMILY cohort; Table S4: Results from ordinary least squares regression examining the association of dietary and non-dietary factors with food-based metabolites in START cohort; Table S5: Results from ordinary least squares regression examining the association of dietary and non-dietary factors with serum nonesterified fatty acid (NEFA) in FAMILY cohort; Figure S1: Consort flow diagram outlining selection criteria used in a cross-sectional study involving participants from the FAMILY and START birth cohorts; Figure S2A
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data described in the manuscript, code book, and analytic code will not be made available because participants in the FAMILY and START studies did not consent to public sharing of their data at the time of recruitment. Datasets can be made available from the corresponding author on reasonable request.
Scientific Advisory Committee on Nutrition (SACN) Subgroup on the Framework for the Evaluation of Evidence (Public Health England). The other authors declare no conflicts of interest.