3. Discussion
In this yearlong diet validation study, with repeated dietary assessment by both FFQ and 24HRs and two fasting plasma metabolic profiles collected approximately 6 months apart, we replicated many food–metabolite associations reported in other studies and identified several potentially novel food biomarkers. More associations were found with the FFQ than with the 24HRs. Reproducibility was acceptable for a large proportion of the 238 identified metabolites, with 38% having an ICC > 0.6. Our findings contribute to the literature on food-based biomarkers and provide important information on biomarker reproducibility, which could facilitate the use of these biomarkers in future nutritional epidemiological studies.
Generally, we identified more food–metabolite associations using the FFQ than using the 24HRs, and biomarker AUCs were also generally higher for the FFQ than for the 24HRs. In other words, the identified biomarkers predicted dietary intake assessed via the FFQ better than intake assessed via the 24HRs. Even though repeated 24HRs are considered a superior method of assessing true intake in the validation study, the FFQ is designed to capture usual food intake over the past 12 months. That metabolites correlated better with the FFQ than with the averaged 24HRs may therefore indicate that the biomarkers reflect long-term dietary intake. We observed more associations in the current study than in our previous study in the CPS-II Nutrition Cohort [8], probably because in the CPS-3 the FFQ was administered closer in time to the blood draws (as part of the validation study), and the average of two blood samples likely better captured usual metabolite levels over the year.
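The AUC comparison above can be illustrated with a minimal sketch. It assumes, for illustration only, that intake is dichotomized into high and low consumers and that a single metabolite is the predictor; the AUCs in this study may instead come from models with several metabolites or other cut-points. Under these assumptions, the AUC equals the probability that a randomly chosen high consumer has a higher metabolite level than a randomly chosen low consumer (the Mann–Whitney formulation). The function name `mann_whitney_auc`, the two "instruments" and all values are hypothetical.

```python
from itertools import product

def mann_whitney_auc(high, low):
    """AUC = P(level in a high consumer > level in a low consumer); ties count 0.5."""
    if not high or not low:
        raise ValueError("both groups need at least one participant")
    score = 0.0
    for hi_val, lo_val in product(high, low):
        if hi_val > lo_val:
            score += 1.0
        elif hi_val == lo_val:
            score += 0.5
    return score / (len(high) * len(low))

# Hypothetical metabolite levels for the same eight participants, split into
# high and low consumers by two different (hypothetical) dietary instruments.
high_a, low_a = [2.1, 1.8, 2.5, 1.9], [1.2, 1.5, 1.9, 1.0]
high_b, low_b = [2.1, 1.5, 2.5, 1.0], [1.2, 1.8, 1.9, 1.9]
print(mann_whitney_auc(high_a, low_a))  # -> 0.90625
print(mann_whitney_auc(high_b, low_b))  # -> 0.5625
```

Because both splits use the same eight metabolite values, the difference in AUC reflects only how well each instrument's classification of consumers agrees with the metabolite.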
We replicated five metabolites that had been correlated with total citrus fruits and juices or orange juice in the CPS-II Nutrition Cohort [8]. Stachydrine, the strongest biomarker of total citrus fruits and juices, was first identified in an acute feeding study [18] and then validated as a biomarker of habitual citrus fruit intake in several cross-sectional datasets [9,10,11,12,13,19,20,21], including our previous metabolomics study in the CPS-II Nutrition Cohort [8]. Among the food biomarkers we identified in the CPS-3 DAS but not in the CPS-II Nutrition Cohort, 4-allylphenol sulfate, which was associated with apple/pear and blueberry intake, is a nonspecific microbial metabolite of polyphenols [22] and has been reported as a biomarker of pear intake in a randomized trial [23]. Among the 75 vegetable–metabolite associations, 14 were also found in the CPS-II Nutrition Cohort [8]. Notably, we replicated ergothioneine as a putative biomarker of mushroom intake, and several metabolites, such as alliin, N-acetylalliin and S-allylcysteine, as biomarkers of garlic intake. We previously found S-methylcysteine sulfoxide to be a biomarker of cruciferous vegetable intake [8], which was also reported in the Prostate, Lung, Colorectal and Ovarian (PLCO) cohort [24]. In the present study, we found that S-methylcysteine, the biological precursor of S-methylcysteine sulfoxide, was associated with cruciferous vegetable intake. Among the food–metabolite associations not found in the CPS-II Nutrition Cohort, S-methylcysteine and pipecolate have been reported as useful biomarkers of dry bean intake in both human and mouse studies [25], and genistein sulfate and 4-ethylphenyl sulfate are biomarkers of soy product intake. 4-Ethylphenyl sulfate is a uremic toxin produced by gut bacteria, and its association with soymilk intake has been reported in a cohort of female twins [9].
Compared with our previous study in the CPS-II Nutrition Cohort [8], we identified several new biomarkers of whole grain product intake, such as 2,6-dihydroxybenzoic acid, 2-aminophenol sulfate and 2-acetamidophenol sulfate. 2,6-Dihydroxybenzoic acid, also known as γ-resorcylic acid, is a phenolic acid that was identified as a marker of high dietary fiber intake in an intervention study [26]. It may be derived from alkylresorcinols or lignans through a putative microbial enzyme that has not yet been identified in humans [26]. 2-Acetamidophenol sulfate (HPAA sulfate) and 2-aminophenol sulfate are benzoxazinoid metabolites that were previously identified as urinary biomarkers of whole grain intake [27]. 2-Aminophenol sulfate was also elevated in plasma after high dietary fiber intake [26].
Most of the metabolites associated with egg, meat and poultry intake are amino acids and lipids, especially plasmalogens. Two novel biomarkers of red meat are the xenobiotics 3-bromo-5-chloro-2,6-dihydroxybenzoic acid and 3,5-dichloro-2,6-dihydroxybenzoic acid, which were also correlated with milk intake in the present study. We replicated three metabolites that had been associated with habitual consumption of fish and shellfish in our previous study [8]: 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF), hydroxy-CMPF (previously known as X-02269) and docosahexaenoate (DHA; 22:6 n3). Hydroxy-CMPF, the most predictive metabolite of total fish intake, was also reported in the TwinsUK cohort [9] and a US cohort [13]. Among the five metabolites that correlated with nut intake, tryptophan betaine and 4-vinylphenol sulfate were also reported in similar cross-sectional studies [8,10,12,13].
In our previous study [8], ethyl glucuronide, which is formed directly from ethanol in the liver by UDP-glucuronosyltransferases [28], was the most predictive metabolite of all types of alcohol. In the present study, the most predictive metabolite of alcohol was ethyl alpha-glucopyranoside (previously known as X-24293), a glycoside found in Japanese rice wine that might be used as a functional food or cosmetic material [29]. For wine consumption (total and red but not white wine), we replicated the potential biomarker 2,3-dihydroxyisovalerate, an intermediate metabolite produced by yeast during wine fermentation [30]. We replicated 26 metabolites as biomarkers of total coffee intake [8], including quinate, the highly predictive unknown metabolite X-21442, several caffeine metabolites (e.g., 1-methylxanthine, 1,3-dimethylurate, 1,7-dimethylurate and 1,3,7-trimethylurate) and other metabolites. Chlorogenic acid, an abundant natural polyphenol, is found in high concentrations in coffee; during roasting, it is broken down into quinate and caffeic acid. In both the CPS-II Nutrition Cohort and the CPS-3 DAS, quinate was among the top predictive biomarkers of both caffeinated and decaffeinated coffee. Previous animal studies showed that chlorogenic acid and related compounds exert antiviral [31] and anticarcinogenic effects [32,33]. Future human studies should examine the associations of these biomarkers with disease outcomes, either directly or through mediation analyses. For tea consumption, we replicated theanine as the most predictive biomarker of total tea, green tea and black tea consumption.
As discussed above, our studies (in the CPS-3 DAS and, previously, in the CPS-II Nutrition Cohort [8]) and those of others have identified many biologically plausible, putative food biomarkers using metabolomics, which highlights the value of this technology for identifying dietary biomarkers. Moving forward, more research is needed to determine how these putative biomarkers can be used in dietary assessment. One important step is to develop calibration equations in controlled feeding studies so that the biomarkers can be used to correct self-reported dietary intake [1]. Urinary recovery biomarkers have been used to calibrate energy and protein intakes, and the calibrated intakes showed improved diet–disease associations compared with uncalibrated data [34]. Lampe et al. also evaluated blood concentration biomarkers in a feeding study of postmenopausal women and suggested that they perform as well as recovery biomarkers and can therefore be used to correct self-reported dietary intake data in future studies [35]. Cross-sectional studies such as the present study are informative because they allow multiple foods to be examined simultaneously and can show whether a metabolite is correlated with more than one food. However, many of the identified metabolites may not be optimal food biomarkers if they are not specific to particular foods or are synthesized endogenously, because their levels will then also be influenced by factors other than intake of the food of interest.
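The calibration step described above can be sketched as simple regression calibration. This is a minimal, single-step sketch under stated assumptions: a hypothetical substudy provides, for each participant, a self-reported intake and a biomarker-based estimate of intake (which in practice would itself come from a feeding-study equation); a straight line is fitted by ordinary least squares; and the fitted intercept and slope are then applied to self-reports in the main cohort. The names `fit_calibration` and `calibrate` and all numbers are illustrative, not the procedure of the cited studies.

```python
from statistics import mean

def fit_calibration(self_report, biomarker_intake):
    """Fit biomarker-based intake = intercept + slope * self-reported intake by OLS."""
    if len(self_report) != len(biomarker_intake) or len(self_report) < 2:
        raise ValueError("need paired values for at least two participants")
    x_bar, y_bar = mean(self_report), mean(biomarker_intake)
    sxx = sum((x - x_bar) ** 2 for x in self_report)
    if sxx == 0:
        raise ValueError("self-reported intake has no variation")
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(self_report, biomarker_intake))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def calibrate(self_report, intercept, slope):
    """Replace each self-reported intake by its predicted (calibrated) value."""
    return [intercept + slope * x for x in self_report]

# Hypothetical substudy: self-reported servings/day vs biomarker-based estimate.
sub_self_report = [1.0, 2.0, 3.0, 4.0]
sub_biomarker = [1.5, 2.0, 2.5, 3.0]
intercept, slope = fit_calibration(sub_self_report, sub_biomarker)
print(intercept, slope)  # -> 1.0 0.5

# Apply the equation to self-reports from a hypothetical main cohort.
print(calibrate([0.0, 6.0], intercept, slope))  # -> [1.0, 4.0]
```

A slope below 1, as in this toy example, pulls calibrated values toward the mean, offsetting the extra spread that random error adds to self-reports. Published calibration equations often also include covariates such as age and BMI and are fitted on log-transformed intakes; this single-predictor version shows only the mechanics.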
One concern with using metabolomic biomarkers in epidemiological studies is that a single measurement is subject to short-term variation and may not represent long-term status. Large within-person variation relative to between-person variation in metabolite levels can introduce measurement error that attenuates disease risk estimates. The ICC, defined as the ratio of between-person variance to total variance, is a good indicator of metabolite reproducibility. High ICCs, such as those observed for biomarkers of fish, milk, meat and coffee, indicate that between-person variation accounts for most of the total variation; low ICCs indicate large within-person variation relative to the total variation. However, a low ICC does not necessarily preclude the use of a metabolite as a dietary biomarker in all circumstances. The low ICCs observed in the present study could be due to infrequent consumption of certain foods (e.g., soy products) or to seasonal variation in the consumption of certain fruits and vegetables; one of the purposes of the CPS-3 DAS was to capture seasonal variation in blood biomarkers by collecting the samples six months apart. Had the samples been collected a year apart, we would expect higher ICCs for many biomarkers of seasonally consumed foods. A few previous studies have examined the reproducibility of metabolites over time, although they did not focus on diet-related biomarkers [36,37]. Floegel et al. [36] investigated the ICCs of 163 fasting serum metabolites over a 4-month period and found a median ICC of 0.57 (vs. a median ICC of 0.56 over six months in the present study). Carayol et al. [37] found a median ICC of 0.70 among 158 metabolites measured in fasting plasma samples over a 2-year period. They also found that ICCs were higher for metabolites measured in fasting than in nonfasting samples, although Sampson et al. [17] found that fasting is not a major source of variation in metabolite levels in population studies. Therefore, a single measurement is likely sufficient for many metabolites with high reproducibility.
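The ICC definition and its consequence for risk estimates can be sketched as follows. This is a minimal sketch assuming a one-way random-effects model, ICC(1,1), estimated from ANOVA mean squares with the same number of measurements per person; the present study may have used a different estimator (e.g., one derived from a mixed model). Under the classical measurement error model for a linear regression, the slope on a single error-prone measurement is attenuated by a factor equal to the ICC, and the Spearman–Brown formula gives the reliability of an average of k replicate measurements. The function names and toy data are hypothetical.

```python
from statistics import mean

def icc_oneway(measurements):
    """One-way random-effects ICC(1,1) from n persons x k repeated measures.

    Estimates between-person variance / (between-person + within-person variance)
    from ANOVA mean squares; negative estimates are truncated at 0.
    """
    n, k = len(measurements), len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 persons, each with the same k >= 2 measures")
    grand = mean(v for row in measurements for v in row)
    person_means = [mean(row) for row in measurements]
    ss_between = k * sum((m - grand) ** 2 for m in person_means)
    ss_within = sum((v - m) ** 2
                    for row, m in zip(measurements, person_means) for v in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("no variation in the data")
    return max((ms_between - ms_within) / denom, 0.0)

def reliability_of_mean(icc, k):
    """Spearman-Brown: reliability of the average of k replicate measurements."""
    return k * icc / (1 + (k - 1) * icc)

# Hypothetical log metabolite levels for four people, two draws each.
data = [[1.0, 1.2], [2.0, 1.8], [3.0, 3.4], [4.0, 3.6]]
print(round(icc_oneway(data), 3))  # -> 0.967
print(round(reliability_of_mean(0.56, 2), 3))  # -> 0.718
```

With the median ICC of 0.56 reported above, a single measurement would attenuate a linear regression slope to roughly 56% of its true value, whereas averaging two measurements raises the reliability to about 0.72.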
The present study has several strengths. Its large sample size and comprehensive dietary and metabolomic data allowed us to explore a large number of diet–metabolite associations simultaneously, which is more efficient than feeding studies and provides information on the specificity of the biomarkers. Furthermore, the repeated blood samples enabled us to assess biomarker reproducibility over 6 months. Our findings confirmed many previously identified food biomarkers and identified new metabolites for further testing. The reproducibility of food-based biomarkers is largely unknown in the field but is important for informing the application of such biomarkers in etiologic analyses, because large within-person variation in a biomarker over time is a major source of measurement error that could lead to underestimated diet–disease associations and inconsistent findings. Additional feeding studies are needed to test the dose–response relationships between food intake and the identified biomarkers to further confirm their validity for future use.