Identification and Reproducibility of Plasma Metabolomic Biomarkers of Habitual Food Intake in a US Diet Validation Study

Previous metabolomic studies have identified putative blood biomarkers of dietary intake. These biomarkers need to be replicated in other populations and tested for reproducibility over time for potential use in future epidemiological studies. We conducted a metabolomics analysis among 671 racially/ethnically diverse men and women included in a diet validation study to examine the correlation between >100 food groups/items (101 by a food frequency questionnaire (FFQ), 105 by 24-h diet recalls (24HRs)) with 1141 metabolites measured in fasting plasma sample replicates, six months apart. Diet–metabolite associations were examined by Pearson’s partial correlation analysis. Biomarker reproducibility was assessed using intraclass correlation coefficients (ICCs). A total of 677 diet–metabolite associations were identified after Bonferroni adjustment for multiple comparisons and restricting absolute correlation coefficients to greater than 0.2 (601 associations using the FFQ and 395 using 24HRs). The median ICC of the 238 putative biomarkers was 0.56 (interquartile range 0.46–0.68). In this study, with repeated FFQs, 24HRs and plasma metabolic profiles, we identified several potentially novel food biomarkers and replicated others found in our previous study. Our findings contribute to the growing literature on food-based biomarkers and provide important information on biomarker reproducibility which could facilitate their utilization in future nutritional epidemiological studies.


Introduction
Self-reported diet assessment tools such as food frequency questionnaires (FFQs) have long been used to assess habitual diet in population studies. Such methods are subject to random and systematic measurement errors that could lead to underestimated diet-disease risk estimates and inconsistent findings in nutritional epidemiological studies [1]. Biomarkers are considered objective measures of diet and are not subject to the same measurement errors as self-reported diet, although other measurement errors may exist, and thus can complement or replace self-reported methods. Recovery dietary biomarkers can be used to estimate absolute intake (e.g., 24-h urinary nitrogen for protein intake) [2][3][4], and concentration biomarkers and predictive biomarkers can be used as stand-alone risk factors for disease outcomes, and to correct for measurement errors of an FFQ [5,6]. Although dietary biomarkers are promising tools for diet assessment, the few established ones are primarily nutrient-based, and there is great potential and need for robust food-based biomarkers.
In recent years, metabolomics has been increasingly used to identify food-based biomarkers in human blood and urine samples [7]. It holds great promise in nutritional epidemiology as an increasing number of food biomarkers have been identified and could be used to facilitate diet assessment in future research [1]. Several large metabolomics analyses conducted in cohort studies with biospecimens have identified biomarkers of habitual food intakes [8][9][10][11][12][13][14] or dietary patterns [15,16]. In our previous metabolomics analysis of 91 food groups and 1186 serum metabolites among 1369 nonsmoking postmenopausal women in the Cancer Prevention Study II (CPS-II) Nutrition Cohort, we identified 379 diet-metabolite associations with 199 metabolites as putative food biomarkers of 42 food groups/items (one metabolite could be a biomarker of multiple food groups/items) [8]. Many of the biomarkers were previously identified in population and/or intervention studies, and thus were validated in our study (e.g., stachydrine for citrus fruit intake). Novel biomarkers with high sensitivity and specificity for the correlated food intake included alliin for garlic intake and dopamine 3-O-sulfate for banana intake. These newer biomarkers need to be replicated across diverse populations.
One concern of using these biomarkers in population studies is that one-time measurement may poorly reflect long-term status [17]. Large day-to-day variation in certain metabolite levels due to measurement and random errors could lead to underestimation of diet-disease associations if only measured once. Therefore, it is important to assess biomarker reproducibility over time to determine if one-time measurement is sufficient to capture usual exposure.
In the Diet Assessment Sub-study (DAS) from the Cancer Prevention Study-3 (CPS-3) cohort, where diet and fasting blood samples were measured twice six months apart, we aimed to (1) replicate and identify metabolites associated with individual food groups/items using untargeted metabolomic profiling, and (2) assess the reproducibility of the identified metabolites over six months.

Participant Characteristics
Characteristics of the study population are shown in Table 1. Among the 671 participants in the DAS, 60.1% were white, 24.7% were black and 15.2% were Hispanic. The majority (65.1%) were female. The mean age was 52.3 ± 9.5 years.
Table 1 notes. Abbreviations: BMI, body mass index; 24HR, 24-h diet recall; FFQ, food frequency questionnaire; MET-h, metabolic equivalent hour. (1) Values are mean ± standard deviation for continuous variables, and frequency (%) for categorical variables. (2) Includes missing.
The AUCs were calculated to inform the predictive accuracy of the diet-related metabolites. The top three most predictive metabolites for each of the 74 food groups or items are shown in Table 2 (ranked according to the FFQ or, if fewer than three were identified with the FFQ, according to the 24HRs). For most food groups, the most predictive metabolite also had the highest |r|.

Vegetables
We identified 75 associations for 16 vegetable groups or individual vegetables, with 53 associations for 14 groups/items from the FFQ, and 38 associations for 8 groups/items from the 24HRs. Specifically, we identified 1 metabolite for tomatoes, 3 for asparagus, 3 for beans, 19 for all soy products, 7 for fermented soy products, 5 for soy milk, 1 for soy protein powder, 8 for cruciferous vegetables, 4 for leafy greens, 1 for iceberg or head lettuce, 1 for peppers, 5 for mushrooms (24HRs only), 3 for allium vegetables, 3 for onions, 10 for garlic and 1 for garlic powder. Of these, the strongest association was seen for an unknown metabolite X-16649 with soy products assessed by the 24HRs (r = 0.37, AUC = 75%).

Grains
We identified 18 food-metabolite associations for 5 grain groups/items (4 for total whole grains, 1 for whole grain bread, 5 for whole grain cereals, 5 for corn products and 3 for refined grains), with 15 associations using FFQ, and 8 using 24HRs. An unknown metabolite X-21752 was the most predictive metabolite for total whole grains (r = 0.31, AUC = 89%) and whole grain cereals (r = 0.42, AUC = 87%) assessed using the FFQ.

Proteins
We identified 181 diet-metabolite associations for 11 protein foods (2 for egg, 31 for red meat, 30 for processed meat, 46 for poultry, 17 for total fish, 16 for dark fish, 6 for shellfish, 7 for total nuts, 12 for peanuts, 7 for other nuts and 7 for seeds); 164 associations for 11 groups/items were identified using the FFQ and 99 associations for 10 groups/items using the 24HRs. The strongest association was between X-13835 and FFQ-assessed poultry intake (r = 0.54, AUC = 85%).

Dairy/Dairy Alternatives
There were 41 diet-metabolite associations for 4 dairy/dairy alternative groups (6 for milk, 3 for almond milk or rice milk, 14 for total cheese, 18 for cream); 39 associations were found using the FFQ, and 13 using the 24HRs. The strongest association was between X-11381 and milk (r = 0.33, AUC = 84%). Almond milk or rice milk was a new line item on the CPS-3 FFQ. The only metabolite that had a positive association with almond milk (X-24475) was also associated with intake of other nuts. All the cheese-related metabolites were fatty acids and sphingomyelins. All of the 18 metabolites associated with cream intake, a majority being xenobiotics, also were associated with coffee intake, indicating that the two were commonly consumed together and these biomarkers should not be considered as specific biomarkers for cream intake.

Fats and Oils
We identified 16 associations for creamy salad dressing (n = 12), oil and vinegar salad dressing (n = 2) and olive oil (n = 2); 15 were found using the FFQ and 9 using the 24HRs.

Alcohol
Using either instrument, we identified 172 associations for alcohol, including 58 for total alcohol, 19 for beer, 44 for wine, 39 for red wine, 8 for white wine and 4 for liquor. Using the FFQ, 160 associations were found, and using the 24HRs, 102 associations were found. Ethyl alpha-glucopyranoside was the most predictive metabolite for total alcohol (r = 0.52, AUC = 95%) and individual types of alcohol (AUC ranging from 74% for white wine to 91% for total wine) assessed using the FFQ. Ethyl glucuronide was the second most predictive metabolite for total alcohol.

Beverages
There were 80 associations for beverages, including 33 for total coffee, 34 for caffeinated coffee, 2 for decaffeinated coffee, 4 for total tea, 1 for green tea, 3 for black tea, 2 for herbal tea and 1 for diet beverages, with 77 found from the FFQ and 71 from 24HRs. Quinate and the unknown X-21442 were the most predictive metabolites for total coffee consumption (r = 0.77, AUC = 99% and r = 0.81, AUC = 99%, respectively). The majority of metabolites correlated with total coffee and caffeinated coffee were involved in xanthine and benzoate metabolism. For tea consumption, theanine was the most predictive biomarker, slightly stronger for black tea than for green tea, and strongest for total tea (r = 0.40, AUC = 86%). Acesulfame was associated with diet beverage consumption (r = 0.42, AUC = 82%).

Miscellaneous
The remaining 43 associations were found for miscellaneous foods: 10 for French fries, 3 for ice cream, 10 for chips, 6 for chocolate candies, 7 for dark chocolate, 1 for energy/protein bars, 3 for soy sauce and 3 for artificial sweeteners. Several xanthine metabolites that were correlated with coffee intake were also correlated with chocolate intake, including theobromine, 3-methylxanthine and 7-methylxanthine. In addition to acesulfame, which was correlated with diet beverages, two more metabolites (saccharin and erythritol) were associated with overall artificial sweetener intake.

Reproducibility of the Identified Food Metabolites
Of the 238 metabolites that were significantly associated with food groups/items identified via FFQ or 24HRs, the median ICC calculated using duplicate samples over six months was 0.56 (interquartile range: 0.46-0.68). By super pathway, the median ICC ranged from 0.39 for carbohydrates to 0.69 for cofactors and vitamins.
Combining information on both accuracy (AUC) and reproducibility (ICC) over time can indicate if a biomarker is reliable to be used in future epidemiological studies. The combined information on AUC and ICC for the top three metabolites of the 74 food groups/items are shown in Figure 1.
Biomarkers in the upper right corner with both high AUC and ICC are considered reliable, while those in the lower left corner with low AUC and ICC are less reliable. AUCs obtained from 24HRs were generally lower than those from the FFQ. In the present study, such reliable biomarkers were seen for several food groups/items including fish, milk, meat, nuts, coffee, leafy greens, oranges and whole grain cereals. Biomarkers with high AUCs but low ICCs might be useful in short-term studies to monitor dietary intake compliance but may require more than one measurement to capture long-term levels.
Figure 1. (a) Top three predictive metabolites for food intake assessed using the food frequency questionnaire; (b) top three predictive metabolites for food intake assessed using the average of 24-h diet recalls. Prediction accuracy was assessed by area under the curve (AUC) from the receiver operating characteristic curve, which indicates how well a metabolite could discriminate top quartile from bottom quartile intake of a food group/item. Reproducibility was assessed by intraclass correlation coefficients (ICCs), calculated as the ratio of between-person variance to the total variance among participants with repeated blood metabolic profiles measured six months apart.

Discussion
In this yearlong diet validation study with repeated measures of diet using both FFQ and 24HRs and two measures of fasting plasma metabolic profiles approximately 6 months apart, we replicated many food-metabolite associations that were found in other studies, and identified several potentially novel food biomarkers. More associations were found via FFQ than via 24HRs. Reproducibility of the 238 identified metabolites was acceptable for a large proportion, with 38% of metabolites having an ICC > 0.6. Our findings contribute to the literature on food-based biomarkers and provide important information on the reproducibility of the biomarkers which could facilitate their utilization in future nutritional epidemiological studies.
Generally, we identified more food-metabolite associations using the FFQ than using the 24HRs. Additionally, the biomarker AUCs were higher in general using the FFQ than using the 24HRs. In other words, the identified biomarkers predict dietary intake assessed via the FFQ better than that via the 24HRs. Even though the repeated measurements using 24HRs are considered a superior method of assessing the true intake in the validation study, the FFQ is designed to capture usual food intake in the past 12 months. That metabolites correlated better with the FFQ than the average 24HRs may indicate that the biomarkers reflect a long-term status of dietary intake. We observed a greater number of associations in the current study than in our previous study in the CPS-II Nutrition Cohort [8], probably because in the CPS-3, the FFQ was collected in closer proximity to blood draw (as part of the validation study), and using an average of two blood samples likely better captured usual metabolite levels during the year.
We replicated five metabolites that had been correlated with total citrus fruits and juices or orange juice in the CPS-II Nutrition Cohort [8]. Stachydrine, the strongest biomarker of total citrus fruits and juices, was first identified in an acute feeding study [18] and then validated as a biomarker of habitual citrus fruit intake in several cross-sectional datasets [9][10][11][12][13][19][20][21] including our previous metabolomics study in the CPS-II Nutrition Cohort [8]. Among the food biomarkers we identified in the CPS-3 DAS but not in the CPS-II Nutrition Cohort, 4-allylphenol sulfate, which was associated with apple/pear and blueberry intake, is a nonspecific microbial metabolite of polyphenols [22] and has been reported as a biomarker of pears in a randomized trial [23]. Among the 75 vegetable-metabolite associations, 14 were found in the CPS-II Nutrition Cohort [8]. Notably, we replicated ergothioneine as a putative biomarker of mushroom intake, and several metabolites such as alliin, N-acetylalliin and S-allylcysteine as biomarkers of garlic intake. We previously found S-methylcysteine sulfoxide as a biomarker of cruciferous vegetable intake [8], which was also reported in the Prostate, Lung, Colorectal and Ovarian (PLCO) cohort [24]. In the present study, we found S-methylcysteine, the biological precursor of S-methylcysteine sulfoxide, to be associated with cruciferous vegetable intake. Among the food-metabolite associations not found in the CPS-II Nutrition Cohort, S-methylcysteine and pipecolate were reported as useful dry bean biomarkers in both human and mouse studies [25]; genistein sulfate and 4-ethylphenyl sulfate are biomarkers for soy product intake. 4-ethylphenyl sulfate is a uremic toxin produced by gut bacteria, and its association with soymilk has been reported in a cohort of female twins [9].
We identified several new biomarkers for whole grain products such as 2,6-dihydroxybenzoic acid, 2-aminophenol sulfate and 2-acetamidophenol sulfate compared with our previous study in the CPS-II Nutrition Cohort [8]. 2,6-dihydroxybenzoic acid is a phenolic acid, also known as γ-resorcylic acid, which was identified as a marker for a high dietary fiber intake in an intervention study [26]. It is possible that 2,6-dihydroxybenzoic acid was derived from alkylresorcinols or lignans through a speculated microbial enzyme not yet identified in humans [26]. 2-acetamidophenol sulfate (HPAA sulfate) and 2-aminophenol sulfate are benzoxazinoid metabolites that were previously found as biomarkers of whole grain intake in urine [27]. 2-aminophenol sulfate was also found to be elevated in plasma after high dietary fiber intake [26].
In our previous study [8], ethyl glucuronide was the most predictive metabolite of all types of alcohol and is metabolized directly from ethanol in the liver by UDP-glucuronosyltransferases [28]. In the present study, the most predictive metabolite of alcohol was ethyl alpha-glucopyranoside (previously known as X-24293), which is a glycoside found in Japanese rice wine and might be used as a functional food or cosmetic material [29]. For wine consumption (total and red but not white wine), we replicated the potential biomarker 2,3-dihydroxyisovalerate, an intermediate metabolite produced by yeast during wine fermentation [30]. We replicated 26 metabolites as biomarkers of total coffee intake [8], including quinate, the highly predictive unknown metabolite X-21442, several caffeine metabolites (e.g., 1-methylxanthine, 1,3-dimethylurate, 1,7-dimethylurate, 1,3,7-trimethylurate) and other metabolites. Chlorogenic acid, an abundant natural polyphenol, is found in high concentration in coffee. During the roasting process, chlorogenic acid is broken down to quinate and caffeic acid. In both the CPS-II Nutrition Cohort and CPS-3 DAS, quinate was among the top predictive biomarkers of caffeinated and decaffeinated coffee. Previous animal studies showed chlorogenic acid and related compounds exert antiviral [31] and anticarcinogenic effects [32,33]. Future human studies need to investigate these biomarkers with disease outcomes directly or through mediation analyses. For tea consumption, we replicated that theanine was the most predictive biomarker for total tea, green tea and black tea consumption.
As discussed above, our studies (both in CPS-3 DAS and our prior research in the CPS-II Nutrition Cohort [8]) and others have identified many biologically plausible, putative food biomarkers using metabolomics, which highlights the importance of this technology in identifying dietary biomarkers. Moving forward, more research is needed to determine the use of these putative biomarkers in diet assessment. One important step is to develop calibration equations in controlled feeding studies, so that the biomarkers may be used to correct self-reported dietary intake [1]. Urinary recovery biomarkers have been used to calibrate energy and protein intakes and showed improved diet-disease associations compared with uncalibrated data [34]. Lampe et al. also evaluated blood concentration biomarkers in a feeding study of postmenopausal women and suggested that they perform as well as recovery biomarkers and, therefore, can be used to correct self-reported dietary intake data in future studies [35]. Cross-sectional studies such as the present study provide important information as one could examine multiple foods simultaneously and determine if a metabolite is correlated with multiple foods. Among the identified metabolites, many may not be optimal food biomarkers if they are not specific to certain foods or if they are synthesized endogenously, because their levels will be influenced by other characteristics.
One concern of using the metabolomic biomarkers in epidemiological studies is that one-time measurement is subject to short-term variation and may not represent long-term status. Large within-person variation compared to between-person variation in metabolite levels can contribute to measurement errors that would result in underestimated disease risk estimates. An ICC, the ratio of between-person variance to total variance, is a good indicator of metabolite reproducibility. High ICCs indicate large between-person variation relative to the total variation, as seen for biomarkers of fish, milk, meat and coffee. Low ICCs indicate large within-person variation relative to the total variation. However, a low ICC does not necessarily exclude the metabolite from being used as a dietary biomarker in all circumstances. The low ICCs observed in the present study could be due to the infrequency of consumption of certain foods (e.g., soy products), and could also be due to the seasonal variation in consumption of certain fruits and vegetables, as one of the purposes of the CPS-3 DAS was to capture seasonal variation in blood biomarkers by collecting the samples six months apart. If samples had been collected a year apart, we would expect to see higher ICCs for many biomarkers of the foods that are consumed seasonally. A few previous studies examined the reproducibility of metabolites over time, although they did not focus on diet-related biomarkers [36,37]. Floegel et al. [36] investigated the ICCs of 163 fasting serum metabolites over a 4-month period and found that the median ICC was 0.57 (vs. median ICC of 0.56 over six months in the present study). Carayol et al. [37] found a median ICC of 0.70 among 158 metabolites measured in fasting plasma samples over a 2-year period. They also found that the ICCs were higher for metabolites measured in fasting samples than in nonfasting samples, although Sampson et al. [17] found that fasting is not a major source of variation in metabolite levels in population studies. Therefore, one-time measurement is likely sufficient for many of the metabolites with high reproducibility.
The present study has several strengths. Its large sample size and comprehensive dietary and metabolomic data allowed us to explore a large number of diet-metabolite associations simultaneously which is more efficient than feeding studies and can provide information on the specificity of the biomarkers. Furthermore, the repeated measurements of blood samples enabled us to test biomarker reproducibility over 6 months. Our findings confirmed many previously identified food biomarkers and identified new metabolites for further testing. Reproducibility of food-based biomarkers is largely unknown in the field but very important to inform the application of such biomarkers in etiologic analyses. Large within-person variation in the biomarker over time is a major source of measurement error that could lead to underestimated diet-disease associations and inconsistent findings. Additional feeding studies are needed to test the dose-response relationships between food intake and the identified biomarkers to further confirm their validity for future use.

Study Population
The Diet Assessment Sub-study (DAS) was a one-year study among 745 men and women enrolled in the CPS-3 cohort, designed to evaluate the validity of the CPS-3 FFQ. Briefly, CPS-3 is a large prospective cohort study of 303,682 adults aged 30-65 residing in 35 states in the United States, plus the District of Columbia and Puerto Rico, who were enrolled between 2006 and 2013 [38]. At enrollment, participants provided a blood sample, had their waist circumference measured and completed an enrollment survey. They were also asked to complete a comprehensive baseline survey that assessed demographic, lifestyle and medical information. Follow-up questionnaires were sent in 2015 to those who completed the baseline survey after enrollment (N = 254,650) to update lifestyle and medical information and to assess diet using the CPS-3 FFQ for the first time.
The CPS-3 DAS was designed to evaluate the validity and reproducibility of the newly modified CPS-3 FFQ over a year. CPS-3 participants living in 5 regions defined by Quest Diagnostics business units (Atlanta, GA, USA; Dallas, TX, USA; Auburn Hills, MI, USA; West Hills, CA, USA; San Jose, CA, USA) were invited to participate in DAS. Participants were asked to complete the 2015 follow-up survey (to serve as the first FFQ), six telephone-administered 24HRs throughout the year, provide two fasting blood and two 24-h urine samples and complete the post-FFQ at the end of the study. The six 24HRs aimed to include four weekdays and two weekend days, with a goal of obtaining two 24HRs per "trimester"; we aimed to collect one 24HR within a week prior to the fasting blood draw. Blood and urine samples were collected six months apart to capture seasonal variation.
A total of 745 men and women completed both FFQs and the first 24HR, meeting the minimum criteria to remain in the DAS. For the metabolomics analysis, we excluded participants who completed fewer than three 24HRs (n = 2), had poor post-FFQs (n = 20; defined as missing 2 or more sections, an entire page, >100 line items or with daily energy intake <800 or >4500 kcal for men, and <600 or >3800 kcal for women) or had no blood sample (n = 1). We further excluded current smokers (n = 21), those whose body weight was missing at both blood draw appointments (n = 3) or whose weight change was >20 lbs between blood draws (n = 14) and pregnant women (n = 13). A total of 671 men and women were included in this plasma metabolomics analysis. Those with two blood draws (n = 644) were included in the metabolomic reproducibility analysis (Figure S1). The CPS-3 DAS protocol was approved by the Emory University (Atlanta, GA, USA) Institutional Review Board.
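The exclusion arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the reported exclusion counts are mutually exclusive and applied in the order listed, and the short reason labels are our own shorthand rather than terms from the study protocol.

```python
# Counts are taken from the text; labels are illustrative shorthand.
enrolled = 745
exclusions = [
    ("fewer than three 24HRs", 2),
    ("poor post-FFQ", 20),
    ("no blood sample", 1),
    ("current smoker", 21),
    ("body weight missing at both draws", 3),
    ("weight change > 20 lbs between draws", 14),
    ("pregnant", 13),
]


def apply_exclusions(start, steps):
    """Return (reason, n_excluded, n_remaining) after each sequential step."""
    remaining = start
    flow = []
    for reason, n in steps:
        remaining -= n
        flow.append((reason, n, remaining))
    return flow


flow = apply_exclusions(enrolled, exclusions)
for reason, n, remaining in flow:
    print(f"{reason:<40} -{n:>3} -> {remaining}")

analytic_n = flow[-1][2]  # size of the analytic sample
```

Under these assumptions the running total ends at the 671 participants reported for the plasma metabolomics analysis.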

Diet Assessment
Diet was assessed using the newly modified CPS-3 FFQ as described elsewhere [39]. Briefly, the Willett FFQ [40,41] was modified for the CPS-3 study population, to capture racial/ethnic and geographic diversity of the cohort. Modifications to the FFQ were informed through telephone-administered 24HRs, analyses of NHANES 2009-2010 and focus groups [39]. The final modified FFQ included 191 line items. Only the post-FFQ was used to assess dietary intake in the present study. We defined 101 food groups/items from the FFQ as shown in Supplemental Table S1, similar to what we defined in the CPS-II Nutrition Cohort [8]. Comparable food groups were derived from the 24HRs to match those from the FFQ. We also created a few food groups using the 24HRs that are not asked (e.g., mushroom) or asked in combination with other foods (e.g., apples) on the FFQ. A total of 105 food groups/items were derived from the 24HRs.

Blood Collection and Processing
Participants were instructed to make an appointment with a Quest Patient Service Center to have fasting blood drawn on the morning of the visit. Participants were asked to follow their usual diet except during the 8-h fasting period before the appointment. A total of 40 mL of fasting blood was collected using 5 EDTA tubes for plasma collection, and 4 serum separator tubes for serum collection. Blood samples were refrigerated and transferred to a Quest Diagnostics regional processing laboratory where they were fractionated by centrifugation and aliquoted into 9 vials. All aliquots of blood were frozen and shipped on dry ice to an off-site biorepository (Fisher Biorepositories, Inc., Frederick, MD, USA) for long term storage in the vapor phase of liquid nitrogen.

Metabolomics Analysis
Metabolomic profiling was conducted by Metabolon, Inc. (Durham, NC, USA) using ultrahigh performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) as described elsewhere [42]. Briefly, plasma samples were treated with methanol to precipitate proteins. Four sample fractions were dried and reconstituted in different solvents for measurement under four different platforms. These platforms consisted of two separate reverse phase UPLC-MS/MS methods with positive ion mode electrospray ionization (ESI), one reverse phase UPLC-MS/MS method with negative ion mode ESI and one hydrophilic interaction chromatography UPLC-MS/MS method with negative ion mode ESI. Individual metabolites were identified by comparison with a chemical library maintained by Metabolon that comprises more than 3300 commercially available purified standard compounds and recurrent unknown entities, based on retention index, mass to charge ratio and chromatographic data.
A total of 1368 metabolites were detected in the fasting plasma samples. Metabolites that were below the detection limit in >90% of the samples were excluded (n = 131). For the remaining metabolites, missing values were assigned the minimum detection value. To correct the day-to-day variation from the platform, each metabolite was divided by its daily median. Duplicates of 60 participant samples were used as quality controls to assess inter- and intrabatch variation. Intraclass correlation coefficients (ICCs) were calculated among the quality control samples to test the reproducibility of the platform. Metabolites with ICC < 0.5 were further excluded from the analysis, leaving 1141 for food-metabolite analysis. Of the 1141 included metabolites, the median technical ICC was 0.87, with an interquartile range of 0.77 to 0.93, suggesting a very high reproducibility of the platforms.
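The preprocessing steps described above (missingness filter, minimum-value imputation, daily-median scaling) can be sketched for a single metabolite as follows. This is an illustration, not the study's pipeline: it assumes that "minimum detection value" means the lowest observed value for that metabolite, that steps run in the order the text lists them, and the function and argument names are hypothetical.

```python
from statistics import median


def preprocess(values, run_days, max_missing=0.90):
    """Clean one metabolite's raw intensities.

    values   -- intensities, with None where the signal was below detection
    run_days -- run-day label for each sample (hypothetical field)
    Returns None if the metabolite is dropped, else the scaled values.
    """
    n_missing = sum(v is None for v in values)
    if n_missing / len(values) > max_missing:
        return None  # below detection in >90% of samples: exclude
    # Impute missing values with the lowest observed value (assumption).
    floor = min(v for v in values if v is not None)
    filled = [floor if v is None else v for v in values]
    # Divide each value by the median of its run day.
    by_day = {}
    for v, day in zip(filled, run_days):
        by_day.setdefault(day, []).append(v)
    day_median = {day: median(vs) for day, vs in by_day.items()}
    return [v / day_median[day] for v, day in zip(filled, run_days)]


# Toy data, not study data.
scaled = preprocess([10.0, None, 30.0, 40.0, 8.0, 12.0],
                    ["d1", "d1", "d1", "d2", "d2", "d2"])
dropped = preprocess([None] * 19 + [5.0], ["d1"] * 20)
```

After scaling, a value of 1.0 means the sample sits at the median of its run day, which makes intensities comparable across days.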
Putative dietary biomarkers were further evaluated for predictive accuracy of discriminating high consumers (top quartile) from low consumers (bottom quartile), assessed using the area under the curve (AUC) calculated from the receiver operating characteristic (ROC) curve using R package pROC [44]. We considered AUC < 0.7 to be low, 0.7-<0.8 to be moderate and ≥0.8 to be high.
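For intuition, the top-versus-bottom-quartile AUC can be computed directly as the probability that a randomly chosen high consumer has a higher metabolite level than a randomly chosen low consumer. The sketch below is in plain Python rather than pROC, so it is a stand-in and not the study's code. It assumes quartiles are formed by taking the lowest and highest quarter of participants after sorting by intake, and unlike pROC's default behavior it does not choose the comparison direction automatically. The function names are hypothetical; the category cut points are those stated above.

```python
def auc_top_vs_bottom(intake, metabolite):
    """AUC for discriminating top-quartile from bottom-quartile consumers."""
    pairs = sorted(zip(intake, metabolite), key=lambda p: p[0])
    k = len(pairs) // 4  # participants per quartile (simplifying assumption)
    if k == 0:
        raise ValueError("need at least four participants")
    low = [m for _, m in pairs[:k]]
    high = [m for _, m in pairs[-k:]]
    wins = 0.0
    for h in high:
        for l in low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5  # ties count as half
    return wins / (len(high) * len(low))


def auc_category(auc):
    """Apply the cut points from the text: <0.7, 0.7-<0.8, >=0.8."""
    if auc < 0.7:
        return "low"
    if auc < 0.8:
        return "moderate"
    return "high"


# Toy data, not study data.
intake = [0, 1, 2, 3, 4, 5, 6, 7]
level = [1.0, 3.0, 2.0, 2.5, 2.2, 2.8, 2.0, 4.0]
auc = auc_top_vs_bottom(intake, level)
label = auc_category(auc)
```

In this toy example three of the four high-versus-low comparisons favor the high consumer, so the AUC is 0.75, which falls in the moderate range.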
The reproducibility of the identified food-related metabolites over six months was assessed using ICCs. ICCs were calculated as the ratio of between-person variance to the total variance among participants with repeated measures of blood metabolic profiles. Between-person variance was estimated from a random effects model where participants were modeled as a random variable. ICCs >0.6 were considered good and >0.75 considered excellent.
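As a rough illustration of the ICC definition above, the sketch below estimates between- and within-person variance from two measurements per participant using the classical one-way ANOVA moment estimator. This is a simplification swapped in for the random effects model named in the text, and the function names and toy values are hypothetical.

```python
def icc_oneway(pairs):
    """ICC = between-person variance / total variance, two measures each."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two participants")
    k = 2  # measurements per participant
    person_means = [(a + b) / k for a, b in pairs]
    grand = sum(person_means) / n
    ms_between = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, person_means)) / (n * (k - 1))
    # Negative variance estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / k, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else float("nan")


def icc_category(icc):
    """Cut points from the text; 'below good' is our own label."""
    if icc > 0.75:
        return "excellent"
    if icc > 0.6:
        return "good"
    return "below good"


# Toy data: (first draw, second draw) for four hypothetical participants.
pairs = [(1.0, 1.2), (2.0, 1.8), (3.0, 3.2), (4.0, 3.8)]
icc = icc_oneway(pairs)
rating = icc_category(icc)
```

Here the two draws differ little within each person compared with the spread between people, so the ICC is close to 1.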

Conclusions
In conclusion, in this large and comprehensive analysis of habitual diet and fasting plasma metabolic profiles in a free-living population of racially/ethnically diverse men and women, we identified several potentially novel food biomarkers and replicated others found in previous studies. Our findings contribute to the growing literature on food-based biomarkers and provide important information on the reproducibility of the biomarkers which could facilitate their utilization in future nutritional epidemiological studies.
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-1989/10/10/382/s1. Figure S1: study population exclusion; Table S1: food-metabolite associations identified using either FFQ or 24-h diet recalls in the CPS-3 Diet Assessment Sub-study; Table S2: food-metabolite associations identified using the FFQ in the CPS-3 Diet Assessment Sub-study; Table S3: food-metabolite associations identified using the average 24-h diet recalls in the CPS-3 Diet Assessment Sub-study; Table S4: food group definitions in the CPS-3 Diet Assessment Sub-study.
Funding: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study-3 cohort. Support for this project was funded by the American Cancer Society.