Identification and Reproducibility of Urinary Metabolomic Biomarkers of Habitual Food Intake in a Cross-Sectional Analysis of the Cancer Prevention Study-3 Diet Assessment Sub-Study

Previous cross-sectional metabolomics studies have identified many potential dietary biomarkers, mostly in blood. Few studies have examined urine samples, although urine is preferred for dietary biomarker discovery. Furthermore, little is known regarding the reproducibility of urinary metabolomic biomarkers over time. We aimed to identify urinary metabolomic biomarkers of diet and assess their reproducibility over time. We conducted a metabolomics analysis among 648 racially/ethnically diverse men and women in the Diet Assessment Sub-study of the Cancer Prevention Study-3 cohort to examine the correlation between >100 food groups/items [101 by a food frequency questionnaire (FFQ), and 105 by repeated 24 h diet recalls (24HRs)] and 1391 metabolites measured in 24 h urine sample replicates, six months apart. Diet–metabolite associations were examined by Pearson’s partial correlation analysis. Biomarkers were evaluated for prediction accuracy, assessed using the area under the curve (AUC) calculated from the receiver operating characteristic curve, and for reproducibility, assessed using intraclass correlation coefficients (ICCs). A total of 1708 diet–metabolite associations were identified after Bonferroni correction for multiple comparisons and restricting correlation coefficients to >0.2 or <−0.2 (1570 associations using the FFQ and 933 using 24HRs); 513 unique metabolites were correlated with 79 food groups/items. The median ICC of the 513 putative biomarkers was 0.53 (interquartile range 0.42–0.62). In this study, with comprehensive dietary data and repeated 24 h urinary metabolic profiles, we identified a large number of diet–metabolite correlations and replicated many found in previous studies. Our findings reveal the promise of urine samples for dietary biomarker discovery in a large cohort study and provide important information on biomarker reproducibility, which could facilitate their utilization in future clinical and epidemiological studies.


Introduction
Nutritional epidemiological studies have significantly advanced understanding of the relationships between diet and chronic diseases and have led to dietary guidelines for disease prevention in recent decades [1][2][3]. However, the field is still largely impeded by inconsistent findings from many studies. Most studies rely on self-reported dietary data, such as those collected from food frequency questionnaires (FFQs), which involve systematic and random measurement errors that could result in underestimated risk estimates [4]. Robust and reliable objective dietary biomarkers are important to estimate dietary intake or calibrate self-reported dietary data, thus holding promise to advancing research on diet and cancer and other health outcomes; however, such dietary biomarkers are limited to a few nutrients and do not exist for most foods and dietary patterns.
The area under the receiver operating characteristic (ROC) curve (AUC) was calculated to assess how well the diet-related metabolites discriminate the top from the bottom quartile of dietary intake. The AUCs were generally higher when dietary intake was assessed using the FFQ than using 24HRs. The top 3 most predictive metabolites for each of the 79 food groups/items are shown in Table 2 (according to the post-FFQ assessment; if fewer than 3 metabolites were identified, the top metabolites according to 24HRs are presented). The most predictive metabolite usually also had the highest |r| with a food group/item.

Fruits
We identified 119 food-metabolite associations for 17 fruit groups/items estimated either from the FFQ or 24HRs, including 1 for grapes, 1 for prunes, 6 for bananas, 19 for avocado, 2 for apples or pears, 6 for apples (24HRs only), 23 for total citrus fruits and juices, 16 for oranges, 15 for orange juice, 2 for grapefruit, 1 for watermelon, 1 for cantaloupe, 10 for berries, 3 for strawberries, 11 for blueberries, and 1 for peaches and plums (Supplemental Table S1); 84 associations were observed using the FFQ (Supplemental Table S2) and 82 using the 24HRs (Supplemental Table S3). The AUCs ranged from 0.6 for vanillactate predicting prune intake assessed using the 24HRs to 0.94 for stachydrine predicting total citrus fruit and juice intake assessed by the post-FFQ.

Vegetables
There were 150 associations for 15 vegetable groups or individual vegetables (119 associations using the FFQ, and 91 associations using the 24HRs), including 1 metabolite for ketchup and salsa, 9 for beans, 58 for all soy products, 8 for fermented soy products, 17 for soy milk, 6 for soy protein powder, 10 for cruciferous vegetables, 2 for leafy greens, 1 for iceberg or head lettuce, 2 for peppers, 7 for mushrooms, 5 for allium vegetables, 5 for onions, 17 for garlic, and 2 for garlic powder. The AUCs ranged from 0.58 for 4-acetylphenyl sulfate predicting fermented soy products assessed using the 24HRs to 0.91 for 4 metabolites predicting total bean intake assessed using the FFQ.

Proteins
We identified 404 diet-metabolite associations for 10 protein food groups/items (107 for red meat, 126 for processed meat, 119 for poultry, 7 for total fish, 6 for dark fish, 3 for shellfish, 10 for total nuts, 9 for peanuts, 9 for other nuts, and 8 for seeds); 376 associations were identified using the FFQ and 158 using 24HRs. Most metabolites correlated with red meat, processed meat, and poultry had negative correlations with intake. The AUCs ranged from 0.69 for 3-carboxy-4-methyl-5-propyl-2-furanpropanoate (CMPF) and X-23587 predicting shellfish intake using the FFQ to 0.9 for two metabolites (tryptophan betaine and X-24412) predicting total nut intake using the FFQ.

Dairy/Dairy Alternatives
There were 98 diet-metabolite associations for 4 dairy/dairy alternative groups (20 for milk, 4 for almond milk or rice milk, 4 for total cheese, and 70 for cream); 93 associations were found using the FFQ, and 10 using the 24HRs. The AUCs ranged from 0.64 for X-13846 predicting almond milk or rice milk intake from 24HRs to 0.88 for heptenedioate (C7:1-DC) predicting total cheese intake from the FFQ.

Fats and Oils
Twenty-two associations were identified for 3 fats and oils (17 for creamy salad dressing, 1 for oil and vinegar salad dressing, and 4 for olive oil); 21 were found using the FFQ and only 1 found by 24HRs. The AUCs ranged from 0.70 for 2,6-dimethylphenol sulfate predicting cream to 0.80 for N-methyltaurine predicting olive oil (FFQ) and oil and vinegar salad dressing (24HRs).

Alcohol
Using either the FFQ or 24HRs, we identified 443 associations for alcohol, including 136 for total alcohol, 53 for beer, 120 for wine, 104 for red wine, 24 for white wine, and 6 for liquor. A total of 421 associations were found using the FFQ, and 243 associations were found using the 24HRs. The AUCs ranged from 0.66 for several metabolites as biomarkers of white wine intake to 0.99 for ethyl glucuronide as the biomarker of total alcohol. Ethyl glucuronide was also the most predictive metabolite for all subtypes of alcohol (beer, red wine, white wine and liquor).

Beverages
There were 359 associations for 9 beverage groups, including 145 for total coffee, 142 for caffeinated coffee, 5 for decaffeinated coffee, 24 for total tea, 8 for green tea, 13 for black tea, 6 for herbal tea, 10 for sugar-sweetened beverages and 6 for diet beverages, with 349 found from the FFQ and 304 from 24HRs. The AUCs ranged from 0.63 for X-17686 as a biomarker of herbal tea estimated from 24HRs to 1.0 for glucuronide of C19H28O4 (1) and citraconate/glutaconate as biomarkers of total coffee intake. Glucuronide of C19H28O4 (1) is also the most predictive metabolite for caffeinated (AUC = 0.98) and decaffeinated coffee (AUC = 0.66). For tea consumption, N-acetyltheanine was the most predictive biomarker for total tea, green tea, and black tea but was not correlated with herbal tea intake.

Miscellaneous
The remaining 78 associations were found for 8 miscellaneous food groups, including 22 for French fries, 20 for all chips, 12 for chocolate candies, 12 for dark chocolate, 2 for desserts, 3 for bars (breakfast, energy and high protein bars combined), 2 for soy sauce and 5 for artificial sweeteners. Acesulfame, sucralose, saccharin, erythritol and X-25785 that were associated with all artificial sweetener intake were also associated with diet beverages. The lowest AUC was 0.66 for erythritol as a biomarker of artificial sweetener intake (estimated from 24HRs); the highest AUC was 0.85 for pentose acid, abscisate for French fries (negative correlations) and X-12823 for chocolate candies (estimated from post-FFQ).

Reproducibility of the Identified Food Metabolites
Of the 513 metabolites that were significantly associated with food groups/items identified via FFQ or 24HRs, the median ICC for duplicate samples over six months was 0.53 (interquartile range: 0.42-0.62). By super pathway, the median ICC ranged from 0.40 for carbohydrates to 0.65 for energy metabolites.
Combining information on both prediction accuracy (AUC) and reproducibility over time (ICC) can inform how reliable a biomarker would be in future studies. The combined information on AUC and ICC for the most predictive metabolites of the 79 food groups/items is shown in Figure 1a,b. Biomarkers in the upper right corner with both high AUC and ICC are considered reliable, while those in the lower left corner with both low AUC and ICC are less reliable. Reliable biomarkers were seen for several food groups/items including coffee, alcohol, nuts, fish, tea, processed meat, poultry, and chocolate candies. Because the DAS was designed to capture seasonal variation by collecting 24 h urine six months apart, the low ICCs of some metabolites might reflect true variation in dietary intake. We further investigated how consumption frequency relates to AUC and ICC. Biomarkers of foods with low consumption frequencies tended to have lower AUCs and ICCs (Figure 2). Exceptions included biomarkers for fish and alcohol.
Figure 1. Metabolite prediction accuracy for food intake by metabolite reproducibility for the most predictive metabolite of 79 food groups/items in the Cancer Prevention Study-3 Diet Assessment Sub-study. (a) The most predictive metabolites for 71 food groups/items assessed using the food frequency questionnaire; (b) the most predictive metabolites for 60 food groups/items assessed using the average of 24 h diet recalls. Prediction accuracy was assessed by area under the curve (AUC) from the receiver operating characteristic curve, which indicates how well a metabolite could discriminate top quartile from bottom quartile intake of a food group/item. Reproducibility was assessed by intraclass correlation coefficients (ICCs), calculated as the ratio of between-person variance to the total variance among participants with repeated urinary metabolic profiles measured six months apart.

Discussion
In this cross-sectional metabolomics study among 648 men and women in the CPS-3 DAS with comprehensive dietary data assessed using both FFQ and repeated 24HRs, and with two 24 h urine samples collected approximately 6 months apart, we identified 1708 diet-metabolite correlations after adjusting for multiple comparisons. More diet-metabolite correlations were found using FFQ than 24HRs. Reproducibility of the 513 unique metabolites over six months was good for a large proportion, with 28% of metabolites having an ICC > 0.6. Comparing the urinary dietary biomarkers identified in the present study with our previous findings in fasting plasma samples in the same study [13] revealed several overlapping food biomarkers identified in both blood and urine and many more putative biomarkers identified in urine for further evaluation. This study also provided important information on the reproducibility of the urinary biomarkers, which could facilitate their utilization in future clinical and epidemiological studies.
Urine collection is less invasive, cheaper, and offers greater volumes than blood collection. Most food components (e.g., phytochemicals) are xenobiotics that will be transformed and eliminated quickly via urine or feces. Therefore, urine as a biospecimen could be very useful for identifying dietary biomarkers in large population studies. The usefulness of urine was recently highlighted by a population study comparing dietary biomarkers measured in blood and urine samples from the same individuals. Playdon et al. [11] identified more diet-metabolite correlations in urine than in blood and more than a third of the correlations found in blood were also found in urine with similar magnitude. We previously published findings of diet-related biomarkers identified in fasting plasma samples in the CPS-3 DAS [13]. Among 671 men and women with at least one fasting blood sample in the CPS-3 DAS, a total of 677 diet-metabolite associations were identified (238 metabolites were associated with 76 food groups/items). In the present study, among a similar number of participants with at least one 24 h urine sample we identified a greater number of associations (n = 1708). We also found many overlapping diet-metabolite correlations in urine as we found previously in fasting plasma samples in the same study. 
For example, the same plausible biomarkers (food constituents or derivatives) were found for apples or pears (4-allylphenol sulfate), citrus fruits and juices (stachydrine, N-methylhydroxyproline, N-methylproline), soy products (genistein glucuronide), cruciferous vegetables (S-methylcysteine or S-methylcysteine sulfoxide), garlic (alliin, N-acetylalliin), whole grains (2,6-dihydroxybenzoic acid, 2-acetamidophenol sulfate, 4-methoxyphenol sulfate, 2-aminophenol sulfate), poultry (3-methylhistidine), fish (CMPF), nuts (tryptophan betaine, 4-vinylphenol sulfate), milk (N,N,N-trimethyl-5-aminovalerate and galactonate), artificial sweeteners (acesulfame, saccharin, and erythritol), alcohol (ethyl glucuronide and ethyl α-glucopyranoside), coffee (e.g., quinate, 3-hydroxypyridine sulfate, trigonelline (N-methylnicotinate)), and diet beverages (acesulfame). We previously found theanine, a potentially specific biomarker of tea intake, in blood [6,13]. A derivative of theanine, N-acetyltheanine, was found to be the most predictive biomarker of tea in urine in the present study. The magnitude of the correlations was similar in blood and urine. We also observed similar ICCs for the same biomarkers measured in both blood and urine. The high consistency between blood and urine findings in the CPS-3 DAS is also likely influenced by the fact that 24 h urine samples were returned on the same day that fasting blood samples were collected from the same participants.
Reproducibility of food-based biomarkers, affected by many sources of variability, is very important to inform the application of such biomarkers in large-scale clinical and epidemiological studies [17]. Large within-person variation in the biomarker over time is a major source of measurement error that could lead to underestimated diet-disease risk estimates and inconsistent findings. Generally, we found lower reproducibility (or ICCs) for urinary biomarkers than for blood biomarkers, with a median ICC of 0.53 vs. 0.56 [13]. This is likely because most urinary biomarkers are hydrophilic xenobiotics and amino acids, which have shorter half-lives than lipophilic biomarkers. Many polyphenol biomarkers have half-lives shorter than 24 h [28]. Metabolites with a short half-life tend to have a higher within-person variation, and thus a lower ICC. However, some may still be useful to capture habitual diet if the food/beverage is consumed frequently in the population (e.g., coffee), as we observed a positive relationship between consumption frequency and reproducibility of the biomarkers. Although our goal is to identify reliable biomarkers for habitual dietary intake, sensitive and specific short-term biomarkers, such as isoflavones and their derivatives for soy products, are still useful in monitoring dietary compliance in intervention studies or in populations with higher frequency of consumption. On the other hand, lipophilic or erythrocyte-associated biomarkers have longer half-lives of weeks or months because of the equilibrium of biomarkers between blood and fatty tissues, or because of binding to red blood cells [5]; thus, they are useful as long-term biomarkers. For example, even though fish and alcohol were not frequently consumed among participants in the present study, their most predictive metabolites (CMPF and ethyl glucuronide, respectively) still had high reproducibility over the six-month period.
Plausible biomarkers should have positive correlations with food intake. Many metabolites were inversely correlated with foods such as red and processed meat and may not be good candidates for further evaluation. A large proportion of the diet-related metabolites are unknowns which need annotation in future studies. We reported the unknowns herein given their strong relationships with dietary factors, so they may be compared with future studies using this platform. Moving forward, more research is needed to systematically evaluate plausible food and food group biomarkers in multiple aspects such as robustness in different populations and study settings, half-lives, dose-response relationships over a range of intakes, and comparisons to benchmark biomarkers [29].
The present study has several strengths, including its large sample size, comprehensive dietary data collected using both an FFQ and repeated 24HRs, availability of 24 h urine samples, and metabolomic profile data measured by an untargeted and sensitive mass spectrometry-based approach. These rich resources enabled us to explore a large number of diet-metabolite correlations simultaneously. The repeated measures of 24 h urinary metabolic profiles make the study unique because most cohort studies did not collect urine samples or only collected spot urine and because the repeated measures allowed for an assessment of biomarker reproducibility over time. This study also has limitations. Metabolites with low correlation coefficients may not be ideal biomarkers as they only explain a small portion of the variation in dietary intake. However, low correlations do not exclude them from further evaluation as candidate dietary biomarkers, because diet was assessed using self-reported instruments, whose measurement errors could attenuate the correlation estimates with biomarkers. We were not able to distinguish acute intake biomarkers from habitual dietary biomarkers because the study was designed not to burden participants by collecting 24HRs and biospecimens at the same time. Future studies need to confirm these biomarkers in spot urine samples as 24 h urine collections are burdensome and generally not feasible in large population studies.

Study Population
The Diet Assessment Sub-study (DAS) was a one-year observational study among 745 men and women enrolled in the CPS-3, designed to evaluate the validity and reproducibility of the newly modified CPS-3 FFQ over a year. CPS-3 is a large prospective cohort study of 303,682 adults aged 30-65 residing in 35 states plus the District of Columbia and Puerto Rico, who were enrolled between 2006 and 2013 as described in detail elsewhere [30]. Briefly, at enrollment, participants provided a blood sample, had waist circumference measured and completed an enrollment survey. Most participants also completed a more comprehensive baseline survey that assessed extensive lifestyle, medical and other information. Follow-up questionnaires were sent in 2015 to those who completed the baseline survey after enrollment (n = 254,650) to update lifestyle and medical information and to assess diet using the CPS-3 FFQ for the first time.
To recruit participants to the DAS, CPS-3 participants living in 5 regions defined by Quest Diagnostics business units (Atlanta, GA, USA; Dallas, TX, USA; Auburn Hills, MI, USA; West Hills, CA, USA; San Jose, CA, USA) were invited. Enrolled participants were asked to complete the 2015 follow-up survey (to serve as the pre-FFQ) and six telephone-administered 24HRs throughout the year, provide two fasting blood and two 24 h urine samples, and complete a post-FFQ at the end of the study. The six 24HRs aimed to include four weekdays and two weekend days. Blood and urine samples were collected approximately six months apart to capture seasonal variation.
A total of 745 men and women met the minimum inclusion criteria of completing both pre- and post-FFQs and the first 24HR. For the urinary metabolomics analysis, we excluded participants who completed fewer than three 24HRs (n = 2), had poor post-FFQs (n = 20; defined as missing 2 or more sections, an entire page, >100 line items, or with daily energy intake <800 or >4500 kcal for men, and <600 or >3800 kcal for women), or had missing or invalid urine collections at both time points (n = 30). Invalid urine collections were defined as missed or spilled voiding ≥2 times, incorrect collection or flushing of the next morning samples, missing volume or extreme total volume (top and bottom 1% distribution), extreme urinary creatinine (top and bottom 1% distribution), or total collection period <20 or >28 h. We further excluded current smokers (n = 19), those whose body weight was missing at both urine collection appointments (n = 1) or whose weight change was >20 lbs between urine collections (n = 13), and pregnant women (n = 12). Finally, 648 men and women were included in the urinary metabolomics analysis (Supplemental Figure S1). Those with two eligible urine samples (n = 482) were included in the reproducibility analysis. The CPS-3 DAS protocol was approved by the Emory University (Atlanta, GA, USA) Institutional Review Board.
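Two of the numeric exclusion rules above can be written as simple range checks. The following is a minimal sketch only: the function names and the "male"/"female" coding are hypothetical, and the percentile-based rules (extreme volume or creatinine) are omitted because they depend on the cohort distribution.

```python
def plausible_energy(kcal_per_day, sex):
    """Return True if daily energy intake falls inside the stated range.

    Stated limits: <800 or >4500 kcal (men) and <600 or >3800 kcal (women)
    mark a poor post-FFQ. The sex coding used here is an assumption.
    """
    limits = {"male": (800, 4500), "female": (600, 3800)}
    if sex not in limits:
        raise ValueError("sex must be 'male' or 'female'")
    low, high = limits[sex]
    return low <= kcal_per_day <= high


def valid_collection_period(hours):
    """Return True if the total urine collection period is within 20 to 28 h."""
    return 20 <= hours <= 28


if __name__ == "__main__":
    print(plausible_energy(2100, "female"))   # inside the stated range
    print(valid_collection_period(29.5))      # outside the stated range
```

Boundary values (for example exactly 800 kcal or exactly 28 h) are treated as acceptable here because the text excludes only values strictly below or above the limits.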

Diet Assessment
Diet was assessed using the newly modified CPS-3 FFQ as described elsewhere [31]. Briefly, the Willett FFQ [32,33] was modified for the CPS-3 study population, of which 17.3% were non-white participants. Modifications to the FFQ were informed through telephone-administered 24HRs, analyses of NHANES 2009-2010, and focus groups. The final modified FFQ included 191 line items. We defined 101 food groups/items from the FFQ as shown in Supplemental Table S4, generally consistent with the definitions in our previous analysis in the CPS-II Nutrition Cohort [6]. Comparable food groups were derived from the 24HRs to match those from the FFQ. We also used the 24HRs to create a few food groups for foods that are not asked about (e.g., mushrooms) or are asked about only in combination with other foods (e.g., apples) on the FFQ. A total of 105 food groups/items were derived from the 24HRs. Only the post-FFQ was used in the present study as it assessed average dietary intake in the past 12 months, the period during which the 24 h urine samples were collected.

24 h Urine Collection and Processing
Participants were instructed to begin 24 h urine collections in the morning the day prior to their fasting blood collection appointment. Urine collection started after voiding the first specimen in the morning, and participants collected all urine for the next 24 h including the following morning's first specimen. Urine was collected in 3 L unpreserved jugs, and participants were instructed to refrigerate or keep samples in a cooler with cool packs provided. The following morning, participants delivered their completed 24 h urine collection to a Quest Patient Service Center and volume was recorded. Urine specimens were then transported to a Quest Diagnostics regional processing laboratory where samples were aliquoted into 4 × 5 mL and 5 × 1.8 mL labeled cryovials. All aliquots were frozen and shipped on dry ice to an off-site biorepository (Fisher BioServices, Inc., Frederick, MD, USA) for long-term storage in the vapor phase of liquid nitrogen.

Metabolomics Analysis
Metabolomic profiling was conducted by Metabolon, Inc. (Durham, NC, USA) using ultrahigh performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) described in detail elsewhere [34,35]. Briefly, 100 µL urine samples were treated with 450 µL of methanol to precipitate proteins using an automated liquid handling robot (Hamilton LabStar, Hamilton Robotics, Inc., Reno, NV, USA). Four sample fractions were dried and reconstituted in different solvents for measurement under four different platforms. Two aliquots were analyzed using two separate reverse phase (RP)/UPLC-MS/MS methods with positive ion mode electrospray ionization (ESI), one chromatographically optimized for more hydrophilic compounds and one for more hydrophobic compounds. Another aliquot was analyzed using RP/UPLC-MS/MS with negative ion mode ESI using a separate dedicated C18 column. The last aliquot was analyzed via hydrophilic interaction chromatography (HILIC)/UPLC-MS/MS with negative ion mode ESI. Mobile phases of the RP positive ion method consisted of 0.1% formic acid in water and 0.1% formic acid in methanol. Mobile phases of the RP negative ion method consisted of 6.5 mM ammonium bicarbonate in water (pH 8) and 6.5 mM ammonium bicarbonate in 95% methanol/5% water. Mobile phases of the HILIC method consisted of 10 mM ammonium formate in 15% water, 5% methanol, 80% acetonitrile and 10 mM ammonium formate in 50% water, 50% acetonitrile. For all methods, the injection volume was 5 µL and a 2× needle loop overfill was used. Individual metabolites were identified by comparison with a chemical library maintained by Metabolon that comprises more than 3300 authenticated standards and recurrent unknown entities, based on retention time/index, mass to charge ratio, and chromatographic data (including MS/MS spectral data).
A total of 1551 metabolites were detected in the 24 h urine samples. Metabolites that were below the detection limit in >90% of the samples were excluded (n = 147). Values for each sample were normalized by osmolality. To correct for day-to-day variation from the platform, each metabolite was then rescaled to set the median equal to 1. Lastly, missing values were imputed with the minimum. Triplicates of 44 participant samples were used as quality controls to assess inter- and intra-batch variation. Intraclass correlation coefficients (ICCs) were calculated among the quality control samples to test the reproducibility of the platforms. Metabolites with an ICC < 0.5 were further excluded from the analysis, leaving 1391 for diet-metabolite analysis. Of the 1391 included metabolites, the median technical ICC was 0.94 (interquartile range: 0.89 to 0.97), suggesting a very high reproducibility of the platforms.
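The filtering, rescaling, and imputation steps described above can be illustrated for a single metabolite as follows. This is a minimal sketch under stated assumptions, not the vendor's pipeline: the function name is hypothetical, values are assumed to be already osmolality-normalized, `None` marks a value below the detection limit, and the median rescaling is applied once to all samples rather than per run day.

```python
from statistics import median


def preprocess_metabolite(values, max_missing_frac=0.90):
    """Filter, median-rescale, and minimum-impute one metabolite.

    values: per-sample intensities (assumed osmolality-normalized);
            None means below the detection limit.
    Returns None when the metabolite is excluded, otherwise a new list.
    """
    n_missing = sum(v is None for v in values)
    # Exclude metabolites below the detection limit in >90% of samples.
    if n_missing / len(values) > max_missing_frac:
        return None
    observed = [v for v in values if v is not None]
    # Rescale so that the median of the observed values equals 1.
    med = median(observed)
    scaled = [None if v is None else v / med for v in values]
    # Impute missing values with the minimum observed (rescaled) value.
    floor = min(v for v in scaled if v is not None)
    return [floor if v is None else v for v in scaled]


if __name__ == "__main__":
    print(preprocess_metabolite([2.0, 4.0, None, 6.0]))
```

In this sketch the median is taken over observed values before imputation, which follows the order of steps given in the text.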
Putative dietary biomarkers were further evaluated for predictive accuracy in discriminating the top from the bottom quartile of consumption (highest vs. lowest intake), assessed using the AUC calculated from the ROC curve using the R package pROC [37]. AUC < 0.7 was considered to be low, 0.7-<0.8 to be moderate, and ≥0.8 to be high.
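For illustration, the empirical AUC for top versus bottom quartile can be computed directly as the probability that a randomly chosen top-quartile consumer has a higher metabolite level than a randomly chosen bottom-quartile consumer, with ties counted as one half. The sketch below is written in Python rather than with pROC, and the function names are hypothetical. For inversely correlated metabolites one would typically report max(AUC, 1 - AUC); whether the analysis handled direction this way is an assumption.

```python
def auc_top_vs_bottom(levels_top, levels_bottom):
    """Empirical ROC AUC via pairwise comparison (Mann-Whitney form).

    levels_top:    metabolite levels of top-quartile consumers
    levels_bottom: metabolite levels of bottom-quartile consumers
    """
    wins = 0.0
    for t in levels_top:
        for b in levels_bottom:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(levels_top) * len(levels_bottom))


def accuracy_label(auc):
    """Apply the cut-offs stated in the text: <0.7, 0.7 to <0.8, >=0.8."""
    if auc < 0.7:
        return "low"
    if auc < 0.8:
        return "moderate"
    return "high"


if __name__ == "__main__":
    auc = auc_top_vs_bottom([3, 4, 5], [1, 2, 3])
    print(round(auc, 3), accuracy_label(auc))
```

The pairwise form is quadratic in the group sizes, which is acceptable for quartiles of a few hundred participants; a rank-based formula gives the same value more efficiently.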
The reproducibility of the identified food-related metabolites over six months was assessed using ICCs. ICCs were calculated as the ratio of between-person variance to the total variance among participants with repeated measures of urinary metabolic profiles. Between-person variance was estimated from a random effects model where participants were modeled as a random variable. We considered ICCs > 0.6 to be good and >0.75 to be excellent reproducibility.
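The ICC definition above (between-person variance divided by total variance) can be sketched with the one-way ANOVA moment estimator for balanced repeated measures. This is an assumption made for illustration: the text specifies a random effects model but not the estimation method, so a likelihood-based fit may give slightly different values. The function names and the "below good" label are hypothetical.

```python
def icc_oneway(measures):
    """One-way random-effects ICC from balanced repeated measures.

    measures: one tuple per participant, each holding the same number
              (k >= 2) of repeated values for a single metabolite.
    """
    n = len(measures)
    k = len(measures[0])
    grand_mean = sum(sum(row) for row in measures) / (n * k)
    person_means = [sum(row) / k for row in measures]
    # Mean squares between and within participants.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measures, person_means) for x in row
    ) / (n * (k - 1))
    # Between-person variance, truncated at zero when the estimate is negative.
    var_between = max((ms_between - ms_within) / k, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else float("nan")


def reproducibility_label(icc):
    """Cut-offs from the text: >0.6 good, >0.75 excellent."""
    if icc > 0.75:
        return "excellent"
    if icc > 0.6:
        return "good"
    return "below good"  # label not defined in the text


if __name__ == "__main__":
    icc = icc_oneway([(1, 2), (3, 4), (5, 6)])
    print(round(icc, 3), reproducibility_label(icc))
```

With two urine samples per participant, k is 2 and the within-person mean square reduces to half the mean squared difference between the two collections.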

Conclusions
In conclusion, in this large cross-sectional analysis of habitual diet and 24 h urinary metabolic profiles in a free-living population of 648 racially/ethnically diverse men and women, we identified many more potential dietary biomarkers in urine than in fasting blood samples in the same study, and replicated several found in previous studies. These findings provide information complementary to blood biomarkers and important information on the reproducibility of the urinary biomarkers. These candidate biomarkers warrant further evaluation, and reliable ones could be used in future clinical and epidemiological studies.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/metabo11040248/s1. Figure S1: study population exclusion; Table S1: Food-metabolite associations identified using either FFQ or 24 h diet recalls in the CPS-3 Diet Assessment Sub-study; Table S2: Food-metabolite associations identified using the FFQ in the CPS-3 Diet Assessment Sub-study; Table S3: Food-metabolite associations identified using the average 24 h diet recalls in the CPS-3 Diet Assessment Sub-study; Table S4: Food group definitions in the CPS-3 Diet Assessment Sub-study.
Funding: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study-3 cohort. Support for this project was funded by the American Cancer Society.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the Emory University (IRB ID CR001-IRB00059007, approved on 10/23/2020).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data described in the manuscript and analytic code are not available to protect participant confidentiality and in adherence with institutional policies.