Demographic, Health and Lifestyle Factors Associated with the Metabolome in Older Women

Demographic and clinical factors influence the metabolome. The discovery and validation of disease biomarkers are often challenged by potential confounding effects from such factors. To address this challenge, we investigated the magnitude of the correlation between serum and urine metabolites and demographic and clinical parameters in a well-characterized observational cohort of 444 post-menopausal women participating in the Women’s Health Initiative (WHI). Using LC-MS and lipidomics, we measured 157 aqueous metabolites and 756 lipid species across 13 lipid classes in serum, along with 195 metabolites detected by GC-MS and NMR in urine, and evaluated their correlations with 29 potential disease risk factors, including demographic, dietary and lifestyle factors, and medication use. After controlling for multiple testing (FDR < 0.01), we found that log-transformed metabolites were mainly associated with age, BMI, alcohol intake, race, sample storage time (urine only), and dietary supplement use. Statistically significant correlations were in the absolute range of 0.2–0.6, with the majority falling below 0.4. Incorporating important potential confounding factors into metabolite and disease association analyses may improve statistical power and reduce false discovery rates in a variety of data analysis settings.


Introduction
The field of metabolomics involves the parallel measurement of large numbers of small molecules in biological systems and offers new avenues for understanding biological phenotypes, deciphering mechanisms, and identifying biomarkers or drug targets for a variety of diseases [1–6]. The currently detectable metabolome is complex and rich; tens of thousands of endogenous human metabolites have been identified thus far, the majority of which belong to various lipid classes. These metabolites are downstream products of gene expression and protein activity and thus provide an instantaneous snapshot of the biological phenotype. To date, metabolomics studies have yielded numerous important findings in systems biology and biomarker discovery, including a deeper understanding of the relationship between metabolism and chronic diseases [7]. Furthermore, these findings have the potential to improve early disease detection and therapy monitoring, and they extend to research areas such as environmental and nutritional sciences.
In the area of diet and nutrition studies, metabolomics plays a key role in the development of nutritional biomarkers (i.e., objective measures of nutrient or food intake) to better understand the relationship between dietary intake and chronic disease risk [24]. To date, most diet-disease association studies depend on observational data. A challenge, however, is that although a broad range of metabolomics technologies now routinely enables the identification of serum or urine biomarkers of dietary intake, the deleterious effect of confounding factors on such biomarker candidates is a major bottleneck [25]. It is becoming increasingly apparent that metabolite profiles are influenced by numerous clinical and demographic factors, such as sex, age, body mass index (BMI), smoking, alcohol consumption, and medication use (e.g., lipid-lowering and anti-inflammatory drugs) [2,26–28]. Consequently, derived biomarkers can be confounded by one or more of these factors [2,29–33].
To address this challenge, we investigated the correlations of more than 1000 serum and urine metabolites, including lipids, with numerous demographic, lifestyle, and clinical parameters in 444 post-menopausal women drawn from the racially and ethnically diverse, well-characterized Women's Health Initiative (WHI) observational cohort, for which extensive diet and clinical data have been collected. Our overarching goal was to determine which factors are most important to adjust for when investigating metabolites in disease association studies.

Overview of the Women's Health Initiative Observational Study
The Women's Health Initiative (WHI) Observational Study (WHI-OS) is a prospective study of 93,676 postmenopausal women who were enrolled between 1994 and 1998 at 40 clinical centers in the United States. Details on study design and recruitment have been published previously [34][35][36]. Data and biological samples for this present analysis were derived from the Nutrition and Physical Activity Assessment Study Observational Study (NPAAS-OS), an ancillary study derived from the WHI, conducted at 9 WHI clinical centers between 2006 and 2009 [37]. The NPAAS study was approved by the institutional review board, and all women gave informed written consent, in addition to ongoing consent for WHI participation. The WHI is registered at clinicaltrials.gov (NCT00000611).

Study Population
Details about the study population have been published [37]. A total of 450 WHI-OS participants completed a fasting blood draw, a 24 h urine collection, anthropometry, and three dietary assessment tools: a food frequency questionnaire (FFQ), a 4-day food record, and three 24 h dietary recalls. Only the FFQ data were used in this analysis. Most of these activities were carried out during two clinic visits over a 2-week period. Blood samples were collected in the fasting state (≥12 h) and kept at 4 °C for up to 1 h until serum was separated from cells. Centrifuged aliquots were stored at −80 °C until analysis [38]. For urine collections, participants were instructed to begin the collection after a first-morning void preceding the penultimate study day and to record any missed voids or spillage. Boric acid was used as a preservative [39]. Data collected at the NPAAS-OS clinic visit included participants' age, height, and weight, as well as self-reported current cigarette smoking, medication use (duration and frequency), dietary supplement use (type and daily average amount) [40], and physical activity, using standardized WHI instruments. Self-identified race and ethnicity, height, baseline income, education, and marital status were obtained from the primary WHI database at baseline enrollment (1993–1998). Medication use was obtained from data collected at the end of the first WHI extension study in 2010. A thorough medication inventory system was developed to capture the use of all usual medications, including formulation, dose, and duration, and whether each medication was a prescription or over-the-counter drug [34]. A total of 1653 different medications were reported among 212 women in NPAAS-OS. Medication classes used by >5% of women were retained for analysis. Self-reported recreational physical activity was expressed in metabolic equivalent task (MET) hours, where METs specify activity intensity (e.g., light, moderate, or vigorous).
Total recreational physical activity was then calculated in MET hours/week by multiplying the MET value of each activity by the hours per week spent in that activity and summing over all activities. NPAAS-OS recruitment oversampled participants with extreme body mass index (BMI; <18.5 or ≥30.0 kg/m²), women aged ≤59 years at WHI enrollment, and self-identified Black and Hispanic/Latina women. Exclusion criteria included medical conditions precluding participation, weight instability, and travel plans during the study period. Participants were instructed to follow their usual diet, dietary supplement use, and physical activity habits during the study. A total of 444 women for whom serum and urine samples were available were included in the present analysis.
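The MET-hours/week summary is a weighted sum over activities. The Python sketch below illustrates that arithmetic only; the `Activity` record and the MET values in the example are hypothetical placeholders, not the MET assignments used by the WHI instruments.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One self-reported recreational activity (hypothetical record layout)."""
    name: str
    met: float             # intensity in METs (illustrative values only)
    hours_per_week: float  # time spent in this activity each week


def total_met_hours_per_week(activities):
    """Sum of MET x hours/week over all reported activities."""
    return sum(a.met * a.hours_per_week for a in activities)


if __name__ == "__main__":
    reported = [
        Activity("walking", met=3.0, hours_per_week=2.5),
        Activity("swimming", met=6.0, hours_per_week=1.0),
    ]
    # 3.0 * 2.5 + 6.0 * 1.0 = 13.5 MET hours/week
    print(total_met_hours_per_week(reported))
```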

Serum
Details on metabolite measurements have been published previously [41]. Briefly, for the analysis of aqueous blood metabolites, methanol extracts of fasting serum samples from 444 NPAAS-OS participants, along with 17 blinded duplicates, were analyzed by targeted LC-MS/MS using an AB Sciex Triple Quad 6500+ mass spectrometer [42]. The LC system was composed of four Shimadzu Nexera LC-20 pumps, an AB Sciex/CTC autosampler, and an Agilent 1260 column compartment containing a column-switching valve integral to the MS instrument (Agilent Instrument Technologies, Santa Barbara, CA, USA). Two HILIC (hydrophilic interaction chromatography) columns (Waters XBridge Amide; 150 × 2.1 mm, 2.5 µm particle size), connected in parallel, were used for the positive and negative ionization modes. Each sample was injected twice: 10 µL for analysis in negative ionization mode and 5 µL for analysis in positive ionization mode. This setup allows one column to perform the separation while the other is reconditioned for the next injection. The flow rate was 0.300 mL/min, the autosampler temperature was kept at 4 °C, the column compartment was set at 40 °C, and the total separation time for both ionization modes was 20 min. The mobile phase was composed of Solvent A (5 mM ammonium acetate in 88% H₂O, 10% acetonitrile, 2% methanol + 0.2% acetic acid) and Solvent B (5 mM ammonium acetate in 88% acetonitrile, 10% H₂O, 2% methanol + 0.2% acetic acid). Identical gradient conditions were used for both separations. The assay was developed using authentic, commercially obtained compounds (Sigma-Aldrich, Saint Louis, MO, USA or Fisher Scientific, Pittsburgh, PA, USA) and targeted a total of 303 metabolites representing >40 metabolic pathways, along with 33 stable-isotope-labeled internal standards (Cambridge Isotope Laboratories, Tewksbury, MA, USA). Metabolite concentrations were obtained using MultiQuant 3.0.2 software.
Of the 303 metabolites targeted, 157 were detected in >80% of all samples.

Urine
Metabolite profiles of 24 h urine samples (with blinded duplicates) were analyzed by ¹H NMR spectroscopy at 800 MHz. For NMR analysis, urine samples (300 µL each) were mixed with an equal volume of phosphate buffer (100 mM, pH 7.4) containing an internal standard, TSP (3-(trimethylsilyl)propionic-2,2,3,3-d4 acid sodium salt, 25 µM), and transferred to 5 mm NMR tubes. The samples were analyzed on a Bruker Avance II 800 MHz NMR spectrometer equipped with a cryogenically cooled probe and Z-gradients suitable for inverse detection. One-dimensional NMR experiments using the 'noesypr1d' pulse sequence with water suppression by presaturation were performed under identical experimental conditions. Each spectrum was acquired with a 10,000 Hz spectral width and 32,768 time-domain data points. Free induction decay (FID) signals were multiplied by an exponential window function with 0.5 Hz line broadening and Fourier transformed with the spectrum size set to 32,768 points. The resulting spectra were phase- and baseline-corrected, and chemical shifts were referenced to the internal TSP peak. Metabolites were identified based on the literature and chemical shift databases [45,46]. Metabolite concentrations were obtained after normalizing the NMR spectra to the peak of the internal standard, TSP. Bruker TopSpin versions 3.0 and 3.1 were used for NMR data acquisition and processing, and Bruker AMIX software was used for metabolite quantitation. Relative spectral abundances were obtained for 58 metabolites, none of which had missing values.
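The exponential line broadening and internal-standard referencing described above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the TopSpin/AMIX processing: the FID is treated as complex points sampled at 1/(spectral width), phase and baseline correction are omitted, and `concentration_vs_tsp` is a hypothetical helper that assumes simple per-proton integral ratios against the 9 equivalent trimethylsilyl protons of TSP. Whether 25 µM refers to the buffer or the final tube concentration is left to the caller.

```python
import numpy as np


def exponential_apodize_and_ft(fid, spectral_width_hz=10_000.0, lb_hz=0.5):
    """Multiply a complex FID by exp(-pi * LB * t), then Fourier transform it.

    Returns (frequency axis in Hz, complex spectrum), ordered from negative
    to positive frequency. Phase and baseline correction are not performed.
    """
    fid = np.asarray(fid, dtype=complex)
    n = fid.size
    t = np.arange(n) / spectral_width_hz        # dwell time = 1 / spectral width
    window = np.exp(-np.pi * lb_hz * t)         # adds about LB Hz of Lorentzian broadening
    spectrum = np.fft.fftshift(np.fft.fft(fid * window))
    freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / spectral_width_hz))
    return freqs, spectrum


def concentration_vs_tsp(metabolite_integral, metabolite_protons,
                         tsp_integral, tsp_conc_um, tsp_protons=9):
    """Hypothetical helper: concentration from integrals relative to TSP.

    Each integral is divided by the number of protons contributing to it.
    """
    ratio = (metabolite_integral / metabolite_protons) / (tsp_integral / tsp_protons)
    return ratio * tsp_conc_um


# Usage: a synthetic 1000 Hz signal sampled like the spectra described above
n_points = 32_768
t = np.arange(n_points) / 10_000.0
freqs, spec = exponential_apodize_and_ft(np.exp(2j * np.pi * 1000.0 * t))
print(freqs[np.argmax(np.abs(spec))])   # within one frequency bin of 1000 Hz
```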
Separately, urine samples were analyzed by global GC-MS on an Agilent 7890/5975 GC-MS instrument following established protocols [18]. Urine samples were treated with urease to deplete urea, followed by methoximation with methoxyamine. Urine metabolites were then derivatized with MSTFA (N-methyl-N-(trimethylsilyl)trifluoroacetamide) containing 1% (v/v) TMCS (trimethylchlorosilane). Prior to derivatization, the samples were mixed with myristic acid-d27 and a FAME (fatty acid methyl ester) mixture of retention index compounds. The samples (1 µL) were injected in splitless mode. Helium was used as the carrier gas at a flow rate of 1.2 mL/min. Separation was performed on an Agilent DB-5ms + 10 m DuraGuard capillary column (20 m × 250 µm × 0.25 µm). The column temperature was held at 60 °C for 1 min, increased at 10 °C/min to 325 °C, and held at this temperature for 10 min. Mass spectral signals were collected after a solvent delay of 4.90 min. Peak intensities and elution times of the retention index compounds were verified by their m/z values after each run. After the data were converted to the appropriate format, MS peaks were analyzed using Agilent MassHunter Quantitative Analysis software and PARADISe version 1.1.6 [47]. Relative metabolite concentrations were obtained by normalizing the data to the internal standard, myristic acid-d27. This yielded 275 metabolites observed in >80% of all samples, 137 of which were named compounds.
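Retention indices from FAME markers are commonly obtained by linear interpolation between the two markers that bracket a peak (a van den Dool–Kratz-style index). The text does not state the exact formula used by MassHunter or PARADISe, so the Python sketch below is an assumption; the marker retention times and index values in the usage example are hypothetical. It also shows normalization of peak areas to the myristic acid-d27 internal standard as a simple ratio.

```python
from bisect import bisect_right


def retention_index(rt, marker_rts, marker_ris):
    """Piecewise-linear retention index between bracketing markers.

    marker_rts: ascending retention times of the FAME markers (minutes).
    marker_ris: index values assigned to those markers (hypothetical here).
    """
    if not (marker_rts[0] <= rt <= marker_rts[-1]):
        raise ValueError("retention time lies outside the marker range")
    i = bisect_right(marker_rts, rt) - 1
    if i == len(marker_rts) - 1:          # rt coincides with the last marker
        return float(marker_ris[-1])
    t0, t1 = marker_rts[i], marker_rts[i + 1]
    r0, r1 = marker_ris[i], marker_ris[i + 1]
    return r0 + (r1 - r0) * (rt - t0) / (t1 - t0)


def normalize_to_internal_standard(peak_areas, is_area):
    """Divide each metabolite's peak area by the internal-standard area."""
    return {name: area / is_area for name, area in peak_areas.items()}


# Usage with hypothetical marker times (min) and index values
print(retention_index(7.5, [6.0, 9.0, 12.0], [800, 1000, 1200]))  # 900.0
print(normalize_to_internal_standard({"glycine": 50.0}, 200.0))
```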

Quality Controls (QC) Used in the Metabolite Analysis
Analysis protocols used multiple layers of QC samples, as well as isotope-labeled or unlabeled internal standards, to assess instrument stability and performance during the analysis and to support normalization and metabolite quantitation. The QCs included: (a) unblinded instrument QC samples (commercially obtained pooled human serum from Innovative Research, Inc. (Novi, MI, USA)) run after every 10 samples and at the beginning and end of each batch; (b) blinded, pooled study samples (5% for urine; 10% for serum) interspersed with the biological study samples (3 QCs per batch of 27 study samples) and used to normalize batches of samples over the run; (c) 17 split-sample blinded duplicates of study samples, also interspersed with the study serum and urine samples, which were used to calculate the reported median metabolite CV values; (d) isotope-labeled internal standards for targeted analysis of aqueous metabolites (n = 33) and lipids (n = 54) in serum, which enabled absolute concentration determination and evaluation of instrument stability and data quality; (e) the internal standard TSP, used to assess spectral quality, calibrate spectra, and aid normalization of urine NMR spectra; and (f) FAMEs (fatty acid methyl esters) of different chain lengths for retention time indexing to aid metabolite identification, and myristic acid-d27 for data normalization. Median CVs of blinded pooled study QC samples for the 4 platforms (2 for serum and 2 for urine) were 4.4% for global NMR of 24 h urine, 5.6% for targeted lipidomics, 7.2% for targeted LC-MS/MS, and 21.6% for global GC-MS.
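The median CVs reported above summarize per-metabolite variability across replicate QC measurements. A minimal Python sketch using only the standard library, assuming the CV is the sample standard deviation divided by the mean (the text does not give the exact estimator) and a hypothetical dictionary layout of QC measurements per metabolite:

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def median_cv(qc_measurements):
    """qc_measurements: dict metabolite -> list of values across QC injections.

    Returns the median of the per-metabolite CVs.
    """
    return statistics.median(cv_percent(v) for v in qc_measurements.values())


# Usage with made-up QC replicates for three metabolites (CVs of 10%, 5%, 50%)
qc = {"a": [9.0, 10.0, 11.0], "b": [95.0, 100.0, 105.0], "c": [1.0, 2.0, 3.0]}
print(median_cv(qc))
```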

Data Preprocessing and Analysis
Metabolite data from multiple high-dimensional platforms, including NMR, GC-MS, LC-MS, and lipidomics, from both fasting serum and 24 h urine were used for this analysis. Metabolomic variables with more than 20% missing values were removed to ensure robust results. For the remaining variables, missing values were imputed as half of the minimum positive value to approximate the detection limit. LC-MS and GC-MS data were normalized within each batch using local polynomial regression (loess) fits of the QC samples over run order, with the span parameter set to 0.75. All metabolites were natural-log transformed prior to analysis to improve the normality of their distributions; covariates were not transformed. Relative concentration data were used for the targeted aqueous LC-MS data. For ease of interpretation, unidentified GC-MS metabolites were removed from the analysis. In total, 1108 metabolites were quantitated: 157 aqueous metabolites (LC-MS) and 756 lipid species from 13 lipid classes (LC-MS lipidomics) in serum, and 58 (NMR) and 137 (GC-MS) metabolites in urine. All metabolites included in the analyses are listed in Supplementary Table S1.
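The missing-value filter, half-minimum imputation, QC-based drift correction, and log transformation can be sketched in Python with NumPy. Only the 20% missingness cutoff, the half-minimum rule, and the span of 0.75 come from the text; the rest are assumptions. The correction divides each injection by the QC trend and rescales to the QC median (the text does not say whether a ratio or a difference was used), and `_local_linear_fit` is a simplified local-linear loess with tricube weights, whereas R's `loess` defaults to local quadratic fits.

```python
import numpy as np


def filter_and_impute(X, max_missing=0.20):
    """Drop metabolites with >20% missing values, then fill remaining gaps
    with half of that metabolite's smallest positive observed value.

    X: samples x metabolites array, np.nan marking missing values.
    Returns the imputed matrix and a boolean mask of retained columns.
    """
    X = np.asarray(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep].copy()
    for j in range(X.shape[1]):
        col = X[:, j]                      # a view, so edits write back into X
        observed = col[~np.isnan(col)]
        col[np.isnan(col)] = observed[observed > 0].min() / 2.0
    return X, keep


def _local_linear_fit(x, y, x0, span):
    """Tricube-weighted local linear fit evaluated at x0 (simplified loess)."""
    n = x.size
    k = min(n, max(3, int(np.ceil(span * n))))     # points per neighbourhood
    d = np.abs(x - x0)
    h = np.sort(d)[k - 1] * 1.0001 + 1e-12         # bandwidth just past k-th point
    w = np.sqrt((1.0 - np.clip(d / h, 0.0, 1.0) ** 3) ** 3)
    A = np.column_stack([np.ones(n), x - x0]) * w[:, None]
    coef, *_ = np.linalg.lstsq(A, y * w, rcond=None)
    return coef[0]                                 # intercept = fitted value at x0


def qc_loess_correct(values, run_order, is_qc, span=0.75):
    """Drift-correct one metabolite within one batch using QC injections."""
    values = np.asarray(values, dtype=float)
    run_order = np.asarray(run_order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)
    xq, yq = run_order[is_qc], values[is_qc]
    trend = np.array([_local_linear_fit(xq, yq, x0, span) for x0 in run_order])
    return values / trend * np.median(yq)


# Usage: a 1%-per-injection drift is removed, then values are log-transformed
order = np.arange(30)
raw = 100.0 * (1 + 0.01 * order)
corrected = qc_loess_correct(raw, order, order % 5 == 0)
log_values = np.log(corrected)    # natural log, as in the text
```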
We evaluated the pairwise correlations between the log-transformed metabolites and 29 covariates using Pearson's correlation for continuous variables and Spearman rank correlation for categorical variables. Covariates that were available and hypothesized to potentially affect metabolite concentrations included demographic variables: age (years; continuous), body mass index [BMI (kg/m²); continuous], education (≤high school/GED; some college; ≥college degree; categorical), income (<USD 50 K versus ≥USD 50 K), marital status (never married/divorced or separated/widowed versus presently married/in a marriage-like relationship), self-reported race (Black compared to non-Hispanic white; there were insufficient observations for other race categories), and ethnicity (Hispanic compared to non-Hispanic white; analyzed per the categories at the time of data collection, which did not include an additional option for race within ethnicity) [48]; diet and lifestyle factors: total self-reported energy intake from the FFQ (kcal/d; continuous), physical activity [total energy expended in recreational physical activity in MET hours/week; continuous], current smoker (yes/no), and alcohol intake (drinks/week; continuous); dietary supplement use [40]: multivitamin (yes/no), multivitamin plus mineral (yes/no), stress formula (yes/no), other combination pills (non-stress/multi; yes/no), any combination pills (yes/no), and single-ingredient supplements (yes/no) (note that all participants reported taking at least one supplement); prescription medications (evaluated if used by ≥5% of the study sample; yes/no): angiotensin-converting enzyme (ACE) inhibitors (also including ACE inhibitor plus thiazide/thiazide-like agent and ACE inhibitor plus calcium channel blocker combinations), angiotensin II receptor antagonists (also including angiotensin II receptor antagonist plus thiazide combinations), beta blockers (cardio-selective), bisphosphonates, calcium channel blockers, diuretic combinations (also including thiazide-like diuretics), 3-hydroxy-3-methyl-glutaryl-coenzyme A (HMG-CoA) reductase inhibitors, nonsteroidal anti-inflammatory drugs (NSAIDs; also including salicylates and Cox-2 inhibitors), proton pump inhibitors (also including H2 antagonists), and thyroid hormones; and other factors: season of FFQ completion (spring: March–May; summer: June–August; fall: September–November; winter: December–February; categorical) and sample storage time (years; continuous).
Exclusions were made for extreme BMI (>50 kg/m²; n = 3) and biologically implausible energy intake (<600 kcal/d; n = 6); these exclusions affected only the analyses of those variables. Data were missing for the following variables: dietary supplement use, n = 1; alcohol intake, n = 7; education, n = 3; current smoking status, n = 10; and self-reported physical activity, n = 6. These observations were excluded from the analyses of the corresponding variable. Results are reported for all pairwise associations that remained significant after controlling the false discovery rate (FDR < 0.01) with the Benjamini–Hochberg procedure [49], applied across all covariates within each of the 4 metabolite platforms. All analyses were performed in Stata (v17, College Station, TX, USA).
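The Benjamini–Hochberg step applied within each platform can be written in a few lines of Python. The sketch assumes the per-pair p-values have already been computed (for example, from Pearson or Spearman tests such as `scipy.stats.pearsonr` and `scipy.stats.spearmanr`; the authors used Stata) and uses a hypothetical `tests` dictionary keyed by platform so that each platform forms one testing family.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_by_platform(tests, fdr=0.01):
    """tests: dict platform -> list of (metabolite, covariate, p) tuples.

    Each platform is one BH family; pairs with q < fdr are kept,
    mirroring the FDR < 0.01 threshold described in the text.
    """
    hits = {}
    for platform, rows in tests.items():
        q = benjamini_hochberg([p for _, _, p in rows])
        hits[platform] = [(met, cov, qi)
                          for (met, cov, _), qi in zip(rows, q) if qi < fdr]
    return hits


# Usage with made-up p-values for one platform
print(significant_by_platform({"NMR": [("m1", "age", 0.0001), ("m2", "age", 0.5)]}))
```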

Results
Characteristics of the 444 women in the NPAAS ancillary study are given in Table 1. Most women were white, had higher education and income, and did not smoke, and all women reported the use of dietary supplements. Pairwise correlations between metabolites and covariates that were significant at FDR < 0.01 are given for all platforms in Tables 2–5. For the LC-MS platform, 9 variables were significantly correlated with serum metabolites: age (n = 18 metabolites), BMI (n = 16), alcohol intake (n = 3), self-reported race (Black compared to non-Hispanic white; n = 12), self-reported ethnicity (non-Hispanic white compared to Hispanic; n = 3), multivitamin plus mineral use (n = 3), non-stress combination pills (n = 2), any combination supplements (n = 6), and single supplements (n = 2) (Table 2). Most metabolites were positively correlated with these covariates, except for BMI, race, and ethnicity, for which about half of the correlations were inverse. Absolute correlation values were weak to moderate, ranging from 0.19 to 0.47. When we examined correlations between covariates and the 13 serum lipid classes, 5 classes were lower among Black women than among non-Hispanic white women: lysophosphatidylcholines (LPC), lysophosphatidylethanolamines (LPE), phosphatidylcholines (PC), diacylglycerols (DAG), and triacylglycerols (TAG). With increasing BMI, the LPC and LPE classes were lower and the DAG and TAG classes were higher. Alcohol intake was associated with higher PC, an association that remained highly significant in sensitivity analyses restricted to women consuming <10 drinks/week. LPC as a class was higher with both non-stress combination supplement use and any combination supplement use (Table 3). Some individual lipid species were correlated with covariates; however, no specific patterns emerged, suggesting minor shifts within lipid classes (Supplementary Table S2). Absolute correlations ranged from 0.18 to 0.36.

For the NMR platform, urinary metabolite concentrations were higher with increasing alcohol intake (n = 27) and with longer sample storage time (n = 15), and lower among Black women compared to non-Hispanic white women (n = 42) and among non-Hispanic white women compared to Hispanic/Latina women (n = 2). Only one metabolite was associated with BMI and two with age. Absolute correlations ranged from 0.18 to 0.58 (Table 4).
Similarly, for GC-MS, urinary metabolite concentrations were higher with increasing alcohol intake (n = 17) and longer sample storage time (n = 3), and lower among Black women compared to non-Hispanic white women (n = 5). Absolute correlations ranged from 0.19 to 0.37 (Table 5).

Discussion
Clinical, demographic, and environmental factors may influence the metabolome independently of the main exposures or outcomes under investigation. This is particularly relevant for putative disease biomarkers, which have often failed in validation studies because clinical and demographic factors influence both metabolite levels and the disease outcomes of interest. Our aim was to determine which of an extensive list of clinical, demographic, and lifestyle factors were most strongly correlated with metabolites, to inform the selection of potential confounders and thereby improve model precision and reduce false discoveries in future assessments of metabolite-disease associations. In this well-characterized observational cohort of 444 post-menopausal women with extensive demographic, clinical, and dietary data, we evaluated the correlations of 1108 metabolites across 4 platforms with 29 covariates and observed significant, albeit modest, correlations with age, BMI, alcohol intake, and race. Dietary supplements and sample storage time were associated with a smaller number of metabolites. Surprisingly, medication use had little impact on metabolite levels, even though a high proportion (42%) of our study population took at least one of the prescription medications evaluated.
Despite the influence of confounding factors on metabolite levels, efforts to identify these factors and quantify their contribution to specific metabolite-outcome relationships have been limited. The exceptions are well-known confounders such as age, BMI, and sex [2]. The impact of aging on the metabolome has been outlined in two recent reviews [50,51], which report altered metabolites in pathways primarily related to redox homeostasis, energy, and amino acid metabolism. In agreement with these reviews, we found that age was associated with many metabolites related to oxidative stress and amino acids. Similarly, metabolite profiles vary considerably across the continuum of BMI [52], with the most pronounced differences observed between lean and obese individuals [33]. We found that BMI was strongly associated with metabolites related to branched-chain amino acid and energy metabolism and with several lipid classes, including LPCs, LPEs, DAGs, and TAGs. Although our cohort comprised only women, other studies have reported effects of biological sex on amino acids, lipids, sugars, and keto acids (reviewed in [32]). Finally, several studies have investigated the effects of circadian rhythms on metabolite levels [53–55]; however, all samples in our cohort were obtained in the morning after an overnight fast, precluding this evaluation.
The finding that alcohol intake was correlated with many metabolites, particularly fatty acids, was not unexpected. Other investigators have evaluated the effects of alcohol on the blood and urine metabolome and have reported similar differences in lipids [56–58], including higher diacyl PCs, as we observed here. Although not significant at the class level, we also noted shifts (some higher, some lower) in numerous individual TAG species. Alterations in fat metabolism with alcohol intake are well recognized [59]. Mechanistically, alcohol promotes fat accumulation in the liver primarily because ethanol replaces fatty acids as an energy substrate [60]. The spared fatty acids are then esterified into triglycerides, phospholipids, and cholesterol esters, which are secreted into the circulation via lipoproteins [60]. Alcohol intake should therefore be recorded whenever possible, particularly when lipidomics platforms are used. Notably, the women in this study were either non-drinkers or moderate drinkers (median = 0.4 drinks/week, with only 26 women reporting 10 or more drinks/week). In sensitivity analyses restricted to women consuming <10 drinks/week, associations were attenuated but did not disappear.
Many metabolites were also associated with self-identified race. Comparisons of Black and non-Hispanic white women revealed correlations with 5 lipid classes (LPCs, LPEs, PCs, DAGs, and TAGs) and with 42 urinary metabolites measured by NMR, including TCA cycle intermediates and several amino acids. Notably, all but a few significant lipids and urine metabolites were lower among Black women than among non-Hispanic white women. We hypothesized that these relationships might be confounded by age, BMI, or lipid-lowering medication use, as a greater proportion of Black women than non-Hispanic white women were taking HMG-CoA reductase inhibitors (31% compared to 19%). However, most significant correlations persisted after adjustment for any of these factors (data not shown). Most studies to date have evaluated the association of race and metabolites in the context of disease outcomes, making comparisons with our results difficult. Furthermore, the contributions of other confounding factors, such as adiposity (as opposed to BMI), to race-metabolite relationships may be difficult to separate.
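The text does not describe how the race correlations were adjusted for age, BMI, or statin use. One common approach, shown here as an assumption rather than the authors' method, is a partial correlation: regress both the metabolite and the factor of interest on the adjustment covariates and correlate the residuals. The Python sketch uses simulated data in which two variables are related only through a shared covariate; a rank-based analogue would rank the variables first.

```python
import numpy as np


def partial_correlation(x, y, covariates):
    """Pearson correlation of x and y after regressing both on covariates.

    covariates: array of shape (n,) or (n, k); an intercept column is added.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(x.size), np.asarray(covariates, dtype=float)])
    resid_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    resid_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(resid_x, resid_y)[0, 1])


# Usage with simulated data: x and y are linked only through the shared factor z
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + 0.5 * rng.normal(size=2000)
y = z + 0.5 * rng.normal(size=2000)
print(np.corrcoef(x, y)[0, 1])        # marginal correlation, roughly 0.8
print(partial_correlation(x, y, z))   # close to 0 after adjustment
```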
Few studies have evaluated the contribution of multiple lifestyle and clinical factors to metabolite measures. We found no statistically significant correlations between the serum or urine metabolome and self-reported physical activity or energy intake. While self-report instruments such as FFQs or other questionnaires may be useful for capturing overall trends, e.g., in habitual dietary intake, or for characterizing dietary or exercise patterns, they are subject to imprecision and systematic bias, especially for energy intake [61–63]. There were also no associations of metabolites with marital status, income, or education. Lastly, four of the six classes of dietary supplements were significantly correlated with vitamin-related metabolites, including pantothenate (vitamin B5) and pyridoxic acid (a vitamin B6 catabolite), likely reflecting high supplement use in this cohort. Of note, 373 of the 444 (84%) women in this study reported taking at least one of the supplements evaluated.
Factors related to sample handling, including processing and storage, can affect data quality owing to the instability of some metabolites or to enzymatic activity during clotting. Some compounds, e.g., fatty acids and sulfur-containing amino acids, are more prone to degradation than others. Most studies of metabolite stability have focused on pre-analytic factors, short-term stability, or temperature. In one study evaluating the effects of long-term storage on plasma metabolite abundances, only 2% of the metabolites tested were altered in the first 7 years of storage, compared with up to 26% after 16 years [64]. The most affected metabolites included complex lipids, fatty acids, and amino acids. Our samples were stored at −80 °C for an average of 11 years (range 8.9–13.6) [38]. The metabolites most affected in our study were urinary amino acids and compounds involved in energy metabolism assayed by NMR, all of which had higher concentrations with longer storage time, as noted in previous studies [65,66]. That these changes were detected mainly by NMR likely reflects the high precision of this platform. Given that relatively few metabolites were altered overall (18 of >1000), these data suggest that most metabolites are stable over long-term storage of ~11 years.
Many investigators have applied various statistical techniques in an effort to account for the effects of potential confounders. For example, Chen et al. used seemingly unrelated regression (SUR) to analyze metabolomics data from a colorectal cancer study and found that factors such as sex, BMI, age, alcohol use, and smoking had significant confounding effects, which could be minimized through appropriate modeling [31]. In previous NPAAS-related analyses, we considered potential confounding factors when developing biomarker calibration equations to correct self-reported dietary intake [41,67,68]. We found that including participant characteristics, such as BMI, age, dietary supplement use, season of participation, and self-reported physical activity, in models with metabolites strengthened diet-disease associations in the larger WHI cohorts from which NPAAS was derived [68,69]. Importantly, variables previously identified as potential confounders of the relationship between dietary exposures and chronic disease outcomes need to be adjusted for in disease risk models, even if they were part of calibration equation development. While these studies illustrate the value of various statistical techniques for alleviating confounding overall, none have quantified correlations between individual metabolites and potential confounders.
Strengths of this study include the evaluation of a large number of potential influences on the metabolome, the use of multiple platforms with broad metabolome coverage to analyze both serum and urine, and a large, well-characterized post-menopausal cohort. We included many factors that have not been examined previously (e.g., dietary supplements, medication use, and long-term sample storage) and focused our analysis specifically on their effects on the metabolome. Nonetheless, there are some notable limitations. Importantly, our study sample included only post-menopausal women; therefore, our results may not be generalizable to other populations. The use of boric acid as a urine preservative may have affected some metabolite levels; however, we would expect any such effects to be similar across all samples. Although most factors considered had a wide range of values and sufficient observations, some groups or factors had too few observations to yield meaningful correlations; for example, only 2% of women were current smokers. Additionally, we evaluated these potential factors in isolation, whereas many may interact in ways that make their individual contributions hard to tease apart; e.g., adiposity may be a source of variation in the relationship between some covariates and many metabolites. Plans are underway to extend these explorations with mathematical modeling and subsequent application to diet-disease risk assessments in other WHI cohorts.
In summary, of the factors evaluated, age, BMI, alcohol intake, and race had the largest impact on the serum metabolome, with most correlations ranging between 0.2 and 0.4. Dietary supplements and sample storage time affected a smaller subset of metabolites. Some factors, including commonly used prescription medications, marital status, income, education, and physical activity, had little or no association with any metabolites in this cohort. Biomarker-disease association studies using metabolomics should consider including the most relevant potential confounding factors, and using accurate methods to capture them, to improve the reliability of metabolite-disease association analyses.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/metabo13040514/s1, Table S1: All measured metabolites; Table S2.
Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.
Data Availability Statement: Data, codebook, and analytic code used in this report may be accessed in a collaborative mode as described on the Women's Health Initiative website (www.whi.org).