Plasma Metabolites Associated with Coffee Consumption: A Metabolomic Approach within the PREDIMED Study

Few studies have examined the association of a wide range of metabolites with total and subtypes of coffee consumption. The aim of this study was to investigate associations of plasma metabolites with total, caffeinated, and decaffeinated coffee consumption. We also assessed the ability of metabolites to discriminate between coffee consumption categories. This is a cross-sectional analysis of 1664 participants from the PREDIMED study. Metabolites were semiquantitatively profiled using a multiplatform approach. Consumption of total coffee, caffeinated coffee and decaffeinated coffee was assessed by using a validated food frequency questionnaire. We assessed associations between 387 metabolite levels with total, caffeinated, or decaffeinated coffee consumption (≥50 mL coffee/day) using elastic net regression analysis. Ten-fold cross-validation analyses were used to estimate the discriminative accuracy of metabolites for total and subtypes of coffee. We identified different sets of metabolites associated with total coffee, caffeinated and decaffeinated coffee consumption. These metabolites consisted of lipid species (e.g., sphingomyelin, phosphatidylethanolamine, and phosphatidylcholine) or were derived from glycolysis (alpha-glycerophosphate) and polyphenol metabolism (hippurate). Other metabolites included caffeine, 5-acetylamino-6-amino-3-methyluracil, cotinine, kynurenic acid, glycocholate, lactate, and allantoin. The area under the curve (AUC) was 0.60 (95% CI 0.56–0.64), 0.78 (95% CI 0.75–0.81) and 0.52 (95% CI 0.49–0.55), in the multimetabolite model, for total, caffeinated, and decaffeinated coffee consumption, respectively. Our comprehensive metabolic analysis did not result in a new, reliable potential set of metabolites for coffee consumption.


Introduction
Coffee, a widely consumed beverage worldwide [1], has been associated with both beneficial and detrimental effects on health-related outcomes, although harmful associations have been shown to be largely nullified by adequate adjustment for smoking [2]. The most commonly consumed types of coffee, caffeinated and decaffeinated, have consistently been associated with a lower risk of type 2 diabetes (T2D) [3]. However, these findings are based on self-reported dietary assessment methods that might be subject to some degree of misclassification or measurement error [4].
Metabolomics has the potential to advance nutritional epidemiology by objectively measuring metabolic products of foods, and might therefore better accurately reflect food exposure [5]. A comprehensive metabolite profiling may also provide a deeper understanding of metabolic response to foods providing new functional insight to their role in health. Particularly, coffee contains a variety of compounds, many of which may impact metabolic pathways related to disease development or prevention.
In this regard, coffee consumption has been found to be positively associated with two classes of sphingomyelins and negatively associated with long-and medium-chain acylcarnitines in plasma among 284 men of the KORA study population [6]. In a prospective study of 1610 EPIC-Potsdam participants, coffee consumption was inversely associated with one diacylphosphatidylcholine in both sexes and phenylalanine in men, whereas in women, coffee consumption was positively associated with three acyl-alkyl-phosphatidylcholine species [7]. Other metabolites related to coffee exposure include hydroxycinnamate derivatives, phenolic acid derivatives and dimethoxycinnamic acids [8][9][10].
Recently, a 3-stage coffee trial found 115 serum metabolites significantly associated with coffee consumption among 47 participants [11]. Five metabolic pathways were significantly enriched: (a) xanthine, (b) benzoate, (c) steroid, (d) fatty acid (acyl choline), and (e) endocannabinoid [11]. The same research team extended this work to analysis of the lipidomic changes in response to coffee consumption and found that coffee may induce alterations in glycerophospholipid metabolism [12]. However, to date, limited or no metabolomic-analysis has been conducted using combinations of different metabolomic platforms to cover a wide range of metabolites and examine their association with total and subtypes of coffee consumption.
Other studies examining the ability of metabolites to discriminate between coffee consumption categories suggested that trigonelline could be a useful marker of coffee consumption [13]. Regarding caffeine, few studies have assessed its discriminating accuracy with inconsistent results [14,15]. Caffeine is also found in tea, soft drinks, and other food items [16] making it difficult to control for this.
Using a validated multiplatform metabolomics analysis, cross-sectional associations between plasma levels of identified metabolites with self-reported total, caffeinated, or decaffeinated coffee consumption were examined in participants of the PREDIMED (Prevención con DietaMediterránea) study. The ability of metabolites to discriminate between total and the aforementioned types of coffee consumption categories was also investigated.

Study Design
This study is a cross-sectional evaluation of baseline data from two nested case-cohort studies [17] within the PREDIMED trial (ISRCTN35739639), a primary prevention cardiovascular disease (CVD) trial conducted in 7447 participants free of cardiovascular events at baseline but at high cardiovascular risk. Participants were men (55-80 years) and women (60-80 years) without CVD at baseline and fulfilling at least one of the two following criteria; presence of T2D or three or more major cardiovascular risk factors. A detailed description of the PREDIMED trial can be found elsewhere [18,19]. In brief, 7447 participants were randomly assigned to a Mediterranean diet regime, supplemented with extra-virgin olive oil; a Mediterranean diet supplemented with mixed nuts, or a control diet consisting of advice to reduce fat intake. All participants provided written informed consent, and the study protocol and procedures were approved according to the ethical standards of the Declaration of Helsinki. The Institutional Review Board (IRB) of Hospital Clinic (Barcelona, Spain) approved the study protocol on July 2002. This IRB is accredited by the US Department of Health and Human Services (DHHS). Later, the IRBs of all other centers also approved the protocol.

Subjects Selection
For the present study, 1871 out of 1882 PREDIMED subjects with available metabolomics data from the CVD [17] and T2D [20] projects, who completed a validated semiquantitative 137-item food frequency questionnaire (FFQ) were included [21]. Nutrient and energy intakes were calculated using Spanish food composition tables [22]. Participants (n = 34) who had extremes daily energy intake (<500 or >3500 kcal/day for women and <800 or >4000 kcal/day for men) were excluded from the present analysis as well as those (n = 3) with equal or more than 20% missing values in metabolites leaving 1834 subjects for further analyses ( Figure 1). Two questions from the FFQ concerned the average consumption of coffee with and without caffeine during the previous 12 months. Validity of the FFQ for coffee, estimated by the intraclass correlation coefficient was 0.75. When coffee was treated as categorical variable (consumers vs. nonconsumers) the Kappa coefficient was 0.58. Ten categories were provided, ranging from "never or almost never" to "6 times/day or more". Furthermore, the usual portion size was assessed, with 1 cup defined as 50 mL. For the present analyses, the total coffee consumption was calculated as the sum of caffeinated and decaffeinated coffee. Due to the characteristics of the population being at high risk of CVD and >86% of them reported to have hypertension, the majority of them (n = 363) presented daily coffee consumption of 50 mL per day followed by 241 participants consuming 125 mL per day. Subsequently, coffee consumers were selected with at least 50 mL per day coffee consumption. We also separated caffeinated and decaffeinated coffee consumers and excluded overlapping cases in order to avoid their confounding effect on associations between metabolites and types of coffee consumption. There were 512 caffeinated coffee consumers and 721 decaffeinated coffee consumers. We categorized participants into (1) two groups of total coffee, 285 nonconsumers (0 mL coffee per day) vs. 1379 consumers (50 mL or more coffee per day); (2) two groups of caffeinated coffee, 285 nonconsumers (0 mL coffee per day) vs. 512 consumers (50 mL or more coffee per day); and (3) two groups of decaffeinated coffee, 285 nonconsumers (0 mL coffee per day) vs. 721 consumers (50 mL or more coffee per day). Finally, the consumption of other sources of caffeine, including tea and cocoa was relatively low (n = 284 and n = 115, respectively) among the 1664 study participants.

Metabolomics
Fasting (for ≥8 h) plasma EDTA samples were collected from subjects and stored at −80 • C. Pairs of samples for each participant were randomly ordered and analyzed using two liquid chromatography tandem mass spectrometry (LC-MS) methods to measure polar metabolites and lipids as described previously [23][24][25]. Briefly, amino acids and other polar metabolites were profiled with a Shimadzu Nexera X2 U-HPLC (Shimadzu Corp.) coupled to a Q Exactive mass spectrometer (ThermoFisher Scientific). Metabolites were extracted from plasma (10 µL) using 90 µL of 74.9:24.9:0.2 (v/v/v) of acetonitrile/methanol/formic acid containing stable isotope-labeled internal standards (valine-d8 (Sigma-Aldrich) and phenylalanine-d8 (Cambridge Isotope Laboratories)). The samples were centrifuged (10 min; 9000× g; 4 • C) and the supernatants were injected directly on to a 150 × 2-mm, 3-µm Atlantis HILIC column (Waters). The column was eluted isocratically at a flow rate of 250 µL/min with 5% mobile phase A (10 mmol ammonium formate/L and 0.1% formic acid in water) for 0.5 min followed by a linear gradient to 40% mobile phase B (acetonitrile with 0.1% formic acid) over 10 min. MS analyses were carried out using electrospray ionization in the positive-ion and full-scan spectra were acquired over 70-800 m/z. Lipids were profiled using a Shimadzu Nexera X2 U-HPLC (Shimadzu Corp.; Marlborough, MA) coupled to an Exactive Plus orbitrap mass spectrometer (Thermo Fisher Scientific; Waltham, MA). Lipids were extracted from plasma (10 µL) using 190 µL of isopropanol containing 1,2-didodecanoyl-sn-glycero-3-phosphocholine (Avanti Polar Lipids; Alabaster, AL) as an internal standard. Lipid extracts (2 µL) were injected onto a 100 × 2.1 mm, 1.7 µm ACQUITY BEH C8 column (Waters; Milford, MA). The column was eluted isocratically with 80% mobile phase A (95:5:0.1 v/v/v 10 mM ammonium acetate/methanol/formic acid) for 1 min followed by a linear gradient to 80% mobile-phase B (99.9:0.1 v/v methanol/formic acid) over 2 min, a linear gradient to 100% mobile phase B over 7 min, then 3 min at 100% mobile-phase B. MS analyses were carried out Nutrients 2019, 11, 1032 5 of 13 using electrospray ionization in the positive ion mode using full scan analysis over 200 to 1100 m/z. Raw data were processed using Trace Finder version 3.1 and 3.3 (Thermo Fisher Scientific) and Progenesis QI (Nonlinear Dynamics; Newcastle upon Tyne, UK). Polar metabolite identities were confirmed using authentic reference standards and lipids were identified by head group and total acyl carbon number and total acyl double bond content. To enable assessment of data quality and to facilitate data standardization across the analytical queue and sample batches, pairs of pooled plasma reference samples were analyzed at intervals of 20 study samples. One sample from each pair of pooled references served as a passive QC sample to evaluate the analytical reproducibility for measurement of each metabolite while the other pooled sample was used to standardize at using a "nearest neighbor" approach. Standardized values were calculated using the ratio of the value in each sample over the nearest pooled plasma reference multiplied by the median value measured across the pooled references. Plasma levels of 398 metabolites were measured.

Statistical Analysis
Baseline characteristics of study participants were described according to nonconsumers and frequent total coffee consumers as means (SD) for quantitative traits and percentages for categorical variables. From the 398 metabolites measured in the present study, 11 metabolites were removed due to high number of missing values (i.e., ≥20%), leaving 387 metabolites for further analysis. Missing values of individual metabolites were imputed (in those metabolites with less than 20% of missing values) using the random forest imputation approach ("missForest" R package). Missing values are those determinations that were below the limit of detection. The levels of metabolites were normalized and scaled to multiples of 1 SD with the rank-based inverse normal transformation. Due to the high dimensionality and collinear nature of the data, logistic regression with elastic net penalty was implemented in the "glmnet" R package (alpha = 0.5) to build a discrimination model for frequent coffee consumption. We performed 10-fold cross-validation to find the optimal value of the tuning parameter that result in a mean squared error within 1-SD of the minimum [26]. The discrimination accuracy was examined based on parameters of lamda.min. The discrimination model scores were computed as the weighted sum of all metabolites with weights equal to the regression coefficients from the discrimination models. To estimate the discrimination accuracy we split the data into 90% set and 10% set. Within the 90% set, we used the same elastic net procedure we used to build the model. Another 10-fold cross-validation was used to tune the model parameters. Then, we used the outer 10% set to evaluate the model built at the previous step. This procedure guarantees that the outer 10% set is completely separated from the model building procedure, thus the discriminative accuracy estimated in this step is unbiased. We then repeated all these steps for 10 times and averaged their discriminative accuracy in the 10% set. Since each of them is unbiased estimate of discriminative accuracy, the average is also unbiased. Logistic regression analysis was also performed and the derived coefficients were used to build models consisting of either only caffeine or 5-acetylamino-6-amino-3-methyluracil as concerns total and caffeinated coffee. The area under curve (AUC) was used to assess the discriminating power of the discrimination models for total and subtypes of coffee consumption. We compared the AUCs of the models including metabolites selected from elastic net regression analysis with a model including only caffeine or 5-acetylamino-6-amino-3-methyluracil using a nonparametric method. All analyses were performed using R statistical package 3.4.3 (www.r-project.org) (R Development Core Team, 2012).

Participants' Characteristics
Participants' characteristics are summarized in Table 1. The mean age of participants at baseline was 67.1 years and the mean BMI was 29.9 kg/m 2 . As compared with nonconsumers, those participants consuming coffee were more likely to be men, current smokers, and to have a higher prevalence of hypercholesterolemia in addition to higher BMI (Table 1). Data shows mean ± SD or number (%).* median (min, max). a p-value < 0.05 (Student's t-test between coffee categories). b p-value < 0.05 (X 2 between coffee categories).

Associations between Plasma Metabolites and Total Coffee Consumption
Metabolites (n = 10) ranked from the highest to the lowest elastic net positive and negative regression coefficients for caffeinated coffee consumption are displayed in Tables 2 and 3, respectively. Positive regression coefficients were found for four metabolites, caffeine, AAMU, sphingomyelin 24:0 and cotinine; while negative for sucrose, proline betaine, acetaminophen, one lyso-phosphatidylethanolamine (LPE) (16:0), piperine, and hypoxanthine.

Discussion
In the present study, specific metabolites associated with total, caffeinated and decaffeinated coffee consumption were identified among participants of the PREDIMED study. It was also demonstrated that caffeine alone had better discriminative ability for total and caffeinated coffee consumption as compared to a set of metabolites which also included caffeine, and therefore may be used as biomarker of consumption complementing traditional dietary assessment tools. Similarly, a previous cross-sectional study found that urine caffeine is useful (AUC: 0.85) to distinguish between those subjects drinking less than one caffeinated coffee per week versus at least one caffeinated coffee [14]. However, in the aforementioned study risk of overfitting is high due to the lack of internal or external validation [15].On the other hand, a previous metabolomics study showed that caffeine was much less performant to predict coffee consumption in a cohort study than other coffee metabolites alone or in combination [27]. We found a low discrimination of decaffeinated coffee categories from

Discussion
In the present study, specific metabolites associated with total, caffeinated and decaffeinated coffee consumption were identified among participants of the PREDIMED study. It was also demonstrated that caffeine alone had better discriminative ability for total and caffeinated coffee consumption as compared to a set of metabolites which also included caffeine, and therefore may be used as biomarker of consumption complementing traditional dietary assessment tools. Similarly, a previous Nutrients 2019, 11, 1032 9 of 13 cross-sectional study found that urine caffeine is useful (AUC: 0.85) to distinguish between those subjects drinking less than one caffeinated coffee per week versus at least one caffeinated coffee [14]. However, in the aforementioned study risk of overfitting is high due to the lack of internal or external validation [15].On the other hand, a previous metabolomics study showed that caffeine was much less performant to predict coffee consumption in a cohort study than other coffee metabolites alone or in combination [27]. We found a low discrimination of decaffeinated coffee categories from the multimetabolite model. Recently, a cross-sectional study investigating how accurately could plasma trigonelline discriminate consumers from nonconsumers of coffee suggested that this plant alkaloid may serve as a useful marker of coffee consumption (AUC:0.92) [13]. However, the lack of internal or external validation limits the generalizability of this marker.
A consistent set of three metabolites was positively associated with total and caffeinated coffee consumption. Plasma caffeine and one of its major metabolites, AAMU, reflect caffeine exposure [28]. Cotinine, the main metabolite of nicotine [29] could confirm the higher prevalence of tobacco exposure in frequent coffee consumers in our study. Higher cigarette consumption has been suggested to causally increase coffee consumption [30].
One sphingomyelin (SM 24:0) was also positively associated with total, caffeinated and decaffeinated coffee consumption. A previous study among German elders found two classes of SMs, one containing a hydroxygroup and the other having an additional carboxygroup to be directly associated with coffee consumption [6]. Furthermore, we found one phosphatidylcholine (C40:6) positively associated with decaffeinated coffee consumption. A cross-sectional analysis in participants in the EPIC-Potsdam study suggests associations of coffee consumption with certain acyl-alkyl-phosphatidylcholines, including C40:6 [7].
The positive association found in the present study between alpha-glycerophosphate (alpha-GP) and decaffeinated coffee is interesting since this metabolite is formed from glycolysis and previous experiments using perfused rat adipocytes propose a model, where alpha-GP combines with long-chain coenzyme A (LC-CoA) to form triacylglycerol (TAG), thus decreasing LC-CoA and its inhibition of TAG lipases leading to release of free fatty acids and glycerol [31]. Finally, hippurate, a gut microbial metabolite of polyphenol metabolism, has been previously associated with the consumption of polyphenol-rich foods including coffee [32].
Concerning metabolites negatively associated with total coffee consumption, proline betaine had the highest coefficient in our study. This metabolite is a biomarker for citrus fruit consumption [33]. Caffeine has been considered as a diuretic, mainly because of its caffeine content [34], and this action could increase the renal excretion of proline betaine as well as ofpiperine, the major bioactive component of pepper [35], acetaminophen also known as paracetamol and 7-methylguanine. The latter metabolite is considered to be a useful marker of DNA damage caused by nitrosamines in tobacco smoke and has been found significantly higher in the urine of smokers than in non-smokers [36]. Sucrose was also recorded to be negatively associated with total and caffeinated coffee consumption and its levels in blood may reflect gastric permeability to sucrose [37].
Inverse associations between coffee consumption and systemic levels of a number of different inflammatory markers including interferon-γ (IFN-γ) have been reported [38] and IFN-γ-mediated breakdown of tryptophan to kynurerine and downstream kynurenine metabolites like kynurenic acid could be counteracted by the consumption of coffee [39]. This could partially explain the inverse association between kynurenic acid and coffee consumption in our study.
Plasma glycocholate levels were also recorded to be negatively associated with total coffee drinking in our study. Glycocholic acid is a conjugated bile acid (cholic acid with glycine) synthesized in the liver and is involved in dietary lipids emulsification and cholesterol absorption [40]. Whether coffee consumption has an effect on glycocholic acid synthesis and/or release into circulation is unknown and needs further investigation. A disruption in bile acid synthesis could lead to cholesterol homeostasis dysregulation [41]. Meta-analyses of clinical trials of the effect of coffee consumption on blood lipids suggest increases in cholesterol levels [42,43], which is in line with our findings.
The inverse association between coffee consumption and plasma lactate levels merits discussion. Caffeine is known to have a stimulating effect on the central nervous system. An increase in energy expenditure was observed in a small study of older men after moderate coffee consumption [44]. Furthermore, a study conducted by Engels and Haymes (1992) [45] showed increases in pre-exercise free fatty acids, glycerol, and lactate concentrations in sedentary men after a single dose of caffeine (5 mg·kg −1 ). However, if free fatty acids are the main source of fuel, lactate levels should not increase.
Finally, the physiological significance of the inverse association of LPE 16:0 with caffeinated and decaffeinated coffee consumption requires further work hypotheses and research to be determined.
Allantoin is produced from the nonenzymatic oxidation of uric acid in humans and is considered to be a specific biomarker of oxidative stress [46]. It has been hypothesized that the diuretic action of caffeine might lower blood uric acid concentrations [47] and subsequent allantoin production. However, in our study lower plasma levels of allantoin were associated with decaffeinated coffee drinking.
The results of the present study should be interpreted in the context of its limitations and strengths. First, although the FFQ used to assess coffee consumption was validated, misclassification bias cannot be completely excluded. Secondly, even though we used a comprehensive metabolic analysis to cover a wide range of metabolites, some of those previously associated with coffee consumption were neither selected by the model nor identified in the platforms we used. Thirdly, the cross-sectional design does not allow making any causal inference of the observed associations, and therefore both directions are plausible. Finally, participants were elderly Mediterranean individuals at high cardiovascular risk consuming low amounts of coffee per day and this may limit the generalizability of the findings to other age-groups or populations. Regarding strengths, we have used a multimetabolomics approach in order to analyze a wide range of biochemical compounds in a relatively high sample size and have internally cross-validated our results.
In conclusion, different sets of plasma metabolites were associated with the consumption of 50 mL or more per day of total, caffeinated and decaffeinated coffee in a Mediterranean population at high cardiovascular risk. These sets consisted of caffeine; 5-acetylamino-6-amino-3-methyluracil; cotinine; lipid species, such as sphingomyelin, phosphatidylethanolamine, and phosphatidylcholine; and other metabolites including alpha-glycerophosphate, hippurate, kynurenic acid, glycocholate, lactate, and allantoin. Plasma caffeine appeared to best discriminate total and caffeinated coffee consumers versus nonconsumers. Despite a low discriminative ability of decaffeinated coffee observed from the multimetabolite model this suggests that other potential metabolites could be used as candidate biomarkers of consumption. Therefore, our comprehensive metabolic analysis did not result in new, reliable potential set of metabolites for coffee consumption.At least in our population the selected metabolites did not accurately discriminate coffee consumers. Dietary questionnaires remain practical and one of the most affordable way to gather coffee intake data. Further studies are needed for assessing the involvement of the identified metabolites in health or disease.

Conflicts of Interest:
The authors declare no conflicts of interest.