Identification of Urinary Polyphenol Metabolite Patterns Associated with Polyphenol-Rich Food Intake in Adults from Four European Countries

We identified urinary polyphenol metabolite patterns by a novel algorithm that combines dimension reduction and variable selection methods to explain polyphenol-rich food intake, and compared their respective performance with that of single biomarkers in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. The study included 475 adults from four European countries (Germany, France, Italy, and Greece). Dietary intakes were assessed with 24-h dietary recalls (24-HDR) and dietary questionnaires (DQ). Thirty-four polyphenols were measured by ultra-performance liquid chromatography–electrospray ionization-tandem mass spectrometry (UPLC-ESI-MS-MS) in 24-h urine. Reduced rank regression-based variable importance in projection (RRR-VIP) and least absolute shrinkage and selection operator (LASSO) methods were used to select polyphenol metabolites. Reduced rank regression (RRR) was then used to identify patterns in these metabolites, maximizing the explained variability in intake of pre-selected polyphenol-rich foods. The performance of RRR models was evaluated using internal cross-validation to control for over-optimistic findings from over-fitting. High performance was observed for explaining recent intake (24-HDR) of red wine (r = 0.65; AUC = 89.1%), coffee (r = 0.51; AUC = 89.1%), and olives (r = 0.35; AUC = 82.2%). These metabolite patterns performed better or equally well compared to single polyphenol biomarkers. Neither metabolite patterns nor single biomarkers performed well in explaining habitual intake (as reported in the DQ) of polyphenol-rich foods. This proposed strategy of biomarker pattern identification has the potential of expanding the currently still limited list of available dietary intake biomarkers.


Introduction
In nutritional epidemiology, the accurate and precise estimation of dietary exposures is critical for an unbiased assessment of diet-disease associations. Intakes of foods, nutrients or other bioactive compounds related to health or diseases are often estimated using self-reported dietary assessment methods, such as 24-h dietary recalls (24-HDRs) or dietary questionnaires (DQs). However, the reliability of traditional self-reported instruments has been challenged due to inherent and sizeable measurement errors [1,2]. Dietary measurement errors are a serious obstacle to establishing reliable diet-disease associations [3].
Over the last decades, a limited number of dietary biomarkers have been identified and implemented in nutritional epidemiology [4,5]. They have been useful as reference measurements to validate self-reported dietary assessment tools (e.g., doubly labeled water and urinary nitrogen), as complementary measurements to compare with estimates of dietary intake (e.g., fatty acids and carotenoids in blood), or as substitute measurements for insufficient or unavailable dietary intake data (e.g., selenium and zinc in blood) [4,6]. More recently, with the development of metabolomics, novel dietary biomarkers are being identified that should further improve the accuracy of dietary intake estimation [7,8].
These biomarkers have been mostly used individually for dietary exposure assessment. However, the 'single biomarker' approach has some conceptual and methodological limitations. First, single biomarkers cannot reflect complex matrices of dietary exposures with various food groups, which consist of multiple nutrients and other food components converted to a number of metabolites through various biological pathways, including the gut microbiota. Also, there are high inter-correlations among biomarkers, and some biomarker levels are too low to be detected or to reach a statistically significant performance for use in dietary intake assessment. Therefore, a 'biomarker pattern' approach may provide a more comprehensive and accurate measurement of complex dietary exposures. In this respect, some recent studies have used combinations of dietary biomarkers to improve the accuracy of dietary exposures assessment [9,10].
Polyphenols are non-nutritive plant components widely distributed in a variety of foods including fruits, vegetables, tea, coffee and wine [11]. Research interest in polyphenols has increased due to their potential protective effects against non-communicable diseases, including cardiovascular diseases, diabetes and cancer, and against premature mortality [12][13][14][15]. Overall results may be promising, but they remain inconclusive. A recent meta-analysis summarizing the available evidence on the association of dietary flavonoid and lignan intake with cancer risk in observational studies recommended further prospective studies assessing dietary polyphenol exposure, as well as studies using other methods to evaluate exposure (i.e., markers of consumption, metabolism, excretion) [15]. Polyphenol metabolites measured in biological specimens could complement traditional dietary assessment tools to improve exposure assessment. A recent systematic review of intervention studies confirmed that urinary polyphenol metabolites could serve as dietary biomarkers with high recovery yields and high correlations with intakes of polyphenol-rich food [16]. Some single urinary polyphenol metabolites, such as chlorogenic acid/caffeic acid, gallic acid/resveratrol, caffeic acid/epicatechin, and naringenin/hesperetin, have been identified as potential biomarkers for intakes of coffee, wine, tea, and citrus fruits/juices, respectively [17][18][19][20]. However, we hypothesized that panels or patterns of polyphenol metabolites may better explain intake of polyphenol-containing foods.
Recently, we reported correlations of 34 individual urinary polyphenol metabolites with intake of polyphenol-containing foods in the European Prospective Investigation into Cancer and Nutrition (EPIC) cross-sectional study [20]. Some single polyphenols were found to be significantly correlated to recent intake of these foods and were proposed as potential biomarkers of intake for these foods. In the current study, the same data were used to identify patterns of urinary polyphenol metabolites by applying a new algorithm that combines dimension reduction and variable selection methods to maximize the explained variation in intake of specific polyphenol-rich foods. The ability of these urinary polyphenol patterns to rank individuals according to the intake and to discriminate between consumers and non-consumers was examined and compared with the respective performance of single polyphenols.

Subjects
This study included 475 subjects randomly selected from four European countries (i.e., Germany, France, Italy, and Greece) within the EPIC calibration study, as described in our previous study [20]. In brief, the EPIC study is an ongoing multi-center prospective cohort study with more than half a million subjects, mostly aged 35-70 years, recruited from 23 centers in 10 European countries between 1992 and 2000. The study was designed to investigate relations between diet, lifestyle and environmental factors, and the risk of cancer and other chronic diseases by collecting information on diet and lifestyle characteristics, anthropometric measurements, and medical history [21]. For the EPIC calibration study, a single 24-HDR was collected from a random sub-sample (n = 36,900) of the entire cohort, and a 24-h urine specimen was collected between 1995 and 1999 from a convenience sub-sample (n = 1386) of the calibration study [22,23]. For the current study, all subjects with a 24-h urine specimen and a 24-HDR collected on the same day, and with a country-specific validated DQ collected at varying time intervals (1 day-34 months) relative to the 24-h urine collection across centers, were eligible (n = 475) [24]. These subjects were recruited from the general population residing within defined geographical areas in Germany (Heidelberg and Potsdam), Greece (nationwide) and Italy (Naples, Turin, and Varese). Other subjects came to the study from breast cancer screening in Florence, from a local blood donors association (donors and their partners) in Ragusa, Italy, or from an existing cohort. The latter was the case in France, where the cohort was based on female teachers and school workers (Paris and surrounding areas). All subjects gave their informed consent for inclusion before they participated in the study.
The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the ethical review boards of the International Agency for Research on Cancer (IARC) and of the local participating institutions (Project identification code: doc. SC/24/6, date of approval: September 1987).

Dietary Assessment
Dietary data were collected using a single standardized 24-HDR and a country-specific validated DQ. The 24-HDR was conducted as a face-to-face interview using a standardized dietary assessment methodology with a computerized program (EPIC-Soft) [22,25]. The DQs, comprising 158-266 items, were self-administered or completed in face-to-face interviews to estimate usual intake over the previous 12 months [22].

Urinary Polyphenol Assessment
24-h urines were used for the measurement of urinary polyphenols. For the collection, subjects were provided with two 2-L containers, each with 2 g boric acid as preservative. p-Aminobenzoic acid (PABA) was used as a marker for completeness of 24-h urine collections. After collection, 24-h urine samples were stored at −20 °C at the local center, then shipped within 24 h to the IARC, where they were stored at −20 °C and laboratory analyses were performed after about 15 years of storage [19]. We do not expect major degradation of polyphenol metabolites during storage, and our previous studies using the same urine samples showed expected correlations between the metabolites and food intake [19,20]. As described previously [26], urine samples were first hydrolyzed with a β-glucuronidase/sulfatase enzyme mixture and the resulting polyphenol aglycones were extracted twice with ethyl acetate. Quantitative dansylation of phenolic hydroxyl groups was carried out with either 13C-dansyl chloride (samples) or non-labeled 12C-dansyl chloride (well-characterized reference pooled sample). Each 13C-dansylated sample was mixed with the 12C-dansylated reference sample, and the relative concentrations in samples over the reference were then measured by ultra-performance liquid chromatography-electrospray ionization-tandem mass spectrometry (UPLC-ESI-MS/MS). A total of 37 urinary polyphenols were measured, and their excretions in urine were expressed as µmol/24-h. Urinary polyphenol concentrations below the limit of quantification (LOQ) were replaced with half the LOQ. Three polyphenols (procyanidins B1 and B2, and (+)-gallocatechin) were excluded from the analysis because 98-100% of their values were below the LOQ, leaving 34 polyphenols.
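The handling of values below the LOQ can be illustrated with a minimal sketch. The function names, the metabolite values, and the LOQs below are hypothetical and serve only as illustration; the 0.98 exclusion cut-off mirrors the 98-100% range reported above and is an assumption about how such a rule might be coded, not the authors' implementation.

```python
def fraction_below_loq(values, loq):
    """Share of measured (non-missing) values that fall below the LOQ."""
    measured = [v for v in values if v is not None]
    if not measured:
        return 0.0
    return sum(v < loq for v in measured) / len(measured)


def substitute_below_loq(values, loq):
    """Replace measured values below the LOQ with LOQ / 2; keep missing values as None."""
    return [None if v is None else (loq / 2 if v < loq else v) for v in values]


def prepare_panel(panel, loqs, exclude_at=0.98):
    """Drop metabolites whose below-LOQ share reaches `exclude_at`; substitute in the rest."""
    kept = {}
    for name, values in panel.items():
        if fraction_below_loq(values, loqs[name]) >= exclude_at:
            continue  # almost never quantifiable, so excluded from the analysis
        kept[name] = substitute_below_loq(values, loqs[name])
    return kept


# Hypothetical excretion values (µmol/24-h) and LOQs, for illustration only.
example_panel = {
    "resveratrol": [0.05, 0.5, None, 1.2],
    "procyanidin B1": [0.01, 0.02, 0.03, 0.01],
}
example_loqs = {"resveratrol": 0.2, "procyanidin B1": 0.05}
cleaned = prepare_panel(example_panel, example_loqs)
```

Missing values are deliberately left untouched in this sketch, since they are handled separately by imputation.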

Statistical Analyses
Prior to the main statistical analyses, missing values of polyphenols were imputed by the expectation-maximization (EM) algorithm [27] after log transformation. In our previous study, center and batch were shown to explain a large part of the total variability of urinary polyphenols [20]. Urinary polyphenol measurements were therefore adjusted by taking the residuals from general linear models (GLMs), with center and batch variables as covariates. Intakes of food groups were log transformed and adjusted for energy intake by taking the residuals from GLMs with energy intake as the covariate. Partial Pearson's correlations between 34 individual polyphenols and the intakes of 12 main food groups and their 144 sub-groups (see Tables S1-S5) were computed conditional on sex, body mass index (BMI) and age as covariates.
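The residual adjustment and the partial correlation described above can be sketched as follows. This is a minimal illustration using NumPy least squares, not the SAS or R code used in the study; the helper names (`dummy_code`, `residualize`, `partial_pearson`) are hypothetical, and log transformation and EM imputation are assumed to have been applied beforehand.

```python
import numpy as np


def dummy_code(labels):
    """Indicator columns for a categorical covariate (e.g., center or batch), first level as reference."""
    levels = sorted(set(labels))
    return np.array([[float(label == level) for level in levels[1:]] for label in labels])


def residualize(y, covariates):
    """Residuals of y after least-squares regression on the covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta


def partial_pearson(x, y, covariates):
    """Pearson correlation between x and y after removing the linear effect of the covariates from both."""
    rx = residualize(x, covariates)
    ry = residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])
```

In this sketch, a polyphenol would first be passed through `residualize` with the dummy-coded center and batch columns, a food intake through `residualize` with energy intake, and the two adjusted variables then compared with `partial_pearson` using sex, BMI and age as covariates.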
An algorithm using dimension reduction and variable selection methods was applied to identify patterns of polyphenol metabolites that best explain the intake of polyphenol-rich foods. The procedure of the algorithm was as follows: (1) Selecting optimal subsets of 34 polyphenol metabolites to explain intakes of specific polyphenol-rich food groups using two different variable selection methods: (i) variable importance in projection based on reduced rank regression (called the RRR-VIP method) [28] and (ii) least absolute shrinkage and selection operator (LASSO) regression [29]. (2) Identifying patterns of selected polyphenol metabolites (as predictor variables), and maximizing the explained variability of polyphenol-rich food group intakes (as response variables) through RRR analysis. (3) Evaluating the performance of the RRR models for the polyphenol metabolite patterns to discriminate between consumers and non-consumers through internal two-fold cross-validation analyses. This was achieved through splitting the data into two equal-sized subsets (a training and a test set) and calculating (i) RRR scores in the test set using factor weights derived from RRR analysis of the training set; and (ii) Pearson correlation coefficients of RRR scores with intakes and area under the receiver operating characteristic curves (ROC AUCs) for the RRR scores of the test set.
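Steps (2) and (3) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' SAS/R implementation: RRR weights are obtained from ordinary least squares followed by a singular value decomposition of the fitted responses, a single response (one food group) and one factor are used, predictors are standardized with training-set statistics, and the covariate adjustment of correlations and AUCs is omitted. All function names are hypothetical.

```python
import numpy as np


def standardize(train, test):
    """Scale both sets with the training mean and standard deviation."""
    mean, sd = train.mean(axis=0), train.std(axis=0, ddof=1)
    return (train - mean) / sd, (test - mean) / sd


def rrr_weights(X, Y, rank=1):
    """RRR factor weights: OLS coefficients projected onto the leading
    right singular vectors of the fitted responses."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(X @ B, full_matrices=False)
    return B @ Vt[:rank].T


def auc(scores, is_consumer):
    """ROC AUC as the probability that a consumer outscores a non-consumer (ties count one half)."""
    diff = scores[is_consumer][:, None] - scores[~is_consumer][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def two_fold_evaluate(X, y, is_consumer, seed=0):
    """Derive weights in one random half, then score and evaluate the other half."""
    idx = np.random.default_rng(seed).permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]
    X_train, X_test = standardize(X[train], X[test])
    y_train = y[train] - y[train].mean()
    w = rrr_weights(X_train, y_train[:, None], rank=1)[:, 0]
    # The sign of an SVD factor is arbitrary: orient it so that higher scores mean higher intake.
    if np.corrcoef(X_train @ w, y_train)[0, 1] < 0:
        w = -w
    test_scores = X_test @ w
    r = float(np.corrcoef(test_scores, y[test])[0, 1])
    return r, auc(test_scores, is_consumer[test])
```

Here `X` would hold the selected, adjusted metabolites (one column each), `y` the adjusted intake of one food group, and `is_consumer` a boolean array marking subjects with non-zero reported intake.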
For variable selection using the RRR-VIP method, a VIP score was calculated for each polyphenol metabolite as a weighted sum of squares of the RRR weights, accounting for the explained variance of each RRR model; polyphenol metabolites with a VIP score greater than 0.85 were then selected [28,30]. Alternatively, we applied LASSO regression with five-fold cross-validation to select subsets of polyphenol metabolites by shrinking some predictor coefficients to exactly 0 [29]. Partial Pearson correlation coefficients of RRR scores with intakes of polyphenol-rich foods from 24-HDR or DQ were calculated conditional on covariates (sex, BMI and age), and ROC AUCs were adjusted for these same covariates. All analyses were conducted using the Statistical Analysis Software, release 9.4 (SAS Institute Inc., Cary, NC, USA) and R software, version 3.1.2 (R Foundation for Statistical Computing, Vienna, Austria).
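As a rough illustration of the VIP step, the sketch below applies the conventional VIP formula known from partial least squares to a matrix of factor weights. This is an assumption: the exact RRR-VIP definition is given in [28] and may differ in detail. The function name, the weight matrix `W` (metabolites in rows, factors in columns) and the vector `explained` (response variation explained by each factor) are hypothetical inputs.

```python
import numpy as np


def vip_scores(W, explained):
    """VIP_j = sqrt(p * sum_k explained_k * (w_jk / ||w_k||)^2 / sum_k explained_k)."""
    W = np.asarray(W, dtype=float)
    explained = np.asarray(explained, dtype=float)
    p = W.shape[0]
    normalized_sq = (W / np.linalg.norm(W, axis=0)) ** 2  # each column sums to 1
    return np.sqrt(p * (normalized_sq @ explained) / explained.sum())


# Hypothetical weights for three metabolites on a single factor.
vip = vip_scores([[3.0], [4.0], [0.0]], [1.0])
selected = np.flatnonzero(vip > 0.85)  # indices of metabolites passing the 0.85 cut-off
```

With this definition the squared VIP scores average to 1 across metabolites, which is why cut-offs slightly below 1, such as 0.85, are commonly used to retain the more influential predictors.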

General Characteristics of the Study Population
The average age and BMI of the participants were 54 ± 8.5 years and 26 ± 4.3 kg/m², respectively. The percentage of smokers (former/current) and never smokers was 62% and 36% in men, and 36% and 61% in women, respectively. The proportion of subjects with prevalent diabetes, hyperlipidemia or hypertension was 2.5%, 27.2% and 23.6%, respectively (Table 1).

Correlations between Individual Polyphenol Metabolites and Polyphenol-Rich Food Groups
In a first step, correlations between 34 individual polyphenol metabolites and the intakes of 12 main food groups and their 144 sub-groups in the EPIC study were explored to pre-select specific food groups that had sufficiently high correlations with a minimum set of polyphenol metabolites. Among the main food groups investigated, only four ('vegetables', 'fruit, nuts & seeds', 'non-alcoholic beverages', and 'alcoholic beverages') were significantly correlated with more than five individual polyphenol metabolites, while other main food groups were significantly correlated with fewer than three individual metabolites, and all coefficients were below 0.2 (Table S1). In a subsequent step, we examined correlations between polyphenol metabolites and food sub-groups (Tables S2-S5). Among these sub-groups, citrus fruits, apples and pears, olives, coffee, tea, all wine, and red wine were highly correlated with individual polyphenols (Table 2). For example, highly-correlated polyphenols were hesperetin (r = 0.54) and naringenin (r = 0.50) for citrus fruits, caffeic acid (r = 0.49) and ferulic acid (r = 0.42) for coffee, and gallic acid ethyl ester (r = 0.65) and resveratrol (r = 0.46) for red wine. All these food groups were selected for our multivariate analyses as polyphenol-rich food groups. The a priori arbitrarily defined criteria were that a given food group (or sub-group) showed a significant correlation with at least five polyphenols, and that at least one of these correlations was r ≥ 0.3. The criteria were chosen as a trade-off between having sufficiently informative predictor variables (i.e., polyphenols) and a wider range of potential food groups (i.e., response variables).
Table 2 notes: Partial Pearson correlations were computed with sex, BMI and age as covariates. Urinary polyphenols were adjusted for center and batch, and intakes of food groups were adjusted for energy intake, using residuals from general linear models (GLMs). Positive coefficients in blue cells were significant (p < 0.05), and higher coefficients had a darker color.
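The pre-selection rule for polyphenol-rich food groups (significant correlation with at least five polyphenols, at least one of them with r ≥ 0.3) can be written as a small check. The function name and the example coefficients and p-values are hypothetical and only illustrate the rule; they are not values from the study tables.

```python
def is_polyphenol_rich(correlations, p_values, alpha=0.05, min_significant=5, min_r=0.3):
    """True if the food group meets both pre-selection criteria."""
    significant = [r for r, p in zip(correlations, p_values) if p < alpha]
    return len(significant) >= min_significant and any(r >= min_r for r in significant)


# Hypothetical food group: five significant correlations, the strongest at 0.54.
example_r = [0.54, 0.50, 0.21, 0.18, 0.15, 0.02]
example_p = [0.001, 0.001, 0.01, 0.02, 0.04, 0.70]
qualifies = is_polyphenol_rich(example_r, example_p)
```

A food group with many significant but weak correlations (all below 0.3), or with a single strong correlation and few other significant ones, would not qualify under this rule.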

Selection of Polyphenol Metabolites Using Variable Selection Methods
Out of 34 urinary polyphenol metabolites, sub-sets were selected for identifying patterns associated with intakes of polyphenol-rich food groups using the RRR-VIP method and LASSO regression. Selected polyphenols differed by method, but at least the first one or two polyphenols were common in both methods (Table 3).
Table 3. Selected polyphenol metabolites by reduced rank regression-based variable importance in projection (RRR-VIP) or least absolute shrinkage and selection operator (LASSO) methods (n = 475).

Discussion
We developed a novel statistical algorithm using a combination of dimension reduction and variable selection methods to integrate high-dimensional biomarker data, with the goal to complement self-reported dietary assessment methods and to improve dietary intake estimation in nutritional epidemiological studies. Here, we applied this approach to a panel of polyphenol metabolites measured in human urine and related dietary intake data. Among 34 targeted urinary polyphenol metabolites, optimal sub-sets were selected by RRR-VIP and LASSO methods, and these patterns of polyphenol metabolites derived by RRR models outperformed any single best polyphenol metabolite associated with the intake of polyphenol-rich foods, especially for coffee and olives.
Polyphenols are widely distributed in plant-based foods such as fruits, vegetables, tea, coffee and wine [11]. A previous study on dietary polyphenol intake in European countries [31] reported an average intake range of total polyphenols of 744-1786 mg/day and 584-1626 mg/day in men and women, respectively, and the main food sources of polyphenols were coffee (21-36%), tea (17-41%), fruits (9-25%), and wine (10%) in Mediterranean (MED) countries, non-MED countries and the UK. In this study, polyphenol-rich food groups were pre-selected based on the correlation between food groups and individual polyphenol metabolites prior to the main analyses. Similar to the previous study, fruits (citrus fruits, apples and pears, and olives), coffee, tea, and wine food groups were also selected as polyphenol-rich food groups. Although vegetables are generally regarded as a food group rich in polyphenols, none of the vegetable sub-groups met our criteria for being selected as a polyphenol-rich food group. This might be explained by the observation that vegetables overall contributed less than 5% to polyphenol intake in the EPIC study [31], and different vegetable sub-groups may thus contribute only marginally to polyphenol intake, at least in the EPIC populations.
Recently, individual polyphenol metabolites have been identified as potential biomarkers of dietary polyphenol intake [32,33]. A number of studies examined the potential role of polyphenols as dietary biomarkers in clinical trials or observational studies [17,18][34][35][36][37][38][39][40]. Previous dietary intervention studies [34][35][36][37] have identified that flavonoids such as hesperetin, naringenin, kaempferol, phloretin, and quercetin in 24-h urine could be specific biomarkers for intakes of fruits and vegetables. Other clinical and observational studies [17][38][39][40] found that some 24-h urinary polyphenols were good indicators of polyphenol-rich beverage consumption, such as gallic acid and resveratrol for wine, chlorogenic acid for coffee, and epicatechin for tea. However, all these previous studies examined individual polyphenol metabolites, and to the best of our knowledge, this is the first study using polyphenol metabolite patterns to investigate associations with food intake.
Conceptually similar to dietary pattern analyses [41], free-living people do not consume single polyphenols, but a combination of polyphenols coming from different food sources. Therefore, it is meaningful from a biological point of view to examine combinations of polyphenols, which is also a more comprehensive and efficient approach from a statistical point of view. In this study, we applied dimension reduction and variable selection methods to identify specific urinary polyphenol metabolite patterns associated with the intake of polyphenol-rich foods. RRR analysis is a multivariate dimension reduction technique to determine linear combinations of a set of predictors maximizing the explained variability in responses. RRR has been previously used along with principal component analysis, factor analyses, or cluster analyses for dietary pattern discovery in nutritional epidemiology [42,43]. RRR analysis is similar to partial least squares (PLS) analysis; they are both widely used in analyses of metabolomics data, and both are supervised approaches with regression-based models to reduce dimensions by extracting linear combinations of X-variables that explain variability in Y-variables [44,45]. The difference between RRR and PLS is that RRR focuses on explaining variation in Y-variables, whereas PLS seeks factors whereby the covariance between the X-and the Y-components is maximized. Therefore, in this study, applying the RRR method enabled the identification of patterns of polyphenol metabolites that maximized the explained variability of intakes of specific polyphenol-rich food groups.
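The difference between the two methods can be stated compactly for the first extracted factor. The formulation below is a standard textbook one and is not taken from the article; it assumes column-centered data matrices X (metabolites) and Y (food intakes), an unweighted RRR, and a weight vector w defining the score t = Xw.

```latex
\begin{aligned}
\text{RRR:}\quad & w_{\mathrm{RRR}} = \arg\max_{w}\ \frac{\lVert Y^{\top} X w \rVert^{2}}{w^{\top} X^{\top} X\, w}
  && \text{(variation in } Y \text{ explained by the score } t = Xw\text{)} \\
\text{PLS:}\quad & w_{\mathrm{PLS}} = \arg\max_{\lVert w \rVert = 1}\ \lVert Y^{\top} X w \rVert^{2}
  && \text{(squared covariance between } Xw \text{ and } Y\text{)}
\end{aligned}
```

The RRR criterion divides by the variance of the score, so it rewards only the response variation explained, whereas the PLS criterion also favors directions along which the predictors themselves vary strongly.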
For RRR analyses, sub-sets of polyphenol metabolites were pre-selected using variable selection methods: RRR-VIP and LASSO. Previous studies have already observed that most of the polyphenols selected here by the two methods are associated with polyphenol-rich foods or food groups. Hesperetin and naringenin are known abundant polyphenols in citrus fruits [46], and 3,4-dihydroxyphenylacetic acid (3,4-DHPAA) and 3-hydroxyphenylacetic acid (3-HPAA) are two metabolites formed from hesperetin and naringenin by the colonic microbiota [47]. Hydroxytyrosol, tyrosol and 3,4-DHPAA are predominant polyphenols in olives [48]. Caffeic acid and chlorogenic acid are two major polyphenols from coffee [49], which are metabolized into 3,4-DHPAA, protocatechuic acid, m-coumaric acid, and vanillic acid by the gut microbiota [50]. Gallic acid ethyl ester and resveratrol are the main polyphenols in red wine [48]. These selected polyphenols included all polyphenols with high coefficients in univariate comparisons. However, selected polyphenol metabolites differed by method, even though the first one or two metabolites were common in both methods. The LASSO method selected more polyphenol metabolites that were negatively associated with polyphenol-rich food intake. It seems that the variable selection using LASSO may be more affected by other factors, such as dietary patterns, while the RRR-VIP method may focus on explaining polyphenol-rich food intake itself. Despite this difference in selected polyphenol metabolites, both methods performed equally well in explaining polyphenol-rich food intake and were overall equally efficient. Future studies may compare the performance of both methods again in different study settings.
The patterns of urinary polyphenol metabolites in this study explained acute intake assessed by 24-HDR better than habitual intake assessed by DQ; for habitual intake, they performed as poorly as any single polyphenol metabolite (Table 4). A review paper [32] suggested that urinary polyphenol metabolites were useful as biomarkers for recent intake (12-72 h) based on results of some clinical trials and on knowledge of their pharmacokinetic properties (median half-life of 2.8 h) [51]. However, relatively high stability over time has been observed for a number of polyphenol metabolites, and this is most likely explained by the frequent consumption of their main food sources [33]. This is what is observed here for coffee and wine, and this also explains the correlations observed with DQ data. For polyphenol metabolites derived from less frequently consumed foods, reproducibility in urine may be lower, and this should largely explain the lower correlation of metabolite patterns with DQ data when compared to 24-HDR data. Therefore, polyphenol metabolite patterns could be used as biomarkers for acute dietary intake of polyphenol-containing foods, or for the regular intake of more frequently consumed foods, such as coffee or wine.
The strengths of this study include, first, its statistical design/strategy, which can be easily applied to other "-omics" datasets to identify potential biomarker patterns that can serve as dietary exposure markers. Second, the availability of 24-h urine samples offered additional advantages for the accurate assessment of polyphenol metabolites over spot urine or plasma samples, which are the specimens mainly available in most other cohort studies. However, our study also had some limitations. The study design and the urinary polyphenol metabolites used in this study did not allow the identification of biomarkers for habitual dietary intake, except for frequently consumed foods (coffee or wine); hence, further research is needed to identify longer-term biomarkers. In addition, this study was carried out in European populations, so the statistical algorithm for identifying polyphenol metabolite patterns should be adapted to other populations as well.

Conclusions
Urinary polyphenol metabolite patterns performed better than or as well as any single best polyphenol metabolite biomarker for intakes of specific polyphenol-rich foods, especially for acute dietary intake or regular intake of frequently consumed foods. The algorithm developed using dimension reduction and variable selection could be easily extended to other metabolites, foods, and food constituents.
Supplementary Materials: The following are available online at www.mdpi.com/2072-6643/9/8/796/s1. Table S1: Correlation coefficients between urinary polyphenols and intakes of whole main food groups from 24-HDR among total subjects (n = 475). Table S2: Correlation coefficients between urinary polyphenols and intakes of vegetable groups from 24-HDR among total subjects (n = 475). Table S3: Correlation coefficients between urinary polyphenols and intakes of fruit groups from 24-HDR among total subjects (n = 475). Table S4: Correlation coefficients between urinary polyphenols and intakes of non-alcoholic beverage groups from 24-HDR among total subjects (n = 475). Table S5: Correlation coefficients between urinary polyphenols and intakes of alcoholic beverage groups from 24-HDR among total subjects (n = 475). Table S6: Correlation coefficients and ROC AUCs of RRR scores of selected polyphenol (PP) metabolites with polyphenol-rich foods from 24-HDR and DQ in total subjects (n = 475).

Conflicts of Interest:
The authors declare no conflict of interest.