Coffee Consumption and Whole-Blood Gene Expression in the Norwegian Women and Cancer Post-Genome Cohort

Norwegians are the second highest consumers of coffee in the world. Several recent studies have suggested that coffee consumption is associated with beneficial health effects. By analyzing whole-blood derived, microarray-based mRNA gene expression data from 958 cancer-free women from the Norwegian Women and Cancer Post-Genome Cohort, we assessed potential associations between coffee consumption and gene expression profiles and explored their functional interpretation. Of the 958 women included, 132 were considered low coffee consumers (<1 cup of coffee/day), 422 moderate coffee consumers (1–3 cups of coffee/day), and 404 high coffee consumers (>3 cups of coffee/day). At a false discovery rate <0.05, 139 genes were differentially expressed between high and low consumers of coffee. A subgroup of 298 nonsmoking, low tea consumers was established to isolate the effects of coffee from those of smoking and of potentially caffeine-containing tea. In this subgroup, 297 genes were found to be differentially expressed between high and low coffee consumers. The results indicate that genes are differentially expressed between high and low consumers of coffee, with functional interpretations pointing towards a possible influence on metabolic pathways and inflammation.


Introduction
Coffee is consumed worldwide, and consumption rates in Norway (9.7 kg per capita) are surpassed only by Finland (12.3 kg per capita) [1]. On average, Norwegian women consume 454 grams of brewed coffee per day [2].
There has been a growing interest in studying the associations between coffee consumption and health in recent decades. Some studies have indicated that coffee is beneficial to health, and it has been linked with a decreased risk of Alzheimer's disease, Parkinson's disease, and type 2 diabetes [3][4][5][6][7]. Studies have also indicated that coffee has either a neutral or a beneficial effect on the risk of cancer, specifically a probable decreased risk of liver and endometrial cancer [8].
Other studies have revealed detrimental health effects such as increased total cholesterol and triglycerides in blood, as well as certain negative pregnancy outcomes [9][10][11][12][13]. These diverse health effects may be attributed to different constituents of coffee, some of the most bioactive being caffeine, cafestol, kahweol, polyphenols, trigonelline, and polycyclic aromatic hydrocarbons [14,15].
Linking the different coffee constituents to health outcomes is challenging because of the individual variation in metabolism and physiological response to coffee. As an example, the metabolism of caffeine can vary up to 12-fold between individuals, mostly due to the variability of hepatic cytochrome P450 (CYP)1A2 activity, which metabolizes over 95% of caffeine [16].
Genes associated with either coffee or caffeine intake have been identified in genome-wide association studies of single nucleotide polymorphisms (SNPs). Some of the most well-established SNPs are located in CYP1A1 and CYP1A2 (caffeine metabolism), and AHR (regulation of CYP1A2) [17,18]. SNPs in these genes were also confirmed as being associated with coffee consumption in a large meta-analysis of over 120,000 individuals, together with SNPs in six other genes (GCKR, ABCG2, MLXIPL, POR, BDNF, and EFCAB5) [19]. Still, knowledge from functional genomics studies using mRNA is limited, and gene expression studies in peripheral blood are especially scarce.
The health effects of coffee consumption can also be difficult to disentangle from other diet and lifestyle factors, as many of the constituents of coffee are also present in other dietary sources. For example, tea and certain soft drinks contain caffeine, while smoking can influence the same metabolic pathways as coffee.
The Norwegian Women and Cancer Cohort (NOWAC) started its questionnaire data collection in 1991, with the aim of being a nationally representative, population-based cohort study [20]. Collection of whole-blood samples viable for microarray gene expression analysis started in 2003 [21].
By using dietary data and whole-blood derived, microarray based mRNA gene expression data from NOWAC, we assessed whether high versus low consumers of coffee had differentially expressed genes that could elucidate the possible relevant biological processes associated with coffee consumption.

Study Population
The NOWAC study consists of more than 170,000 women aged 30-70 years at recruitment. These women were randomly chosen from the Norwegian central person registry, and received an invitational letter and an eight-page lifestyle and food frequency questionnaire (FFQ). Approximately 50,000 of these women later also gave blood samples eligible for gene expression analysis (the Norwegian Women and Cancer Post-Genome Cohort), and answered a two-page questionnaire about current lifestyle at the time of blood sampling. Detailed information on NOWAC is available from Lund et al. [20], and on the NOWAC Post-Genome Cohort from Dumeaux et al. [21]. The present paper describes results from a subset of the NOWAC Post-Genome Cohort, comprising cancer-free women (n = 977) originally enrolled as controls in one prediagnostic and one postdiagnostic breast cancer case-control study. These controls were randomly drawn, but matched by age and time of inclusion in the NOWAC cohort. Women who (1) did not answer the food frequency part of the questionnaire, (2) did not answer the questions regarding tea and coffee consumption, or (3) consumed less than 2500 kJ were excluded. Further details about dietary assessment are given below. Of the 977 women in total, 958 were left in the group "all women" after the exclusion criteria were applied (Figure 1). As smoking and tea consumption are strong confounders of coffee consumption, we performed a subgroup analysis of 298 nonsmokers who drank less than an average of half a cup of tea per day to isolate the effects of coffee from those of smoking and tea consumption.
The women gave written informed consent to donate blood samples for gene expression analysis. The NOWAC study was conducted in accordance with the Declaration of Helsinki, and approved by the Norwegian Data Inspectorate and the Regional Ethical Committee of North Norway (reference: REK NORD 2010/2075). Collection and storage of human biological material was approved by the REK in accordance with the Norwegian Biobank Act.

Determination of Gene Expression Levels
Non-fasting blood samples were collected using the PAXgene™ Blood RNA System (PreAnalytiX GmbH, CH-8634 Hombrechtikon, Switzerland), with buffers specially designed for the conservation of mRNA. The samples were mailed overnight to the Department of Community Medicine at the University of Tromsø-The Arctic University of Norway, and immediately frozen at −80 °C. The samples were sent to the Genomics Core Facility at the Norwegian University of Science and Technology, and processed according to the PAXgene Blood RNA Kit protocol. Total RNA was extracted and purified using the PAXgene Blood miRNA isolation Kit. RNA purity was assessed by NanoDrop ND 8000 spectrophotometer (ThermoFisher Scientific, Wilmington, DE, USA), and RNA integrity by Bioanalyzer capillary electrophoresis (Agilent Technologies, Palo Alto, CA, USA). Complementary RNA (cRNA) was prepared using the Illumina TotalPrep-96 RNA Amplification Kit (Ambion Inc., Austin, TX, USA), and hybridized to Illumina HumanHT-12 Expression BeadChip microarrays (Illumina, Inc., San Diego, CA, USA). The raw microarray images were processed in Illumina GenomeStudio.
The preprocessing of the dataset was performed by the Norwegian Computing Center, and the methods are further described in Günter et al. [22]. In short, the preprocessing involved (1) removal of case-control pairs where either case or control was an outlier (determined by density plot, principal component analysis, or inspection of laboratory quality measures), (2) background correction using negative control probes (R package limma, function nec), and (3) filtering out probes that were reported by Illumina to have poor quality, were detectable in <1% of samples, or were not annotated, before mapping probes to genes. The dataset was then normalized on the original scale by quantile normalization (R lumi: lumiN) and log2 transformed (R lumi: lumiT). The packages R lumi: nuID2RefSeqID and R illuminaHumanv4.db were used to annotate the preprocessed dataset. The final dataset included 7741 probes and 977 individuals.
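The normalization and transformation steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the lumi implementation used for the study: the function names, the toy matrix, and the tie handling (ties broken by row order) are assumptions made for illustration only.

```python
import math


def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto one shared reference distribution:
    the mean, rank by rank, of the sorted columns. Ties are broken by row
    order, which is a simplification.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: average value across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[r] for col in sorted_cols) / n_cols for r in range(n_rows)
    ]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices of this sample ordered from lowest to highest intensity.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


def log2_transform(matrix):
    """Apply a log2 transform to every (strictly positive) intensity."""
    return [[math.log2(value) for value in row] for row in matrix]


# Toy intensities (hypothetical): 4 probes x 3 samples.
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
logged = log2_transform(normalized)
```

After normalization every sample contains the same set of values, only assigned to different probes, which is what makes intensities comparable across arrays before the log2 transform.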

Dietary Assessment and Descriptive Variables
The FFQ contains questions on quantity and frequency of the most commonly consumed food items. From these, grams per day (g/d) of the food items and total energy intake (kJ/d) were estimated. Standard portion sizes and weights were taken from the official Norwegian Weight and Measures for Foods [23], and intake of energy, alcohol and nutrients from the Norwegian Food Composition table [24]. The FFQ has been validated by test-retest reproducibility and by comparison with repeated 24-h dietary recalls [25,26]. The test-retest study concluded that the FFQ performed within the reported range for similar instruments, and the comparison with 24-h dietary recalls found that the FFQ gave a good ranking especially for foods consumed frequently. Coffee was found to have the best Spearman's rank correlation coefficient (0.82) when the FFQ was compared to the 24-h dietary recalls [26].
Coffee consumption was self-reported based on the question: "How many cups of coffee do you normally drink of each brewing method?" with the different brewing methods being boiled, filtered, and instant. The frequency of consumption was divided into seven categories: Never/seldom, 1-6 cups per week, 1 cup per day, 2-3 cups per day, 4-5 cups per day, 6-7 cups per day, and 8+ cups per day. Interval midpoints of the frequencies were used to add the different brewing methods together. Average total coffee consumption was divided into the categories: Low (<1 cup/day), moderate (≥1-≤3 cups/day), and high (>3 cups/day). This categorization is similar to that used in previous studies on coffee consumption in the NOWAC cohort, but due to the smaller sample size in the current study only one high consumption category was used [27,28].
A second version of the FFQ, received by 205 of the 977 women, also included espresso. Only 9 of the 78 women who answered the question on espresso consumption replied something other than never/seldom. One cup of espresso was considered equal to one cup of coffee in the analyses.
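The midpoint-based categorization described above can be sketched as follows. The cups-per-day midpoints are assumptions: the text gives half a cup per day for "1-6 cups per week" only in the context of tea, and the value used here for the open-ended "8+" category is a guess. The dictionary keys and function names are hypothetical.

```python
# Assumed cups/day midpoints for the seven FFQ frequency categories.
# The value for the open-ended "8+" category is a guess, not from the paper.
MIDPOINTS = {
    "never/seldom": 0.0,
    "1-6 per week": 0.5,
    "1 per day": 1.0,
    "2-3 per day": 2.5,
    "4-5 per day": 4.5,
    "6-7 per day": 6.5,
    "8+ per day": 8.0,
}


def total_cups_per_day(answers):
    """Sum the midpoints over brewing methods (espresso counts as coffee)."""
    return sum(MIDPOINTS[frequency] for frequency in answers.values())


def coffee_category(cups_per_day):
    """Map total cups/day to the three categories used in the paper."""
    if cups_per_day < 1:
        return "low"
    if cups_per_day <= 3:
        return "moderate"
    return "high"


# Hypothetical respondent: filtered coffee daily, instant coffee occasionally.
answers = {
    "boiled": "never/seldom",
    "filtered": "2-3 per day",
    "instant": "1-6 per week",
}
cups = total_cups_per_day(answers)   # 0.0 + 2.5 + 0.5 = 3.0 cups/day
category = coffee_category(cups)     # 3.0 falls in the "moderate" category
```

Because the brewing methods are summed before categorizing, a woman who reports moderate amounts of two methods can end up in the high category even though neither answer alone would place her there.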
One question on green tea and one question on black tea were combined for total tea consumption. For group characteristics, the variable g/d was used. However, the sum of the midpoints of the tea consumption frequency intervals was used for further establishing a subgroup "low tea, nonsmokers" that consisted of nonsmoking women who on average consumed less than half a cup of tea per day. This was done to isolate the effects of coffee from smoking and potential caffeine-containing tea consumption.
The women reported their physical activity level (both activity at home and at work) in the FFQ on a scale from 1 to 10, with 1 being very low and 10 being very high. Education was reported as years in school, including lower education. Information on both smoking status and BMI (from self-reported height and weight) was taken from the two-page questionnaire filled in at the time of blood sampling. The smoking question asked if the women had smoked in the week prior to the blood sampling (yes/no).

Statistical Analysis
Potential confounders were investigated by comparing the categories of coffee consumption described above using a Kruskal-Wallis test, robust ANOVAs, and a Chi-square test, with p < 0.05 as the significance threshold; post hoc methods were then used to establish which coffee consumption categories differed significantly. The Kruskal-Wallis test and robust ANOVA showed similar results, but since no variables except "red and processed meat" were normally distributed, Kruskal-Wallis with Dunn's post hoc rank sum test is presented in the tables. Based on these initial analyses, further analyses of differential gene expression between coffee consumption categories were performed on a subgroup of "low tea, nonsmoking" consumers (298 women), in addition to "all women." In the "low tea, nonsmoking" group, the differences in age, education, and meat and dairy consumption found in the "all women" group were no longer significant, and were therefore not adjusted for. All analyses were performed using R v3.4.0 [29] and packages from R and the Bioconductor project. The R package limma [30] was used to find differentially expressed genes (false discovery rate (FDR) < 0.05) between the three categories of coffee consumption. The lists of differentially expressed genes from limma were then used in clusterProfiler [31] to perform over-representation analysis (R clusterProfiler: enrichGO) and to compare the enriched functional categories of each gene cluster between "all women" and "low tea, nonsmokers" (R clusterProfiler: compareCluster) for biological processes within Gene Ontology (GO) terms. To ensure balanced comparisons between the gene lists of each group, the top 100 genes in each list were used to compare the groups.
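To make the FDR < 0.05 criterion concrete, the sketch below adjusts a list of p-values with the Benjamini-Hochberg procedure. That choice is an assumption: the text states only that an FDR threshold was applied through limma and does not name the adjustment method. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each one is the minimum of p * m / rank over all ranks
    # at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a high versus low comparison.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```

In this toy example only the first two genes pass the threshold, although four of the five raw p-values are below 0.05, which shows how the adjustment guards against false positives when thousands of probes are tested at once.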

Descriptors
The group "all women" consisted of 958 women with a median coffee consumption of 525 grams of brewed coffee per day. Of these 958 women, 132 (13.8%) had a low coffee consumption (<1 cup of coffee/day), 422 (44.1%) were moderate coffee consumers (≥1-≤3 cups of coffee/day), and 404 (42.2%) were high coffee consumers (>3 cups of coffee/day) (Table 1). Filtered coffee was reported as the brewing method by 783 women, followed by instant coffee (205 women), boiled coffee (121 women), and espresso (nine women), with some women reporting more than one brewing method.
There was a higher percentage of women who smoked in the week before the blood sample was taken in the high coffee consumption group (36.8%) compared to both the low (14.4%) and moderate coffee consumption groups (17.3%). The high coffee consumption group also had the lowest median tea intake (0 g/d) of the three groups. The moderate group had a higher median tea consumption (135 g/d) than the high coffee consumers, but the low coffee consumption group had a substantially higher intake than both moderate and high coffee consumers, with a median of 405 g/d. Further, a low education level was more frequent in the high coffee consumption group than in the two other groups. There was a higher median intake of dairy products in the high (179 g/d) and moderate (175 g/d) coffee consumption groups compared to the low consumption group (128 g/d). Median consumption of red and processed meat was slightly higher in the high coffee consumption group (93 g/d) than in the moderate (86 g/d) and low (86 g/d) consumption groups. However, the actual difference in grams of red and processed meat was small, and is therefore unlikely to be of clinical relevance.
Table 2 describes the characteristics of the subgroup of women who did not smoke in the week before blood sample donation, and who drank less than 1-6 cups of tea per week (average of half a cup per day). This "low tea, nonsmoking" group consisted of 298 women with a median coffee consumption of 630 grams of brewed coffee per day, of whom 25 (8.4%) had a low coffee consumption, 139 (46.6%) were moderate coffee consumers, and 134 (45.0%) were in the high coffee consumption category.
In the "low tea, nonsmoking" group there was a difference in median energy intake among the coffee consumption categories, with a borderline significant difference (p = 0.054) between the high (7188 kJ/day) and low consumption group (6450 kJ/day), and a significant difference between the moderate (6625 kJ/day) and high group (p = 0.034).

Differential Gene Expression
When comparing high versus low coffee consumers in "all women," there were 139 significantly differentially expressed genes (FDR < 0.05) (Figure 2a, Table S1). The gene most differentially expressed (LRRN3) when comparing high versus low coffee consumers was also the only differentially expressed gene when comparing high versus moderate coffee consumption groups. When studying only those who did not smoke the week before blood sampling, 414 genes were significantly differentially expressed between high and low consumers (results not presented). In the group that consisted of the 298 women who neither smoked in the week before blood sampling nor drank more than an average of half a cup of tea per day ("low tea, nonsmoking"), 297 genes were significantly differentially expressed when comparing high versus low coffee consumers (Figure 2b, Table S2). Table 3 shows the top 20 significantly differentially expressed genes when comparing high versus low coffee consumers in the "low tea, nonsmoking" group. There were 36 genes in common between all the significantly differentially expressed genes in "all women" and "low tea, nonsmoking" groups, but there was only one gene in common between the top 50 genes for both groups.

Figure 2. (a) Significantly up- (red) and down- (grey) regulated genes between coffee consumption categories for "all women." (b) Significantly up- (red) and down- (grey) regulated genes between coffee consumption categories for "low tea, nonsmokers."

Table 3. Top 20 significantly differentially expressed genes (false discovery rate < 0.05) between high and low coffee consumers in the "low tea, nonsmoking" group.

Over-Representation Analysis
Over-representation analysis for the gene lists with significantly differentially expressed genes found no over-representation at FDR < 0.05. In the over-representation analysis for "all women" at p-value < 0.01 (n = 139 genes, Figure 3a), the top over-represented categories were involved in regulation and assembly of different tissues and cell constituents. In the "low tea, nonsmoking" group, processes related to immunological responses were indicated (n = 297 genes, Figure 3b). When separating the differentially expressed genes from the "low tea, nonsmoking" group into upregulated (146 genes) and downregulated (151 genes), the immunological responses were only apparent in the downregulated genes (Figure S1).
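As an illustration of what an over-representation p-value and the GeneRatio measure, the sketch below uses a one-sided hypergeometric test, which is the usual basis for this kind of analysis. It is a plain-Python approximation, not the clusterProfiler code used in the study; the function names and all counts in the example are hypothetical.

```python
from math import comb


def ora_p_value(n_universe, n_category, n_list, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `n_overlap` genes from a GO category of
    size `n_category` in a gene list of size `n_list` drawn at random from
    a background of `n_universe` genes.
    """
    upper = min(n_category, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def gene_ratio(n_overlap, n_list):
    """Genes shared by the list and the GO category, over genes in the list."""
    return n_overlap / n_list


# Hypothetical counts: background of 20 genes, a GO category with 5 of them,
# a list of 4 differentially expressed genes, 2 of which are in the category.
p = ora_p_value(n_universe=20, n_category=5, n_list=4, n_overlap=2)
ratio = gene_ratio(n_overlap=2, n_list=4)
```

A category is reported as over-represented only when this probability, after correction across all tested categories, falls below the chosen threshold, which is why a category can appear at p-value < 0.01 yet not survive FDR < 0.05.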

Figure 3.
Over-representation analysis of Gene Ontology biological process categories. In the figure, the color of the dots indicates the p-value, the size of the dots indicates gene count, and the GeneRatio indicates the "number of genes in common between gene list and GO-category/number of genes in gene list." (a) Over-representation analysis for "all women," using the 139 significantly differentially expressed genes between high and low coffee consumers. (b) Over-representation analysis for "low tea, nonsmokers," using the 297 significantly differentially expressed genes between high and low coffee consumers.

Genes related to metabolic processes were indicated in ontology categories in a group comparison of high and low coffee consumers between "all women" and "low tea, nonsmokers" when using the top 100 significantly differentially expressed genes for both groups (Figure 4).

Figure 4. Group comparison of Gene Ontology biological process categories using a gene list of the top 100 significantly differentially expressed genes between high and low consumers in the "all women" group versus the "low tea, nonsmoking" group. GeneRatio indicates "number of genes in common between gene list and GO-category/number of genes in gene list."

Discussion
In this study of Norwegian women, 139 differentially expressed genes were found in whole blood between self-reported high and low coffee consumers. Subgroup analyses with nonsmoking, low tea consumers yielded a separate set of 297 differentially expressed genes, but comparisons of the top 100 differentially expressed genes in both groups show similar tendencies towards gene ontologies involved in general metabolic processes. An over-representation analysis of GO biological process categories for the differentially expressed genes from the "low tea, nonsmoking" group pointed towards involvement in inflammation-related processes. Both the "all women" and "low tea, nonsmoking" groups demonstrated modest fold changes, and the changes included both upregulation and downregulation of expression. This indicates effects from coffee consumption on whole-blood gene expression.
The median intakes of coffee consumption found in the current study were in accordance with the average consumption (560 g/d) among Norwegian women in the age group 50-59 [2]. Energy intake in the "low tea, nonsmoking" group was highest among high consumers of coffee. Few studies have investigated the influence of coffee consumption on energy intake. The studies that exist somewhat contradict our finding, with coffee consumption either having no effect on single meal energy intake or leading to a small daily decrease in energy intake [32].
Genes indicated from the gene expression profiles in this study have not previously been associated with coffee consumption. However, we were not able to isolate the effects of coffee consumption in the full study group due to confounding, especially from smoking. Smoking is strongly associated with coffee consumption, with smokers consuming more coffee than nonsmokers do, possibly due to an increased caffeine metabolism [33][34][35]. The two top differentially expressed genes (LRRN3 and PID1) identified between current smokers and never smokers in a meta-analysis by Huan et al. [36] were the same two top differentially expressed genes between low and high consumers in the group "all women." LRRN3 was also the only gene differentially expressed between the moderate and high coffee consumers in the same group. The observation of LRRN3 and PID1 indicates a strong influence of smoking on the gene expression profiles for "all women." However, LRRN3 and PID1 were not differentially expressed between high and low consumers of coffee in the "low tea, nonsmoking" group.
SNPs linked to several genes have previously been associated with coffee consumption [17][18][19]. Of these genes, only POR was found to be significantly differentially expressed in the current study, and only in the group "low tea, nonsmokers." POR encodes P450 oxidoreductase, which transfers electrons to microsomal CYP450 enzymes, which are needed for the metabolism of caffeine [19].
Notably, some of the most prominent candidate genes (CYP1A1, CYP1A2 and AHR) involved in caffeine metabolism were filtered out from our expression data due to low detection rates, and we were therefore not able to assess the association between these and coffee consumption in the NOWAC cohort. Still, the fact that these genes had low detection rates indicates low expression of these genes in whole blood. CYP1A2 is mainly expressed in the liver, and only low levels of CYP1A1 can usually be found in lymphocytes [37,38]. The association found between POR and coffee consumption might indicate that the CYP1 genes are affected in other ways than by transcriptional regulation in whole blood. In general, genetic background must also be considered; sex and ethnicity in particular can impact the expression of CYP1A2 [39,40].
Among the top 20 differentially expressed genes from the "low tea, nonsmoking" group, there were especially five genes, TXK, HLX, KDM6B, SPATA2L, and CDK5RAP1, that are of interest for further research concerning coffee consumption and gene expression. TXK and HLX are involved in development of T-helper 1 cells, which are necessary for human immune defense [41,42]. KDM6B, also known as JMJD3, takes part in inflammatory responses by participating in differentiation of macrophages [43], while SPATA2L is involved in processes related to inflammatory signaling [44]. CDK5RAP1 is a repressor of CDK5, which is a cyclin-dependent protein known to be involved in neurodegenerative diseases like Parkinson's and Alzheimer's [45,46]. However, among these five genes, only TXK was in the GO biological processes involving inflammatory responses found in the over-representation analysis.
Inflammatory response processes were indicated in the over-representation analysis on "low tea, nonsmokers." It should be taken into consideration that monocytes and lymphocytes in whole-blood are immune cells, so expression of immune-related processes should be expected, and is often found in studies concerning diet and gene expression [47,48]. Epidemiological studies have previously found that coffee consumption is associated with a reduced risk of death attributed to inflammatory diseases, and that coffee consumption is negatively associated with inflammatory processes [49,50]. Another study found increased concentrations of inflammatory markers among both men and women who consumed >200 mL coffee per day compared to non-consumers [51]. Other indicated effects of coffee consumption include increased serum cholesterol [10], reduced risk of Parkinson's disease [4][5][6], and reduced risk of type 2 diabetes [7], all of which are health endpoints related to inflammation. Thus, associations between high coffee consumption and inflammatory indicators in peripheral blood could indicate markers of related pathways. In the healthy Norwegian population, over 60% of the antioxidant intake is estimated to originate from coffee [52]. The increased intake of antioxidants among coffee consumers is a plausible source of the positive influence of coffee on inflammatory processes. Negative influences on inflammatory processes might be related to cafestol and kahweol, two coffee lipids mainly found in unfiltered coffee. In particular, cafestol is associated with increased serum cholesterol, which is a known underlying factor of atherosclerosis [53,54].
Five GO categories of different biosynthetic and metabolic processes were found to be the top categories in the comparison between genes identified for "all women" and the "low tea, nonsmoking" group. The "low tea, nonsmoking" group had a higher proportion of the top 100 differentially expressed genes involved in the metabolic processes than the group "all women". The metabolic processes evident in this comparison indicate that at least certain genes found to be associated with coffee consumption are involved in the metabolism of constituents of coffee. However, when looking at the over-representation analyses, these metabolic processes were not evident.
Some strengths and limitations of this study should be considered. Gene expression profiles represent a snapshot of the mRNA transcripts available in whole-blood at the time of blood sampling, while the FFQ represents long-term dietary intake. The indicated effects are therefore likely affected by the discrepancy between the reported long-term intake and the short-term mRNA snapshot. The blood samples were not collected in a fasting state, and we have no data on time since coffee consumption. Caffeine has a half-life of approximately 5.5 h, but other coffee metabolites have a half-life below one hour [55]. Therefore, among both the high and the low consumers, there could be participants whose gene expression profiles underestimate the differential expression expected from their FFQ-reported intake.
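To illustrate why the unknown time since coffee consumption matters, the following minimal sketch computes how much of a compound would remain in circulation after a given delay. It assumes simple first-order (exponential) elimination, which is a simplification not stated in this study; the function name and the example delays are hypothetical, and only the 5.5 h caffeine half-life and the one-hour upper bound for other metabolites are taken from the text above.

```python
def fraction_remaining(hours_since_intake: float, half_life_hours: float) -> float:
    """Fraction of a compound left after a delay, assuming first-order elimination.

    Assumption (not from the study): the amount halves once per half-life,
    i.e. fraction = 0.5 ** (t / t_half).
    """
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if hours_since_intake < 0:
        raise ValueError("hours_since_intake must be non-negative")
    return 0.5 ** (hours_since_intake / half_life_hours)


# Half-lives quoted in the text: about 5.5 h for caffeine, and below 1 h for
# other coffee metabolites (1 h is used here as an upper bound).
CAFFEINE_HALF_LIFE_H = 5.5
SHORT_LIVED_HALF_LIFE_H = 1.0

# Hypothetical delays between the last cup and blood sampling.
for hours in (1.0, 3.0, 5.5, 11.0):
    caffeine = fraction_remaining(hours, CAFFEINE_HALF_LIFE_H)
    short_lived = fraction_remaining(hours, SHORT_LIVED_HALF_LIFE_H)
    print(f"{hours:>4.1f} h: caffeine {caffeine:.3f}, short-lived metabolite {short_lived:.3f}")
```

Under these assumptions, half of the caffeine would remain 5.5 h after intake, whereas a metabolite with a one-hour half-life would be reduced to roughly 2% of its initial level over the same period. This is consistent with the concern that a single non-fasting blood sample may miss short-lived exposures.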
This study used a relatively large number of women compared to many other whole-blood nutrigenomic studies; the larger sample size mitigates some of the concerns about limited impact and reliability raised for other studies [48].
The FFQs used in this study were comprehensive and contained most of the commonly consumed foods and beverages in Norway. However, there was no question designed to capture caffeine-containing beverages other than tea and coffee, and this might lead to some residual confounding in our analyses. Coffee consumption and other dietary exposures were assessed based on self-reported data. Thus, some misclassification could have occurred in the dietary exposures, although it was likely non-differential. The participants reported cups of coffee, but were not given an estimate of an average cup size, which would have allowed a more detailed assessment of consumption. However, coffee intake validated well, with a Spearman's rank correlation coefficient of 0.82 when the FFQ was compared to the 24-h dietary recalls, in which the women reported coffee consumption either as an exact amount or based on cup sizes from a picture booklet [26]. Coffee differs in chemical constituents depending on variables such as bean type, roasting, grinding, and soaking time of the coffee grounds. This information was not available from the FFQ. Taken together, we cannot rule out the possibility of coffee category misclassification, and for some women the classification might differ between reported coffee consumption and some of its constituents due to differences in cup size, brewing strength, and other factors. In this paper, we focus on coffee per se, rather than its constituents, as this is what people consume. No biomarker assessed in blood was used to confirm the estimates of coffee consumption.
The gene expression data were not adjusted for age, education, or consumption of red and processed meat or dairy, even though high coffee consumers reported a lower education level and a higher intake of red meat and dairy. Smoking and tea consumption are two known confounders for coffee consumption, and were also associated with coffee consumption in this study. For that reason, subgroup analyses targeting women with low tea consumption (<0.5 cup/day) and no smoking in the week before blood sampling were performed. In this subgroup, the associations found between coffee consumption and dairy, red and processed meat, age, and education disappeared, indicating that smoking, and not coffee consumption per se, might be driving the differences observed in the full study group.
Whole-blood samples were used in the NOWAC Post-Genome Cohort since these are relatively non-invasive and practical for cohort studies. The PAXgene Blood RNA System made it possible to ship blood samples by mail overnight without having to freeze them first, while at the same time conserving the mRNA over time. Whole-blood has been considered a surrogate biopsy material for other tissues because of its transporting role, in which it both interacts with all tissues and organs and is exposed to bioactive molecules such as nutrients, metabolites, pollutants, and waste products [56]. This makes whole-blood a viable candidate for capturing gene expression profiles associated with dietary exposure [56]. The most transcriptionally active blood cells are the leukocytes, which are important in immune responses. The gene expression microarrays were performed on whole-blood samples lacking information regarding disease status and immune cell subtypes. Gene expression profiles vary depending on differences in the cellular components of whole-blood [57], and infections or autoimmune diseases can introduce differences in these cellular components. With quantification of the blood composition, genes specific to immune cells could have been better elucidated.

Conclusions
In this exploratory cross-sectional study, we show that coffee consumption is significantly associated with differentially expressed genes in whole-blood. To the best of our knowledge, this is the first study to use mRNA gene expression data to elucidate how coffee consumption influences gene expression in whole-blood. Our results indicate that the genes differentially expressed between high and low coffee consumers were associated with both metabolic and inflammatory processes. Some of the top differentially expressed genes are especially interesting in relation to the effect of coffee consumption on inflammatory processes, and warrant further investigation. However, since this is an exploratory cross-sectional study based on self-reported coffee consumption, the results presented herein must be interpreted with care.