Dietary Advanced Glycation End-Products and Colorectal Cancer Risk in the European Prospective Investigation into Cancer and Nutrition (EPIC) Study

Dietary advanced glycation end-products (dAGEs) have been hypothesized to be associated with a higher risk of colorectal cancer (CRC) by promoting inflammation, metabolic dysfunction, and oxidative stress in the colonic epithelium. However, evidence from prospective cohort studies is scarce and inconclusive. We evaluated CRC risk associated with the intake of dAGEs in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Dietary intakes of three major dAGEs: Nε-carboxy-methyllysine (CML), Nε-carboxyethyllysine (CEL), and Nδ-(5-hydro-5-methyl-4-imidazolon-2-yl)-ornithine (MG-H1) were estimated in 450,111 participants (median follow-up = 13 years, with 6162 CRC cases) by matching to a detailed published European food composition database. Hazard ratios (HRs) and 95% confidence intervals (CIs) for the associations of dAGEs with CRC were computed using multivariable-adjusted Cox regression models. Inverse CRC risk associations were observed for CML (HR comparing extreme quintiles: HRQ5vs.Q1 = 0.92, 95% CI = 0.85–1.00) and MG-H1 (HRQ5vs.Q1 = 0.92, 95% CI = 0.85–1.00), but not for CEL (HRQ5vs.Q1 = 0.97, 95% CI = 0.89–1.05). The associations did not differ by sex or anatomical location of the tumor. Contrary to the initial hypothesis, our findings suggest an inverse association between dAGEs and CRC risk. More research is required to verify these findings and better differentiate the role of dAGEs from that of endogenously produced AGEs and their precursor compounds in CRC development.


Introduction
Colorectal cancer (CRC) is the third most common cancer globally, and the second leading cause of cancer-related deaths [1]. The incidence of CRC follows a geographical distribution pattern, with the highest figures observed in Western countries [2], most likely attributable to the "modern" lifestyle and diet rich in energy-dense processed foods with poor nutritional value [3][4][5]. The Western diet is a substantial source of advanced glycation end-products (AGEs), an expansive group of molecules produced by irreversible non-enzymatic combination of reducing sugars and proteins, lipids, or nucleic acids [6]. The typical Western diet can also promote endogenous formation of AGEs by supplying reducing sugars and AGE precursors such as reactive dicarbonyls, i.e., methylglyoxal, glyoxal, glycolaldehyde, and glyceraldehyde [7]. Dietary AGEs (dAGEs) are known for their pro-inflammatory and pro-oxidant properties in the colon and have been reported in diverse colonic pathologies, such as inflammatory bowel diseases [8]. Around 70-90% of ingested AGEs are unabsorbed [9,10] and remain in the gastrointestinal tract, where they can interact directly with colon epithelial cells. The human colon is, therefore, potentially exposed to AGEs from the diet, but also from the systemic milieu by way of circulating AGEs [11,12].
AGEs have been hypothesized to be associated with CRC development [13], mostly due to their ability to promote tumor cell growth in vitro [13]. A body of mechanistic evidence has linked AGEs to CRC through stimulation of the pro-inflammatory response via the activation of the receptor of AGEs (RAGE) [14], an increase in colonic barrier permeability (allowing closer interaction of AGEs with the colonic epithelium), and consequential leakage of bacterial toxins into the systemic circulation [15]. Notwithstanding these numerous plausible mechanisms, no previous epidemiological studies have investigated the relationship between dAGEs and CRC, probably due to the lack of detailed food composition databases for these compounds. The development of food composition tables for estimating dietary AGEs is recent, and few tables exist, mainly for Japanese foods [16] and more recently for European foods [17]. Due to the large number of different AGEs, the tables developed focused on the major compounds, specifically Nε-(carboxymethyl)lysine (CML). The European food composition table provides data on CML and, additionally, on two other major dAGEs: Nε-(carboxyethyl)lysine (CEL) and Nδ-(5-hydro-5-methyl-4-imidazolon-2-yl)-ornithine (MG-H1).
Considering the potential direct interaction of dAGEs with the colonic epithelium and their numerous CRC-promoting effects, we hypothesized that higher dAGE consumption would be positively associated with CRC risk. We evaluated this hypothesis using information on dietary intake of CML, CEL, and MG-H1 in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort.

Study Participants
We used data from the EPIC study, a large prospective cohort with over half a million participants (n = 521,324) from 10 European countries (Denmark, France, Germany, Greece, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) [18]. In brief, participants aged between 35 and 75 years were recruited from 1992 to 2000 in 23 participating centers. Anthropometric measures, socio-demographic information, and lifestyle and dietary intake data were collected at recruitment from all participants. Standing height, weight, and waist and hip circumferences were measured, except in France, Norway, and Oxford, where they were self-reported. Body mass index (BMI, in kg/m²) was calculated.

Ethical Considerations
Ethical approval for the EPIC study was obtained from the Ethical Committee of the International Agency for Research on Cancer (IARC) and local ethical committees. All participants provided written consent to participate in the study.

Dietary Assessment and dAGEs Estimation
Usual diet was assessed at baseline using a combination of country- or center-specific questionnaires that have been validated to reflect local contexts [19,20]. Dietary data were collected during interviews in Greece, Spain, and Naples and Ragusa (Italy), whereas in other EPIC centers self-administered questionnaires were used. Quantitative dietary questionnaires were used in Germany, Greece, the Netherlands, and Northern Italy; semiquantitative food frequency questionnaires were employed in Denmark, Norway, Naples, Umeå, and the UK; and in Malmö a combination of a non-quantitative food-frequency questionnaire and a food record was utilized.
To estimate intakes of individual AGEs, we used the database for protein-bound AGEs developed for 190 food items selected from the Dutch cohort of EPIC and the Dutch National Food Consumption survey [17]. These foods were matched to the EPIC food list by name and descriptors, with particular attention to preparation and processing, to expand the EPIC Nutrient Database (ENDB) with extra food components, a procedure used for other nutrients/anti-nutrients and described in detail elsewhere [21,22]. Complex foods with multiple ingredients were decomposed into specific ingredients or food items to generate EPIC dAGEs composition data for each food item. Thereafter, for each participant, daily intakes of CML, CEL, and MG-H1 were estimated. ∑dAGEs was calculated as the sum of individual dAGEs (CML+CEL+MG-H1) and used to describe overall AGEs intake patterns. The main food contributors of dietary CML and MG-H1 were (from highest to lowest contribution): cereals and cereal products, meats and meat products, cakes, and biscuits (Supplementary Figure S1) [23]. For CEL, the main contributors were meat and meat products, cereal and cereal products, cakes, and biscuits. Dairy products, fish and fish products, and non-alcoholic drinks were also relevant dietary sources of the three dAGEs.
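The estimation steps described above (decomposing complex foods, multiplying consumption by composition, and summing the three compounds) can be sketched as follows. This is a minimal illustration, not the EPIC pipeline: the composition values, the recipe, the units (mg per 100 g), and all function names are hypothetical placeholders rather than entries from the published database.

```python
AGES = ("CML", "CEL", "MG-H1")

# Illustrative content in mg per 100 g of food; NOT values from the database [17].
COMPOSITION = {
    "bread": {"CML": 1.0, "CEL": 0.5, "MG-H1": 8.0},
    "ham": {"CML": 2.0, "CEL": 3.0, "MG-H1": 10.0},
}

# Hypothetical recipe: share of each ingredient by weight in a complex food.
RECIPES = {"ham sandwich": {"bread": 0.5, "ham": 0.5}}


def decompose(consumption_g):
    """Expand complex foods into their ingredients (grams per day)."""
    simple = {}
    for food, grams in consumption_g.items():
        parts = RECIPES.get(food, {food: 1.0})  # simple foods map to themselves
        for ingredient, share in parts.items():
            simple[ingredient] = simple.get(ingredient, 0.0) + grams * share
    return simple


def daily_intake(consumption_g):
    """Return daily CML, CEL, and MG-H1 intake plus their sum for one participant."""
    totals = dict.fromkeys(AGES, 0.0)
    for food, grams in decompose(consumption_g).items():
        for age in AGES:
            totals[age] += grams / 100.0 * COMPOSITION[food][age]
    totals["sum_dAGEs"] = sum(totals[age] for age in AGES)
    return totals


if __name__ == "__main__":
    print(daily_intake({"bread": 100.0, "ham sandwich": 200.0}))
```

In this toy example the sandwich contributes 100 g of bread and 100 g of ham, so the participant's totals reflect 200 g of bread and 100 g of ham per day.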
We excluded participants from Greece (n = 26,048) due to data use restrictions, those diagnosed with cancer at baseline (n = 25,184), those with missing follow-up information (n = 4148) or dietary questionnaire data (n = 6259), those in the highest or lowest 1% of energy intake versus energy requirements (n = 9573), and a participant who withdrew from EPIC. Our final dataset included 450,111 participants, among whom 318,686 were women (71%).
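The exclusion counts can be checked against the reported analytic sample with simple arithmetic. The sketch below assumes the listed exclusions were applied sequentially, so that no participant is counted in more than one category; the variable names are illustrative.

```python
ENROLLED = 521_324  # full EPIC cohort

# Exclusion counts as reported in the text, in the order listed.
EXCLUSIONS = {
    "Greece (data use restrictions)": 26_048,
    "cancer diagnosis at baseline": 25_184,
    "missing follow-up information": 4_148,
    "missing dietary questionnaire data": 6_259,
    "extreme 1% of energy intake vs. energy requirements": 9_573,
    "withdrew from EPIC": 1,
}

final_n = ENROLLED - sum(EXCLUSIONS.values())
women = 318_686
percent_women = 100.0 * women / final_n

if __name__ == "__main__":
    print(f"Final sample: {final_n:,}")
    print(f"Women: {percent_women:.1f}%")
```

Under that assumption, the counts reproduce the final sample of 450,111 participants, of whom about 71% were women.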

Identification of CRC Cases
Cancer cases were ascertained from cancer registries in Denmark, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom, or by using a combination of sources, including health insurance records, oncology and pathology records, or, in the specific cases of France and Germany, through active follow-up of the participants and their relatives. CRC cases were first incident, and histologically confirmed by a pathologist. We used the International Classification of Diseases for Oncology (ICD-O, codes C18-C20) to define the cases. Colon cancers were defined as tumors that occurred in the cecum, appendix, ascending colon, hepatic flexure, transverse colon, splenic flexure, or the descending or sigmoid colon (C18.0-C18.7), and overlapping and/or unspecified origin tumors (C18.8 and C18.9). Rectal cancers were defined as tumors that occurred at the recto-sigmoid junction (C19) or rectum (C20).
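The case definition above maps ICD-O topography codes to anatomical sites. A minimal sketch of that mapping is shown below; the function name and the returned labels are illustrative assumptions, and the proximal/distal colon split is omitted because its code boundaries are not specified in the text.

```python
def classify_tumor_site(icd_o_code):
    """Map an ICD-O topography code (e.g. 'C18.7') to a CRC site label.

    Returns None for codes outside C18-C20.
    """
    code = icd_o_code.strip().upper()
    major, _, minor = code.partition(".")
    if major == "C18":
        # C18.8 (overlapping) and C18.9 (unspecified) are still colon cancers.
        if minor in {"8", "9"}:
            return "colon (overlapping or unspecified)"
        return "colon"
    if major in {"C19", "C20"}:
        # Recto-sigmoid junction (C19) and rectum (C20).
        return "rectum"
    return None


if __name__ == "__main__":
    for example in ("C18.0", "C18.9", "C19", "C20.9", "C16.0"):
        print(example, "->", classify_tumor_site(example))
```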

Statistical Analyses
Daily intakes of CML, CEL, and MG-H1 were natural log-transformed and their standardized residuals were computed by regressing the ln-transformed values on participant energy intake and center. Ln-transformed dAGEs were divided into quintiles, with the first quintile used as the reference in all our analyses. Cox proportional hazards regression models stratified by age at recruitment (one-year categories), sex, and center were used to compute hazard ratios (HRs) and 95% confidence intervals (CIs) for the association between individual dAGEs and CRC risk. Time at entry was age at recruitment, while exit time was set as the age at which any of the following first occurred: CRC diagnosis, death, emigration, or last date at which follow-up was considered complete. To test the trends of the associations, we ran Cox models using median values of each category as a continuous variable. Analyses were also conducted using continuous variables for dAGEs (per ln(SD) increment). No deviation from the proportional hazards assumption was observed after assessing Schoenfeld residuals. Three main models were run. Model 1 was stratified by age (1-year categories), sex, and center. Model 2 was additionally adjusted for BMI (continuous), height (continuous), and lifestyle factors, including education (none; primary; technical and professional; secondary, higher), physical activity (inactive; moderately inactive; moderately active; active), smoking status and intensity (never; current smokers, cigarettes/day: 1-15, 16-25, >26; former smokers who quit: <=10, 11-20, >20 years; occasional), and total energy intake (kcal/day, continuous). Model 3 was further adjusted for the Mediterranean diet score to consider the diet as a whole, and because this score has also been specifically associated with CRC risk [24]. We considered missing data as a separate category for physical activity (1.9%), education (2.3%), and smoking (3.3%). 
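The exposure preparation described above (energy and center adjustment by the residual method, quintile categorization, and category medians for the trend test) can be sketched in plain Python. This is an illustration under stated assumptions, not the Stata code used in the study: all function names are hypothetical, the residuals are left unscaled, and the quintile cut points use the default method of `statistics.quantiles`, which may differ from the authors' choice. Regressing on center is implemented by centering both variables within each center, which yields the same residuals as a regression with one indicator per center.

```python
import math
import statistics
from collections import defaultdict


def center_demean(values, centers):
    """Subtract each center's mean from the values observed in that center."""
    by_center = defaultdict(list)
    for value, center in zip(values, centers):
        by_center[center].append(value)
    means = {c: statistics.fmean(v) for c, v in by_center.items()}
    return [value - means[center] for value, center in zip(values, centers)]


def energy_adjusted_residuals(intakes, energy, centers):
    """Residuals of ln(intake) regressed on energy intake and center."""
    y = center_demean([math.log(v) for v in intakes], centers)
    x = center_demean(energy, centers)
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [b - slope * a for a, b in zip(x, y)]


def assign_quintiles(values):
    """Label each value with its quintile, 1 (lowest, reference) to 5 (highest)."""
    cuts = statistics.quantiles(values, n=5)  # four cut points
    return [1 + sum(value > cut for cut in cuts) for value in values]


def quintile_medians(values, quintiles):
    """Median of the values in each quintile, used as a continuous trend score."""
    grouped = defaultdict(list)
    for value, q in zip(values, quintiles):
        grouped[q].append(value)
    return {q: statistics.median(v) for q, v in sorted(grouped.items())}
```

In a trend test of the kind described, each participant would be assigned the median of their quintile and that score entered into the Cox model as a single continuous term.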
Restricted cubic splines were used to model possible nonlinear trends [25,26]. Linearity of the associations was tested using the likelihood ratio test, comparing the model with only the linear term with the model including both the linear and the cubic spline terms. There was no indication of nonlinear associations in any of our analyses. Analyses (using Model 3) by anatomical subsites of the colorectum were also run for rectal and colon cancer, and, specifically, for proximal and distal colon cancers. Potential differences in the associations by tumor sites, i.e., rectal vs. colon or proximal colon vs. distal colon, were tested using competing risk analyses [27,28]. Stratified analyses by country, BMI, sex, and years of follow-up were conducted and multiplicative interactions were included in the model to evaluate potential heterogeneity. To model the possible impact of reverse causation, we ran sensitivity analyses by excluding the first 2 years of follow-up.
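To make the spline approach concrete, the sketch below computes the nonlinear terms of a restricted cubic spline in the commonly used truncated-power form, without the scaling factor that some software applies. The number and location of knots are not reported in the text, so the knots in the example are purely illustrative, and the function name is hypothetical. With k knots there are k - 2 nonlinear terms; the likelihood ratio test described above compares a model containing only the linear term with one that also includes these terms.

```python
def rcs_nonlinear_terms(x, knots):
    """Nonlinear restricted cubic spline terms for a single value x.

    The terms are zero below the first knot and linear in x above the
    last knot, which is what makes the spline 'restricted'.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u**3 when u is positive, otherwise 0.
        return max(u, 0.0) ** 3

    span = t[-1] - t[-2]
    terms = []
    for knot in t[:-2]:
        term = (
            cube_plus(x - knot)
            - cube_plus(x - t[-2]) * (t[-1] - knot) / span
            + cube_plus(x - t[-1]) * (t[-2] - knot) / span
        )
        terms.append(term)
    return terms


if __name__ == "__main__":
    example_knots = [-1.0, 0.0, 1.0]  # illustrative only
    for value in (-2.0, 0.5, 2.0, 3.0):
        print(value, rcs_nonlinear_terms(value, example_knots))
```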
All the analyses were carried out using Stata 14.0 (StataCorp., College Station, TX, USA). We considered two-sided p-values below 0.05 as statistically significant.

Results
Table 1 summarizes selected baseline characteristics of the study participants by quintiles of ∑dAGEs. Participants in the highest quintile consumed more processed meats, cakes, biscuits, cereals and cereal products, and legumes and less fruit. They also tended to consume less sugar, confectionery, and alcohol.
Analyses by tumor subsites showed no significant heterogeneity between colon and rectal cancers, although the association with rectal cancer was statistically significant for CML (HR per ln(SD) increment = 0.93, 95% CI = 0.88-0.97) and MG-H1 (HR per ln(SD) increment = 0.94, 95% CI = 0.90-0.99) (Table 3). No significant difference in the association between the individual dAGEs and CRC risk was observed by sex (Supplementary Table S1). In stratified analyses, the dAGEs-CRC association did not differ by country, and tended to be restricted to participants with BMI <30 kg/m² (Supplementary Table S2). When the follow-up time of cases was considered, the dAGEs-CRC association showed a gradient, with higher CRC risk observed at shorter follow-up. Excluding participants with follow-up less than 2 years did not materially change the results (not shown).
Abbreviations: AGE, advanced glycation end-product; CI, confidence interval; CML, Nε-carboxy-methyllysine; CEL, Nε-carboxyethyllysine; EPIC, European Prospective Investigation into Cancer and Nutrition; HR, hazard ratio; MG-H1, Nδ-(5-hydro-5-methyl-4-imidazolon-2-yl)-ornithine. Models were adjusted for body mass index, height, education, physical activity, smoking, energy intake, and Mediterranean diet score and stratified by age (1-year categories), sex, and center. The p for heterogeneity between colon and rectal cancer was 0.391, 0.849, and 0.825 for CML, CEL, and MG-H1, respectively; the p for heterogeneity between proximal and distal colon cancer was 0.878, 0.793, and 0.804 for CML, CEL, and MG-H1, respectively.

Discussion
In this large prospective study, we found that dietary intakes of CML and MG-H1, but not CEL, were inversely associated with the risk of CRC. Our analyses did not identify any heterogeneity in these findings by anatomical subsite of the tumor within the colorectum, by sex, or by follow-up time.
Our findings were contrary to our initial hypothesis that dAGE exposure could promote CRC development. This hypothesis was based on considerable experimental evidence suggesting cancer-promoting characteristics for these compounds. Three main mechanisms have been postulated: first, AGEs may bind to the RAGE receptor in colonocytes and, subsequently, promote and sustain inflammation and oxidative stress [29,30]; second, they may modify the composition of the microbiome towards microbial genera that are deleterious to gut health [31]; and finally, they may increase gut permeability, thereby allowing bacterial translocation and increased exposure of colonocytes to toxic bacterial compounds [32]. In in vitro enterocyte models, cells treated with AGEs have shown higher expression of RAGE and an increase in inflammatory factors, such as IL-8, IL-1β, and nuclear factor-kappa B (NF-κB) [13,33], suggesting that dAGEs may produce similar effects in the gut. Nevertheless, a main condition for this to occur in vivo is that CML, CEL, and MG-H1 need to reach the colon untransformed, in a protein-bound form which could be recognized by RAGE and be able to interact with the cell surface of colonocytes. This is because several studies have reported that free AGEs or those attached to single amino acids are not as recognizable by RAGE as protein-bound AGEs [34,35]. A recent study using a dynamic in vitro model showed that protein-bound dAGEs can survive intestinal digestion and remain in the gastrointestinal tract [36]. Zenker et al. [37], using a model with casein, have shown in a recent study that unglycated proteins could also interact with RAGE. This provides additional evidence supporting the complexity of the AGEs, particularly their interactions in the gastrointestinal milieu. 
It is evident that more knowledge is needed on the role of the microbiome and intestinal conditions in the conservation or degradation of protein-bound dAGEs, and how this may affect RAGE-specific inflammation.
A growing body of evidence suggests that the human gut microbiome can metabolize dAGEs, possibly as much as 40% of ingested CML [38]. CML has been shown to be metabolized by the microbiome into several sub-products, including biogenic amines and fatty acids, notably N-carboxymethylcadaverine, N-carboxymethylaminopentanoic acid, N-carboxymethyl-∆1-piperideinium ion, and 2-amino-6-(formylmethylamino)hexanoic acid [39,40]. Less is known about the specific actions of these catabolic products within the colorectum, or the possible downstream molecules that can be produced from them. It is possible that these compounds may not be recognized by RAGE, and, hence, not induce an inflammatory response within the gut. Nevertheless, dAGEs have been associated with reduced diversity and richness of the gut microbiome, which is thought to be conducive to a CRC-promoting environment [15]. Thus, the possible microbial metabolism of dAGEs may not entirely negate their deleterious properties. It is also noteworthy that the three dAGEs that we assessed are thought to bind to a single domain of RAGE (V domain), whereas other AGE compounds, such as pentosidine, which are much less abundant in the diet, could bind to additional domains (V and C1 domains) [41], potentially triggering a stronger inflammatory response. This suggests that future studies should also consider the potential deleterious properties of other less abundant dAGEs and their possible CRC risk associations.
We are unsure why we observed inverse CRC risk associations with CML and MG-H1, whereas CEL demonstrated no association. One possible explanation may relate to the different dietary sources for these compounds. In EPIC, CML and MG-H1 share very similar food sources (e.g., mostly cereals and cereal products), while CEL is derived to a greater extent from meats [23,42]. Cereals and cereal products are major sources of dietary fiber, which has been previously associated with lower CRC risk and could partially explain the inverse association observed with CML and MG-H1 [43]. Another potential explanation for the differential CRC risk associations for these compounds may be the chemical pathways through which they are produced. Although the three AGEs are all likely derived from reducing sugars, their main precursors are reactive dicarbonyl compounds, particularly glyoxal (GO) and methylglyoxal (MGO). CML originates from GO, whereas CEL and MG-H1 are mainly produced from MGO [44]. In addition, CEL and MG-H1 (the two MGO-derived AGEs) differ in their amino acid content, as CEL is produced from lysine, while MG-H1 derives from arginine [45]. Interestingly, reactive dicarbonyl compounds are thought to possess a glycating potential that may be thousands of times higher compared to that of sugars such as glucose or fructose [46] and, hence, the CRC risk association of these dietary compounds and their precursors may also warrant further study.
The importance of studying dAGEs in CRC development lies in several potential public health measures that may be taken to control their exposures at wider population levels. For example, dAGEs may be directly targeted through adoption of specific cooking methods (e.g., steaming) to reduce the dAGE content of specific processed foods, or possibly with anti-glycation dietary compounds (e.g., polyphenols) [47] to counter any possible adverse effects [48,49].
Although, to date, no large prospective studies have explored the association between dAGEs and CRC development, the role of circulating AGEs in CRC development has been explored in three separate case-control studies nested within prospective cohorts, with discordant findings. Two of the three studies reported inverse associations with CRC [50,51], whereas one reported a positive association in male smokers [52]. It is noteworthy that, among these previous studies, two [50,52] estimated AGEs using ELISA, which is now recognized as providing biased AGE estimation [53]. Nevertheless, taken together with our study, these observations suggest that the role of AGEs in CRC development is likely to be quite complex, and further studies of other AGE compounds beyond the three studied here are also warranted.
The strengths of our study include its large study population, its prospective design, and the large number of CRC cases, which allowed us to run extensive analyses and control for a comprehensive set of confounders. In addition, our study followed the recent "quality control" recommendations for studies on AGEs, i.e., the study of several specific AGEs and the use of a validated food composition database to estimate individual dAGE exposures [54]. However, our study was limited by the single assessment of dAGEs and other covariates at baseline; thus, potential changes in diet or covariates during follow-up could not be accounted for. AGE levels in foods are influenced by cooking methods, i.e., frying, baking, or broiling, and conditions, such as cooking temperature, humidity, and pH [55]. Hence, it is possible that country-specific differences in cooking conditions and varying geographical and/or individual preferences for doneness of similar food items could have impacted our dietary AGE estimation. Additionally, residual confounding cannot be completely ruled out.
In conclusion, we found inverse associations between the intake of CML and of MG-H1, but not of CEL, and the risk of CRC in the large EPIC prospective cohort. Our findings corroborate some previous findings from circulating AGEs, suggesting that the three AGEs included in our study may not be CRC-promotive, as has previously been suspected. Our study provides additional evidence of the complexity of AGEs and their interaction in CRC and calls for additional studies to confirm our findings and to explore the link between CRC and other dAGEs not studied herein.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data described in the study, code book and analytic code will be made available upon request. For information on how to submit an application for gaining access to EPIC data and/or biospecimens, please follow the instructions at http://epic.iarc.fr/access/index.php.