Polymorphisms in the Angiogenesis-Related Genes EFNB2, MMP2 and JAG1 Are Associated with Survival of Colorectal Cancer Patients

An individual’s inherited genetic variation may contribute to the ‘angiogenic switch’, which is essential for blood supply and tumor growth of microscopic and macroscopic tumors. Polymorphisms in angiogenesis-related genes potentially predispose to colorectal cancer (CRC) or affect the survival of CRC patients. We investigated the association of 392 single nucleotide polymorphisms (SNPs) in 33 angiogenesis-related genes with CRC risk and survival of CRC patients in 1754 CRC cases and 1781 healthy controls within DACHS (Darmkrebs: Chancen der Verhütung durch Screening), a German population-based case-control study. Odds ratios and 95% confidence intervals (CI) were estimated from unconditional logistic regression to test for genetic associations with CRC risk. The Cox proportional hazard model was used to estimate hazard ratios (HR) and 95% CIs for survival. Multiple testing was adjusted for by a false discovery rate. No variant was associated with CRC risk. Variants in EFNB2, MMP2 and JAG1 were significantly associated with overall survival. The association of the EFNB2 tagging SNP rs9520090 (p < 0.0001) was confirmed in two validation datasets (p-values: 0.01 and 0.05). The associations of the tagging SNPs rs6040062 in JAG1 (p-value 0.0003) and rs2241145 in MMP2 (p-value 0.0005) showed the same direction of association with overall survival in the first and second validation sets, respectively, although they did not reach significance (p-values: 0.09 and 0.25, respectively). EFNB2, MMP2 and JAG1 are known for their functional role in angiogenesis and the present study points to novel evidence for the impact of angiogenesis-related genetic variants on the CRC outcome.


Introduction
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer death among men and women worldwide, posing a major public health problem [1]. Disease risk and prognosis are to a large extent modifiable: obesity, red meat consumption and a sedentary lifestyle increase the risk, whereas physical activity, NSAID (non-steroidal anti-inflammatory drug) use and fiber intake are protective [2][3][4]. However, there is also a genetic component to the disease, which is estimated to account for up to 35% of colorectal cancer risk [5,6]. Some studies point to genetic associations with the CRC outcome [7][8][9]. However, evidence is still limited.
Angiogenesis, the generation of new blood vessels from pre-existing vessels, is crucial for tumor growth, progression and metastasis as it provides nutrients and oxygen to the growing tumor [10]. While in normal tissue, angiogenesis is tightly controlled and only transiently turned on, in carcinogenic progression an 'angiogenic switch' perpetuates the formation of new blood vessels [11].
VEGF (vascular endothelial growth factor), as one of the most potent endothelial cell mitogens, is a major driver of angiogenesis. It can act in an autocrine or paracrine manner through interaction with its receptors (VEGFR1-3, also named FLT1 (fms related tyrosine kinase 1), KDR (kinase insert domain receptor) and FLT4 (fms related tyrosine kinase 4), respectively) and co-receptors (neuropilins, NRPs) [12]. Such interactions may further be regulated through ephrinB2 (EFNB2) and finally stimulate proliferation, migration of endothelial cells, cell degradation, remodeling of the extracellular matrix (ECM) and angiogenesis [13]. Ephrins and their receptors (Eph receptors) are crucial in numerous biological processes [14]. Their bidirectional signaling triggers cascades in both the receptor- and the ligand-bearing cells, and deregulated Eph/ephrin activation as well as altered expression of the receptors or their ligands have been frequently reported in a variety of tumors, including colorectal neoplasms [14,15]. Based on these observations, targeted interference with Eph/ephrin signaling has been proposed for cancer therapy [16].
Matrix metalloproteinases (MMPs) are critical in ECM remodeling [13,17]. In addition to their capability to promote bioavailability of VEGF, the proteolytic activities of MMPs facilitate protein degradation and support vessel formation [18]. VEGF signaling also regulates DLL4 expression, which is one of two key ligands of NOTCH4 (notch receptor 4), the second ligand being JAG1 (jagged canonical Notch ligand 1) [19]. Concurrently, NOTCH4 promotes VEGFR expression [20].
An individual's inherited genetic background may modify angiogenesis-related signaling cascades that affect blood supply of premalignant lesions and/or macroscopic tumors [21]. We hypothesize that this may result in altered cancer risk or disease outcome after colorectal cancer diagnosis.
In the present study, we investigated the association between angiogenesis-related genetic variants and risk of CRC, overall survival (OS) and recurrence-free survival (RFS) of colorectal cancer patients in 1754 CRC cases and 1781 healthy controls using a comprehensive targeted genotyping approach and validated some of the associations.

Results
In this study 392 SNPs in 33 angiogenesis-related genes (Table S1) were investigated for their association with risk of colorectal cancer and survival of colorectal cancer patients in a population of 1754 colorectal cancer cases and 1781 healthy individuals (Tables S2, S3 and S4, respectively). Characteristics of the study population are presented in Table 1.

Risk
Higher education and the use of NSAIDs were associated with reduced risk of colorectal cancer. In addition, cases were more likely to smoke regularly, consume higher amounts of red meat and alcohol and were more often diabetic and obese (Table 1a).
No variant was associated with the risk of colorectal cancer after adjustment for multiple testing (Table S2). Furthermore, we did not observe statistically significant interactions of any of the investigated polymorphisms with NSAID use or smoking in relation to the risk of colorectal cancer.

Overall and Recurrence-Free Survival
During a median follow up of five years, 538 patients died and 468 had a recurrent event. Deceased patients were older, diagnosed with higher disease stages and tumor grade. They were less likely to consume alcohol and to be overweight or obese (Table 1b).
SNPs in MMP2, EFNB2 and JAG1 that were significantly associated with patients' survival were subsequently analyzed in a first validation set of 2165 (for MMP2 and JAG1) and 1638 (for EFNB2) DACHS colorectal cancer patients and in a second validation set of 372 colorectal cancer patients from TCGA, for whom genome-wide data were available.
In the first validation set of imputed genotypes from the DACHS study with a median follow up of five years, the association with rs9520090 in EFNB2 was validated in a dominant model (HR GG vs. GC/CC : 1.22, 95% CI: 1.00-1.47, p-value: 0.05). The variant rs6040062 in JAG1 showed a similar direction (HR: 1.10) in the validation set, however the association was not significant (p-value: 0.25). No other association was validated in this dataset (Table S5).
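As a plausibility check on the reported estimate, a Wald p-value can be approximated from a hazard ratio and its 95% confidence interval. The following is a minimal sketch, not the analysis code of this study: the function name `wald_p_from_hr_ci` is hypothetical, and because the published HR and CI are rounded, the result is only approximate.

```python
import math


def wald_p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate a two-sided Wald p-value from an HR and its 95% CI.

    Assumes the CI was built as exp(beta +/- z_crit * SE) on the log scale.
    """
    beta = math.log(hr)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value of a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Reported values for rs9520090 in the first validation set (rounded).
beta, se, z, p = wald_p_from_hr_ci(1.22, 1.00, 1.47)
print(f"log(HR)={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```

With the rounded inputs, the approximation lands close to the reported p-value of 0.05; small deviations are expected from rounding of the HR and CI bounds.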
The gene-environment interaction between overall survival and smoking, NSAID use, adjuvant 5-FU-based chemotherapy and microsatellite status was investigated for all variants (Table S7). Some indication for interaction was observed for two tagging SNPs in ETS1 (rs11221322 and rs7121854; p-interaction: 0.0004 and 0.001, respectively) and the candidate missense variant in TEK V486I (p-interaction: 0.003) with MSI (microsatellite instability), for the candidate missense variant KDR Q472H (rs1870377; p-interaction: 0.03) with smoking, for the candidate missense variant TEK V486I (rs1334811; p-interaction: 0.03) with NSAID use and for the IL10 candidate rs1800890 (p-interaction: 0.01) with adjuvant 5-FU-based chemotherapy for overall survival of colorectal cancer patients. These associations did not remain significant after adjustment for multiple testing.

Discussion
Genetic variation may contribute to an 'angiogenic switch', which is essential for blood supply and tumor growth in microscopic and macroscopic tumors. In the present study, we investigated the association of polymorphisms in angiogenesis-related genes with the risk of colorectal cancer as well as with overall survival and recurrence-free survival of colorectal cancer patients in the DACHS population-based case-control study using 1754 CRC cases and 1781 controls. We chose a targeted candidate gene approach to obtain a more complete coverage of the hypothesized genes and avoided potentially high numbers of false-negative findings, which are often the result of large-scale genotyping studies. We observed significant associations between variants in EFNB2, MMP2 and JAG1 and overall survival. Some of the associations could be validated.
Ephrins are membrane-associated ligands that function through interaction with their receptors, the erythropoietin-producing hepatocellular (Eph) receptor tyrosine kinases. The unique ligand-receptor interaction, which requires cell-cell contact, can trigger bidirectional signaling, with effects in both the ligand- and the receptor-bearing cell [22]. Downstream or upstream signaling cascades regulate cell proliferation, migration, adhesion, differentiation, survival and angiogenesis and are crucial for homeostasis in epithelial tissues. In this study, we observed significant associations between five variants in ephrin B2 (EFNB2) and OS of colorectal cancer patients. Rs9520087 is located in the 3′ UTR of EFNB2, which may suggest regulatory effects on gene expression of EFNB2; however, the in silico functional follow up did not reveal any associations with gene expression. It is not linked to any known variant and it is covered only by a small number of genome-wide genotyping arrays. Imputation of the variant in a validation set of 2165 CRC cases resulted in 10% of inaccurately imputed alleles, which may be the reason why this variant has not been identified earlier [23]. Similarly, other EFNB2 variants are only present on a limited number of genome-wide genotyping arrays. Rs3742159, linked to rs9520088 (r² = 1), displayed a RegulomeDB rank of 3a. It is located within a region that likely affects binding of transcription factors, one of which was identified to be NFKB [24]. None of the other linked SNPs seems to be functionally important. The SNPs rs2391333 and rs9520090 modify the methylation status of the site chr13:107163855 (methylation probe cg05493945) within EFNB2. Whether this results in changed gene expression of EFNB2 needs to be further investigated.
A number of studies have shown that EFNB2 and its receptor EPHB4 are transcriptionally upregulated in colon carcinoma cell lines and in colon cancer tissue as compared to adjacent mucosa, supporting a role of EFNB2 in carcinogenesis, potentially through modulation of angiogenic signals, even though the exact mechanisms remain to be fully elucidated [25]. No other gene that may be relevant for cancer progression is located in the vicinity of this region; thus, it seems plausible that the identified associations with survival of colorectal cancer patients are linked to EFNB2.
We observed a statistically significant association with OS of colorectal cancer patients for four MMP2 tagging SNPs (rs17301608, rs2241145, rs1561217 and rs243847), all located in introns or in non-coding exon regions, while a previous association of the MMP2 rs243865 (−1306 C>T; in our study, the linked SNP rs243866 was genotyped, r²: 0.99) in smaller studies of OS or CRC risk was not confirmed [26,27]. The major task of MMPs is the degradation of proteins, which is an essential step in many hallmarks of cancer in regulating cell growth, differentiation, apoptosis, migration, invasion, immune surveillance and cancer angiogenesis [18]. Consequently, MMPs play a crucial role in malignant transformation and cancer progression. To our knowledge, none of these polymorphisms nor any of the linked variants has previously been reported for an association with OS of colorectal cancer patients. In silico functional follow up of rs17301608 revealed that three of the linked variants (rs1477017, rs11646643 and rs9302671; r² > 0.8) ranked low (1f) at RegulomeDB, supporting a regulatory role. They likely affect the binding of transcription factors (TF), potentially of NR2F2 (nuclear receptor subfamily 2 group F member 2), which was previously shown to regulate cell growth, invasiveness, metastasis and angiogenesis [28,29]. Furthermore, all three variants linked to rs17301608 are reported expression quantitative trait loci (eQTLs) for LPCAT2 (lysophosphatidylcholine acyltransferase 2), which was previously associated with the progression of breast cancer, ovarian cancer and CRC in human tissue and in cell lines [30]. Finally, data from TCGA support the function of rs2241145 and rs17301608 as eQTLs for LPCAT2 expression in prostate adenocarcinoma [31].
Aberrant NOTCH signaling promotes the development and progression of colorectal cancer and is partly driven by the overexpression of JAG1, a ligand of NOTCH4 [32]. We identified one tagging SNP (rs6040062) in JAG1, which was associated with shorter OS of colorectal cancer patients. The variant and its linked SNPs rs3748480 (r²: 1.0) and rs112946915 (r²: 0.97) are intronic. The lowest functional RegulomeDB rank was reached by rs112946915 (3a), which modifies binding motifs for transcription factors STAT1 and IRF1 among others. JAG1 has been considered as an attractive target for colorectal cancer therapy, as it seems to be a more specific target than other members of the NOTCH pathway [33].
Our results are based on more than 1700 CRC cases and 1700 healthy controls, with comprehensive clinical, epidemiological and demographic data, which is a major strength of the study. The detailed information on lifestyle factors as well as the availability of patients' clinical characteristics enabled us to adequately account for relevant environmental factors and test for effect modification. Although our sample size was fairly large, some of the interaction analyses were based on small strata. Unfortunately, the MSI status was missing for 22% of the population and 9% of investigated tumors were MSI-high. Thus, these associations should be interpreted with caution. Furthermore, we had access to two validation sets and validated the association of the EFNB2 tagging SNP rs9520090 (or its linked variant rs7983579) with the overall survival of colorectal cancer patients. In addition, tagging SNPs in JAG1 and MMP2 showed the same direction of association with overall survival in the first and second validation sets, respectively.
The signals of the first validation set were weaker than we expected considering that the discovery and first validation sets were derived from the same study population. We thus compared both sample sets with respect to genetic, epidemiological and clinical data. For up to 67 individuals, both imputed genotypes from genome-wide data and directly determined genotypes were available. We observed a discrepancy of 5%-25% between the directly determined and imputed genotypes, suggesting an insufficient imputation accuracy for these regions and supporting the choice of a targeted candidate gene approach. The accuracy of imputed genotypes strongly relies on linkage disequilibrium (LD); consequently, imputed genotypes depend on the genotypes measured within the same LD block. EFNB2 and JAG1 are located in regions of rather low LD, which reduces the probability of correctly imputing missing genotypes [34]. We further observed differences between the two study sample sets with respect to overall survival, alcohol consumption and tumor grade of patients. We adjusted for these variables; however, other confounders not captured within this study may explain the discrepant association strengths between the two study samples. While 98% of participants reported being of German nationality, the ethnic background of study participants was not assessed as part of this study. We cannot rule out an effect of ethnicity on the identified associations, even though this is likely to be very small, because participants are expected to be of Caucasian origin.
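The comparison of directly determined and imputed genotypes described above can be illustrated with a short sketch. It assumes genotypes are stored per sample as two-letter strings (for example "GC") in dictionaries keyed by a sample identifier; the function name `genotype_discordance` and the data layout are hypothetical and not taken from the study.

```python
def genotype_discordance(direct, imputed):
    """Fraction of shared samples whose imputed genotype differs from the
    directly determined one. Allele order is ignored ("GC" equals "CG")."""
    shared = [
        sample
        for sample in direct
        if sample in imputed
        and direct[sample] is not None
        and imputed[sample] is not None
    ]
    if not shared:
        raise ValueError("no samples with both genotype sources")
    mismatches = sum(
        1 for sample in shared if sorted(direct[sample]) != sorted(imputed[sample])
    )
    return mismatches / len(shared)


# Toy data: sample "c" is imputed incorrectly, sample "e" has no direct genotype.
direct = {"a": "GG", "b": "GC", "c": "CC", "d": "GC"}
imputed = {"a": "GG", "b": "CG", "c": "GC", "d": "GC", "e": "GG"}
print(genotype_discordance(direct, imputed))
```

Only samples present in both sources enter the denominator, which mirrors a comparison restricted to individuals with both data types.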
In conclusion, we identified novel genetic associations with the overall survival of colorectal cancer patients. To our knowledge, no previous study has reported associations of SNPs in EFNB2 or JAG1 with the CRC outcome; both genes are known for their functional role in angiogenesis and their impact on disease progression. The present study cannot point to the causal variant or elucidate the functional role of the identified polymorphisms, and additional investigations in other populations are required to further validate these associations and to unravel the causal variants and their functionalities.

Study Population
The study population includes patients with colorectal cancer who participated in a long-term follow-up study of patients of the German population-based case-control study DACHS ("Darmkrebs: Chancen der Verhuetung durch Screening"; described in detail elsewhere) [35][36][37][38]. In brief, patients with a primary, confirmed diagnosis of colorectal cancer were recruited from hospitals of the Rhein-Neckar-Odenwald region in 2003-2007 if they were aged 30 years or older, resident in the study region and able to complete an in-person interview. Baseline interviews were performed using standardized questionnaires to collect information on demographic and established or suggested colorectal cancer risk factors as well as possible prognostic factors. Follow-up information on vital status was collected at three, five and ten years after diagnosis. Causes of death were verified by death certificates and coded based on ICD-10 classifications. Information on recurrences and secondary tumors was collected from general practitioners and specialists as applicable. In addition, clinical and histological data were extracted from patient and pathological records. Patients without baseline blood samples (post-diagnostic) or without mouthwash (<1%) were excluded from the study.
Population controls were randomly selected from lists of residents of the population registries of the cities and counties. Controls were matched to cases by sex, county of residence and 5-year age group. Controls were eligible if they were aged 30 years or older with no history of colorectal cancer.  (Table S1).
We used Haploview 4.2 (Broad Institute, Cambridge, MA, USA) to identify tagging SNPs located in the assigned genes as well as in the 5′ and 3′ flanking regions (up to 10 kb) for coverage of the genetic variation across the gene. The tagging SNP selection was based on the HapMap project (CEU [Utah Residents (CEPH) with Northern and Western European Ancestry] population, Phase II/Release 24), using a pairwise tagging approach applying r² ≥ 0.8 as the cutoff.
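The idea behind pairwise tagging with an r² cutoff can be sketched as follows. This is an illustrative greedy selection under simplifying assumptions (phased haplotypes coded as 0/1 alleles, precomputed pairwise r² values); it is not Haploview's implementation, and the names `pairwise_r2` and `greedy_tags` are hypothetical.

```python
def pairwise_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) tuples with alleles coded 0/1 at SNP A and SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    return d * d / denom


def greedy_tags(snps, r2, threshold=0.8):
    """Pick tags until every SNP is a tag or has r^2 >= threshold with a tag.

    r2: dict mapping frozenset({snp1, snp2}) to their pairwise r^2.
    """
    def captured_by(s):
        return {
            t for t in snps
            if t == s or r2.get(frozenset((s, t)), 0.0) >= threshold
        }

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(sorted(untagged), key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


r2 = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s1", "s3")): 0.50,
    frozenset(("s3", "s4")): 0.10,
}
print(greedy_tags(["s1", "s2", "s3", "s4"], r2))
```

In the toy example, a single tag captures three correlated SNPs, while the fourth SNP is in low LD with the others and must be genotyped itself.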
Genomic DNA was extracted from EDTA (ethylenediaminetetraacetic acid) anticoagulated blood or mouthwash samples using the FlexiGene DNA kit (Qiagen GmbH, Hilden, Germany) and quantified using Quant-iT PicoGreen dsDNA reagent and kit (Invitrogen/Life Technologies, Darmstadt, Germany). Genotyping was performed using the Illumina GoldenGate assay or iPLEX assay (Sequenom, Hamburg, Germany) for the MassArray system. For quality control, all assays were validated using the 72 CEPH controls, distributed randomly among the study samples. Each assay batch contained negative and positive controls and 5% of the total number of samples were re-genotyped to confirm reproducibility. Call rates for all genotypes exceeded 96%. The concordance for blinded duplicates was >99.7%. Deviation from the Hardy-Weinberg equilibrium was evaluated using a χ² statistic in all control subjects. A total of 392 single nucleotide polymorphisms in 33 genes were investigated in this study, which passed quality control, were not in linkage disequilibrium (LD) and had a cell count of at least 10.
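A χ² test for deviation from Hardy-Weinberg equilibrium can be sketched in a few lines. The example below assumes a standard 1-degree-of-freedom goodness-of-fit test on genotype counts of a biallelic SNP; the function name `hwe_chi_square` is hypothetical and the counts are made up for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts exactly in equilibrium (allele frequency 0.5) give chi2 = 0.
print(hwe_chi_square(100, 200, 100))
# A complete lack of heterozygotes deviates strongly from equilibrium.
print(hwe_chi_square(50, 0, 50))
```

Restricting such a test to control subjects, as done here, avoids flagging variants whose genotype distribution in cases is shifted by a true disease association.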

Imputation
Imputation of environmental factors was performed using multiple imputation as most variables, except for tumor grade (11%), had only few missing values (<1%). For grading, we compared no imputation and multiple imputation of grading with best-case (all missing values set to grade 1) and worst-case (all missing values set to grade 4) resulting in similar effect estimates (<10% difference). Genotype imputation in the DACHS study was performed by using IMPUTE2 and the 1000 Genomes reference panel [39,40].
For each polymorphism, the risk of colorectal cancer was estimated using log-additive and dominant models. Differences in baseline characteristics between deceased and non-deceased patients were tested using the χ² statistic and t-tests for categorical and continuous variables, respectively. As outcomes, overall survival and recurrence-free survival times were calculated as the time from diagnosis to the event of interest (death and recurrence or death, respectively) or censoring. Median follow-up time was calculated using the reverse Kaplan-Meier estimator and the cumulative survival curves for overall survival were generated using the Kaplan-Meier method [41].
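The reverse Kaplan-Meier estimator of median follow-up swaps the roles of event and censoring: censored patients are treated as having the "event" (end of observation), and deaths are treated as censored. A minimal sketch is given below; the function names and the toy data are hypothetical, and the median is taken as the first time at which the estimated curve drops to 0.5 or below.

```python
def km_survival(times, events):
    """Kaplan-Meier estimate; returns [(time, S(time))] at distinct event times."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # all subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the curve is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def reverse_km_median_followup(times, died):
    """Median follow-up: censoring is the event, death is treated as censored."""
    flipped = [0 if d else 1 for d in died]
    return median_time(km_survival(times, flipped))


# Toy data: follow-up in years and death indicator (1 = died, 0 = censored).
times = [1, 2, 3, 4, 5]
died = [1, 0, 1, 0, 0]
print(reverse_km_median_followup(times, died))
```

Compared with the simple median of observed times, this estimate is not pulled downward by patients who died early.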
Cox proportional hazard models were used to estimate hazard ratios (HR) for overall survival and recurrence-free survival, and their 95% CIs associated with genetic variants. Dominant and log-additive modes of inheritance were assumed for the genetic variants. The models also included adjustment for relevant environmental and clinical factors obtained from a backward elimination procedure. The analyses were adjusted for age (<60, 60-70, 70-80 and 80+ years), sex, stage (I-IV), tumor grade (1 and 2 versus 3 and 4), current BMI (<18.5 kg/m², 18.5 to <25 kg/m² (reference category), 25 to <30 kg/m² and ≥30 kg/m²), and alcohol intake (no alcohol intake and quartiles: 0.1-6.1, 6.1-15.6, 15.6-32.6 and ≥32.6 g alcohol/day in the last year).
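The two modes of inheritance differ only in how a genotype is turned into a model covariate. The sketch below illustrates this coding, assuming genotypes are given as two-letter strings and that the allele whose effect is modeled is known; the function name `encode_genotype` and the example alleles are hypothetical.

```python
def encode_genotype(genotype, effect_allele, model):
    """Code a biallelic genotype as a regression covariate.

    log-additive: number of effect alleles (0, 1 or 2), so each additional
                  copy multiplies the hazard (or odds) by the same factor.
    dominant:     1 if at least one effect allele is carried, otherwise 0.
    """
    if len(genotype) != 2:
        raise ValueError("genotype must consist of exactly two alleles")
    copies = genotype.count(effect_allele)
    if model == "log-additive":
        return copies
    if model == "dominant":
        return int(copies > 0)
    raise ValueError(f"unknown model: {model}")


for g in ("GG", "GC", "CC"):
    print(g, encode_genotype(g, "C", "log-additive"), encode_genotype(g, "C", "dominant"))
```

The resulting covariate would then enter the Cox (or logistic) model alongside the adjustment variables listed above.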
The effect modification was tested by using multiplicative interaction terms and stratified analyses for adjuvant 5-FU-based chemotherapy (yes/no) and microsatellite status (high/low; only for overall survival), NSAID use (never, ever used NSAIDs more than once/month ≥1 year), and smoking (never/yes).
All statistical analyses were two-sided with a significance level of 0.05 and were performed using SAS (version 9.3, SAS Institute, Cary, NC, USA) and R (version 3.0.2, R Foundation for Statistical Computing, Vienna, Austria).
The false discovery rate (FDR, indicated as the q-value) correction method was used to adjust for multiple testing of non-candidate polymorphisms [42].
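How FDR-adjusted q-values are obtained from a list of p-values can be sketched as follows. The example assumes the Benjamini-Hochberg step-up procedure, which is a common choice but is an assumption here, since the specific method is only given by reference [42]; the function name `bh_qvalues` and the p-values are illustrative.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


pvals = [0.01, 0.04, 0.03, 0.005]
print(bh_qvalues(pvals))
```

A variant would then be reported as significant if its q-value falls below the chosen FDR threshold.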

Validation Sets
Validation of significant associations was performed using two datasets. The first set included DNA samples from an independent sample of 2165 DACHS colorectal cancer patients recruited in 2007-2010, which were genotyped using Illumina's Global Screening Array, OmniExpress BeadChip and OncoArray as described previously [48]. Genotypes were imputed separately for each gene (±1.5 Mb) and for each array using the Michigan Imputation Server [49]. The Haplotype Reference Consortium (HRC) reference panel (HRC r1.1 2016) was used for imputation and Eagle for phasing. Subsequently, the imputed genotypes of each array were combined. The second set included genotype data from The Cancer Genome Atlas (TCGA) Colon Adenocarcinoma (TCGA-COAD) and Rectal Adenocarcinoma (TCGA-READ) cohorts, which were downloaded from the NCI Genomic Data Commons.
Statistical analysis of the respective findings was performed for both validation datasets as described above. TCGA analyses were adjusted for the available variables age, sex and stage.
The data that support the findings of this study are available from the corresponding author upon reasonable request.