The Prognostic Value of Haplotypes in the Vascular Endothelial Growth Factor A Gene in Colorectal Cancer

New prognostic markers in patients with colorectal cancer (CRC) are a prerequisite for individualized treatment. Prognostic importance of single nucleotide polymorphisms (SNPs) in the vascular endothelial growth factor A (VEGF-A) gene has been proposed. The objective of the present study was to investigate the prognostic importance of haplotypes in the VEGF-A gene in patients with CRC. The study included 486 patients surgically resected for stage II and III CRC, divided into two independent cohorts. Three SNPs in the VEGF-A gene were analyzed by polymerase chain reaction. Haplotypes were estimated using the PHASE program. The prognostic influence was evaluated using Kaplan-Meier plots and log-rank tests. Cox regression was used to analyze the independent prognostic importance of different markers. All three SNPs were significantly related to survival. A haplotype combination, responsible for this effect, was present in approximately 30% of the patients and demonstrated a significant relationship with poor survival, and it remained an independent prognostic marker after multivariate analysis, hazard ratio 2.46 (95% confidence interval 1.49–4.06), p < 0.001. Validation was provided by consistent findings in a second and independent cohort. Haplotype combinations call for further investigation.


Introduction
Reliable prognostic markers are of great clinical importance in colorectal cancer (CRC). Several prognostic markers are already being used by clinicians to select patients for further postoperative treatment, and adjuvant chemotherapy is standard of care for patients with high-risk stage II and stage III CRC [1][2][3][4][5]. Nevertheless, tumor recurrence still occurs in patients with low-risk stage II tumors, and far from all patients receiving adjuvant chemotherapy benefit from their treatment. The identification of new prognostic markers in CRC is therefore a prerequisite for selection for adjuvant treatment in this patient category.
Angiogenesis, the development of new blood vessels from the pre-existing vasculature, is a normal physiologic phenomenon but it also represents one of the classical hallmarks in the malignant transformation of tumors [6]. The vascular endothelial growth factor (VEGF) system is one of the most essential growth factor systems involved in neoangiogenesis in malignant tumors [7]. It influences the vasculature and affects the growth of tumors [8], and it provides a potential route for the dissemination of tumor cells and thereby increases the risk of metastatic spread [9]. The biologic availability of VEGF-A, the most important ligand in the VEGF system, therefore has the potential to influence the prognosis of patients with malignant diseases.
The level of VEGF-A can, among other factors, be affected by normal sequence variations in the VEGF-A gene. Several such normal variations, single nucleotide polymorphisms (SNPs), in the VEGF-A gene have been described. The −2578 C/A SNP (rs699947) and the −460 C/T SNP (rs833061) in the promoter region and the 405 G/C SNP (rs2010963) in the 5' untranslated region (5'UTR) are all very common and they are often analyzed in clinical studies. Associations between SNPs in the VEGF-A gene and the promoter activity of the gene and protein concentrations of VEGF-A have been demonstrated in some studies [10,11], and clinical studies have also suggested an association with clinical outcome in breast [12], renal cell [13], gastric [14], colorectal [15][16][17], and ovarian cancer [18]. A high degree of linkage disequilibrium exists between several of the SNPs in the promoter and 5'UTR of the VEGF-A gene. It is therefore possible that a haplotype effect, rather than an individual effect of SNPs, might explain some of the prognostic information related to the genetic variations in this region of the gene. In agreement with this, haplotype analyses have supplied further information on these genetic variations [12,14,15]. Focus, however, has so far been on individual haplotypes rather than haplotype combinations. The latter may be relevant to providing specific information on the prognosis for each individual patient [19].
The literature on the prognostic importance of SNPs in the VEGF-A gene in patients with CRC is still rather sparse, with conflicting results [15,17], and validation studies are warranted. Furthermore, only very few studies have reported on the effect of haplotypes. The aim of the present study was to perform a comprehensive analysis of the possible prognostic importance of haplotypes in the VEGF-A gene in two independent cohorts of patients with stage II and III CRC.

Patient Characteristics
Patient characteristics are presented in Table 1. The two cohorts were comparable. No significant differences were found comparing allele frequencies between the two cohorts, and all three SNPs were found to be in Hardy-Weinberg equilibrium (p > 0.05). The median disease free survival (DFS) was 6.4 years (95% CI, 4.6-8.8) and the median overall survival (OS) was 7.2 years (95% CI, 5.3-9.3) in the test cohort. The median DFS was 5.8 years (95% CI, 5.1-5.8) in the validation cohort. The median OS was not reached in the validation cohort but exceeds 7.0 years. One hundred and two patients from the test cohort and 67 patients from the validation cohort died. Follow-up ended December 15, 2009 and data are reported with a median observation time of 9.8 years (range, 3.1-10.9) in the test cohort and 3.4 years (range, 1.0-7.8) in the validation cohort.
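The Hardy-Weinberg check mentioned above can be illustrated with a minimal sketch of a chi-square goodness-of-fit test with one degree of freedom. The function name and the genotype counts are hypothetical and are not taken from the study; the sketch assumes that both alleles are present in the sample, so that no expected count is zero.

```python
from math import erfc, sqrt

def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against
    Hardy-Weinberg expectations. Hypothetical helper, not the study's code."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for one SNP (homozygous, heterozygous, homozygous)
chi2, p_value = hardy_weinberg_chi2(50, 100, 50)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

A p value above 0.05, as reported for all three SNPs, means that the observed genotype counts do not deviate significantly from the expected proportions.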
No significant associations were found between the genotypes of the three SNPs and patient characteristics, as listed in Table 1, in either of the cohorts (data not shown). Tumors with the T categories 1, 2 and 3 were grouped together due to the presence of very few T1 and T2 tumors. The standard prognostic markers, T and N category, stage, vascular invasion and peritoneal perforation were all significantly associated with survival in both cohorts (data not shown). Table 2 shows the relationship between SNP status and survival. The striking finding here is that the heterozygous VEGF-A genotypes (−2578 CA, −460 CT, and 405 GC) were all related to inferior survival rates compared to the corresponding homozygous genotypes in the test cohort. A comparison of the heterozygous genotypes to both homozygous genotypes was consequently performed although such a comparison differs from the conventional way of dividing genotypes. This strategy was kept in the following analyses in the validation cohort. The −460 C/T SNP was significantly related to DFS and OS in both cohorts. The poor survival related to the heterozygous genotypes in the univariate analyses was also seen in the multivariate analyses. This is presented in Table 3, which shows the results of the multivariate survival analysis from the validation cohort. The possible independent prognostic value of each SNP was assessed here individually. The table therefore summarizes the results from three separate Cox regression models all including T category, N category, vascular invasion, neuronal invasion, peritoneal perforation and adjuvant treatment. The −2578 C/A and the 405 G/C SNPs both demonstrated prognostic value independent of the standard prognostic markers regarding DFS. A possible independent value, although not significant, was seen for the −460 C/T SNP.
A similar multivariate survival analysis adjusted for the same variables was initially performed on the test cohort, with similar findings although the 405 G/C SNP failed to demonstrate an independent prognostic value (data not shown).

Haplotype Analysis
The haplotype analysis was performed to evaluate a possible combined effect of the SNPs (−2578 C/A, −460 C/T, 405 G/C) on CRC survival. Three haplotypes with a frequency above 0.5% were defined by the PHASE program based on population frequencies of the three SNPs: ACG, 52.3%; CTC, 31.3%; CTG, 16.4% in the test cohort and ACG, 47.8%; CTC, 34.7%; CTG, 17.0% in the validation cohort. Because each patient carries two haplotypes, one on each chromosome, each patient could be identified by one of six possible haplotype combinations.
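The count of six combinations follows from pairing three haplotypes without regard to order. The sketch below enumerates the pairs and, purely as an illustration, computes the combination frequencies that would be expected if the two haplotypes of a patient paired at random. Random pairing is an assumption made here, not something the study states, and the variable and function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Haplotype frequencies reported for the test cohort
haplotype_freq = {"ACG": 0.523, "CTC": 0.313, "CTG": 0.164}

# Unordered pairs, allowing a haplotype to pair with itself
combos = list(combinations_with_replacement(haplotype_freq, 2))

def expected_frequency(h1, h2):
    """Expected share of a haplotype combination under random pairing
    (an illustrative assumption, not a result from the study)."""
    f1, f2 = haplotype_freq[h1], haplotype_freq[h2]
    return f1 * f2 if h1 == h2 else 2 * f1 * f2

expected = {pair: expected_frequency(*pair) for pair in combos}

for pair, freq in expected.items():
    print(pair, f"{freq:.1%}")
```

Under this assumption the pair consisting of ACG and CTC would be expected in roughly a third of patients, which is of the same order as the share observed for the (CTC, ACG) combination in the test cohort.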
The initial analyses in the test cohort revealed a combination (CTC, ACG) present in 29% of the patients, related to inferior survival rates and differing significantly from four of the remaining five combinations. The survival rates for these four combinations, constituting 55% of the patients, were very similar and did not differ significantly from each other. The patients with the last combination (CTG, ACG) presented with an intermediate prognosis not differing significantly from any of the other combinations. This is illustrated graphically in Figure 1 showing the OS curves from the test cohort according to the haplotype combinations. Similar DFS curves were seen (data not shown) and it therefore seemed reasonable to pool the patients with haplotype combinations different from the (CTC, ACG) and (CTG, ACG) combinations in one group for the following analyses in the validation cohort.
Figure 1. Overall survival in the test cohort according to haplotype combination (x-axis: time in years; y-axis: cumulative overall survival; p = 0.02).
Figure 2 shows the significant differences in DFS and OS seen in both cohorts based on the above considerations. The results from the validation cohort confirmed the inferior survival rates seen in the test cohort for the patients with the (CTC, ACG) combination and furthermore the favorable survival rates seen for the combined group. The median DFS for the patients with the (CTC, ACG) combination was 2.6 years (95% CI, 2.0-4.4) in the test cohort compared to 3.2 years (95% CI, 2.2-5.8) in the validation cohort. The patients from the combined group presented with a median DFS of 9.5 years (95% CI, 6.5-9.5) in the test cohort. The median DFS for this group was not reached in the validation cohort but exceeds 5.5 years. The results from the validation cohort could not confirm the intermediate prognosis for the patients with the (CTG, ACG) combination, who presented with survival rates similar to those seen for the combined group. Consequently, only the (CTC, ACG) combination seems to be related to poor survival. Table 4 shows the results of the multivariate survival analysis according to the haplotype combinations, and involves the same patient characteristics as in Table 3. The (CTC, ACG) haplotype combination remained an independent prognostic marker in both cohorts with hazard ratios ranging from 2.04 to 2.46.
We also performed subgroup analyses running separate survival analyses for patients with colon and rectal cancers and for patients with stage II and III disease. These results did not indicate that the present findings were more pronounced in either of the groups (data not shown).
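As a rough consistency check on the multivariate result quoted in the abstract (hazard ratio 2.46, 95% confidence interval 1.49–4.06, p < 0.001), the standard error and p value can be recovered from the interval. The sketch assumes a Wald-type interval that is symmetric on the log scale; the paper does not state how the interval was computed, and the function name is hypothetical.

```python
from math import erfc, log, sqrt

def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), the Wald z statistic and a two-sided p value
    from a hazard ratio and its 95% CI (assumes a Wald interval)."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))           # two-sided normal tail
    return se, z, p

se, z, p = wald_from_ci(2.46, 1.49, 4.06)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.5f}")

# On the log scale the point estimate should sit midway between the limits
print(f"geometric mean of limits = {sqrt(1.49 * 4.06):.2f}")
```

Under this assumption the recovered p value is below 0.001 and the geometric mean of the limits matches the reported hazard ratio, so the three reported numbers are mutually consistent.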

Discussion
New prognostic markers are a prerequisite for selecting patients with CRC for adjuvant treatment. A number of recent publications have suggested that SNPs in the VEGF-A gene are biomarkers of prognostic importance in different types of malignant tumors including CRC [15][16][17]. However, the results of these studies are contradictory, probably because of varying patient materials, and the role of haplotypes needs to be elucidated. Therefore, the aim of this study was to analyze the prognostic influence of haplotypes in the VEGF-A gene in patients with stage II or III CRC based on a test and a validation cohort.
The three SNPs in the present study were chosen, as they are very common and because previous studies have reported on functional as well as prognostic influences [10][11][12][13][14][15][16][17][18]. Furthermore, they are all located in rather close proximity in a region of the gene in which a high degree of linkage disequilibrium exists and haplotype effects might therefore be a possibility. The genotype frequencies of these SNPs in our study are in rather good agreement with those reported in the literature on CRC patients [15,20,21,22].
The −2578 C/A, the −460 C/T and the 405 G/C SNPs all showed a significant relationship with survival, and in all three cases the heterozygous genotypes were related to poor survival. We acknowledge that comparing heterozygous genotypes with both homozygous genotypes is not the conventional way of grouping genotypes, but the present results called for this unconventional approach. Multivariate analysis of DFS (performed with each SNP individually) confirmed a significant prognostic value related to the −2578 C/A and 405 G/C SNPs independent of standard prognostic markers. Kim et al. examined the prognostic influence of VEGF-A SNPs in 445 Korean patients operated for CRC stage I to IV [15]. The results suggested that compared to the other genotypes the VEGF-A 405 GG genotype was associated with inferior survival rates. The −2578 C/A SNP did not show any relationship with survival. A second study, by Dassoulas et al., reported that the VEGF-A −2578 AA and 405 CC genotypes were related to significantly lower OS in 312 Greek patients operated for CRC stage I to IV. The influence of the −460 C/T SNP did not reach statistical significance [17]. The results from our study, the Greek study, and the Korean study clearly demonstrate the diversity between studies dealing with the prognostic importance of SNPs. First of all, any difference in the allelic frequencies and the phenotypic outcome, survival, would make it difficult to compare the prognostic value of genetic markers between ethnically different groups. Furthermore, differences in the disease stages included in the studies might also explain some of the discrepancies. Stage is a very strong prognostic parameter, and the fraction of patients with stage I and IV disease differed a great deal between the Greek and Korean studies. We only included patients with stage II and III disease because this is the group of patients in which new prognostic markers are most warranted.
As expected, we observed rather similar results for all three SNPs, presumably due to the high degree of linkage disequilibrium known to exist in this region of the gene. The link between the −2578 C/A and the −460 C/T SNPs actually turned out to be stronger than initially expected. The distribution of genotypes was identical in the test cohort and only differed in a small number of patients in the validation cohort. Therefore, it cannot be ruled out that an underlying haplotype effect can explain some of the apparent prognostic power associated with the genetic variations in this region of the gene.
The haplotype frequencies from the present study were in rather good agreement with the haplotype frequencies reported in the literature [15,17,22,23]. Any differences could very well be explained by ethnic differences, but the source of DNA could also account for some of the differences. The Korean studies are based on tumor DNA, where the presence of loss of heterozygosity (LOH) would lead to changes in the observed frequencies of the haplotypes.
The present haplotype analysis identified three frequent haplotypes and all patients could consequently be identified by one of six haplotype combinations. The haplotype analysis demonstrated a significant relationship between poor survival rates and patients with the (CTC, ACG) combination compared to survival rates for the remaining patients. A group of patients (CTG, ACG) with a presumably intermediate prognosis was identified in the test cohort, but this was not confirmed by the following analysis in the validation cohort, suggesting that patients with haplotype combinations different from (CTC, ACG) have a somewhat similar prognosis. The (CTC, ACG) combination also remained an independent prognostic marker after the multivariate survival analysis in both cohorts.
The two haplotypes, CTC and ACG, might be linked to two independent genetic variations, both being of prognostic importance for different reasons and both acting in a dominant fashion. This would explain the inferior survival rates demonstrated for patients harboring both haplotypes compared to patients with only one of the haplotypes on one or both alleles. It should be pointed out that the associations to heterozygosity for −2578 C/A, −460 C/T and 405 G/C demonstrated in Tables 2 and 3 are easily explained by the fact that the haplotype combination (CTC, ACG) will result in heterozygosity for these three SNPs. These genetic variations may lead to a differential influence on the tumor vasculature and thereby the risk of dissemination of tumor cells in the individual patients ultimately resulting in a difference in prognosis.
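The link between the (CTC, ACG) combination and heterozygosity at all three SNPs can be verified by reading the genotypes off each pair of haplotypes. The sketch below is illustrative only; the function and variable names are hypothetical, and the allele order within each haplotype follows the order −2578, −460, 405 used in the text.

```python
from itertools import combinations_with_replacement

SNPS = ("-2578 C/A", "-460 C/T", "405 G/C")
HAPLOTYPES = ("ACG", "CTC", "CTG")

def genotypes(hap1, hap2):
    """Genotype at each SNP implied by a pair of haplotypes."""
    return {snp: "".join(sorted((a, b)))
            for snp, a, b in zip(SNPS, hap1, hap2)}

def heterozygous_sites(hap1, hap2):
    """SNPs at which the two haplotypes carry different alleles."""
    return [snp for snp, a, b in zip(SNPS, hap1, hap2) if a != b]

# Haplotype pairs that are heterozygous at all three SNPs
all_het = [pair
           for pair in combinations_with_replacement(HAPLOTYPES, 2)
           if len(heterozygous_sites(*pair)) == len(SNPS)]

print(all_het)
print(genotypes("CTC", "ACG"))
```

Among the six combinations of the three frequent haplotypes, only the pair of ACG and CTC differs at every position, which is why the heterozygous genotypes track this combination.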
The Korean study by Kim et al. [15] found the −2578 A, 405 G, 936 T haplotype to be associated with inferior survival rates. The Greek study by Dassoulas et al. [17] assessed the clinical importance of haplotypes based on the −2578 C/A, 405 G/C and 936 C/T SNPs but found no associations with survival or clinicopathological characteristics. Given the strong linkage between the −2578 C/A and the −460 C/T SNPs, the haplotype associated with inferior survival rates in the Korean study probably represents the ACG haplotype from the present study. So despite rather large differences between our results on genotype level, some agreement seems to exist on the haplotype level.
Using genomic DNA derived from blood holds several technical as well as biological advantages compared to tumor DNA. Genomic DNA is easy to obtain through a blood test compared to more invasive procedures such as biopsies. It is constant over time and not influenced by tumor biology or treatment. The quality of blood-derived DNA is often higher than that of tissue-derived DNA, thereby improving the quality of the analyses and reducing the time spent on optimizing procedures.
The retrospective design of the present study has its limitations, but our conclusions are strengthened by the validation performed on an independent cohort. The study presented here with focus on CRC stages II and III is the largest one in the field so far but the sample size is still too small to draw any definite conclusions and prospective validation is still warranted to provide further evidence.

Study Population
This retrospective study on 486 patients, all Caucasians, consisted of a test cohort of 191 patients operated between January 1999 and December 2000 and a validation cohort of 295 patients operated between January 2002 and December 2008. Besides being operated at different time periods, the two cohorts also differed with regard to the patient materials available for analyses, formalin-fixed paraffin-embedded (FFPE) tissue in the test cohort and blood samples in the validation cohort. All patients underwent surgical resection of histologically verified adenocarcinomas of the colon or rectum at the Department of Surgery, Vejle Hospital, Denmark. The study only included patients with stage II and III disease. Patients having received preoperative chemoradiation of rectal cancer were not included. Furthermore, patients who died of post-operative complications or within one month from the operation were excluded (16 patients from the test cohort and 14 patients from the validation cohort). All patients from the two cohorts meeting these inclusion criteria were included in the study and hence no specific power calculations were applied to determine the sample size. Pre-treatment examinations included a chest X-ray and ultrasound or CT scan of the abdomen. Postoperatively the tumors were histologically classified and staged according to the pTNM system. Information regarding patient characteristics, relapse status and survival was based on patient records and registries. The study was approved by the Regional Scientific Ethical Committee for Southern Denmark according to Danish law, and informed consent was obtained from all patients enrolled in the study.

Analysis of Single Nucleotide Polymorphism
Genomic DNA from the test cohort was derived from FFPE tissue. Fifty-μm sections of FFPE normal colorectal tissue were initially treated with xylene for deparaffinization and then washed two times in ethanol (99%). Samples were then incubated for two days in a lysis buffer and proteinase K at 56 °C. Genomic DNA from the validation cohort was derived from whole blood, which was obtained at the operation and stored at −20 °C. The DNA from both cohorts was isolated using the NucleoSpin® Tissue method according to the user manual (Macherey-Nagel, Düren, Germany) (http://www.mn-net.com/Portals/8/attachments/Redakteure_Bio/Protocols/Genomic%20DNA/UM_g DNATissue.pdf) and by the Maxwell® method (after 2006) according to the user manual (Promega Corporation, WI, USA) (http://www.promega.com/tbs/tm284/tm284.pdf).
The PCR analysis was performed using the ABI PRISM 7900 HT fast real-time PCR system (Applied Biosystems, Foster City, CA, USA). Commercial assays (functionally tested or validated by Applied Biosystems) were used for the analysis of the VEGF-A SNPs. Assay numbers and the approximate length of the amplification products (estimated by gel electrophoresis) were as follows: The VEGF-A −2578 C/A SNP; rs699947; C___8311602_10; 110/90 base pairs (depending on genotypes), the VEGF-A −460 C/T SNP; rs833061; C___1647381_10; 120 base pairs and the VEGF-A 405 G/C SNP; rs2010963; C___8311614_10; 100 base pairs.
To each well of a 96-well microtiter plate, 10 μL was added, consisting of 2 μL DNA and 8 μL of a mastermix containing the two primers and probes from the assay and universal PCR mix (also Applied Biosystems). Controls were analyzed along with the samples for final identification of the genotypes.
The PCR analysis was conducted according to standard procedures with cycling conditions initially at 50 °C for 2 min and 95 °C for 10 min. This was followed by 40 cycles at 95 °C for 15 s (for denaturation), and 60 °C for 1 min (for annealing and elongation). Following end-point reading, genotypes were visualized on an allelic discrimination plot. Samples whose results did not meet the quality value threshold (95% for DNA derived from the blood samples and 98% for DNA derived from FFPE tissue) were diluted and reanalyzed. This was only necessary for a few samples. The SNP analysis was performed without knowledge of the clinical data.

Statistical Analysis
Fisher's exact test was used for comparison between genotypes and patient characteristics. Chi-square statistics were used to test for Hardy-Weinberg equilibrium. The haplotypes and their frequencies were estimated using the PHASE program, version 2.1, which implements a Bayesian statistical method for reconstructing haplotypes from genotype data [24,25]. No patients had missing SNP data. Disease free survival was defined as the time from surgery until the first documented tumor recurrence or death. Overall survival was defined as the time from surgery until death. Survival curves were illustrated according to the Kaplan-Meier method and the log-rank test was used to test for differences between the groups. Survival data from patients diagnosed with a new malignancy after their surgical resection for CRC (12 patients from the test cohort and nine from the validation cohort) were censored from the date of their new cancer diagnosis. This was done to prevent any possible bias related to the presence of a new cancer or the chemotherapeutic treatments used. Furthermore, survival data from seven patients were incomplete (five patients from the test cohort and two patients from the validation cohort) and were censored from the date of their last patient record. Cox regression was used to analyze the independent prognostic importance of different markers. All statistical calculations were carried out using the NCSS statistical software (NCSS Statistical Software, Kaysville, UT 84037, USA, version 2007). P values < 0.05 were considered significant, and all tests were two sided.
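The survival definitions and censoring rules described above can be sketched in a few lines of code. The example below is not the NCSS procedure used in the study; it is a minimal Kaplan-Meier estimator with hypothetical function names and hypothetical follow-up data, and it assumes that an event recorded on the same date as a censoring date counts as an event.

```python
def dfs_observation(recurrence, death, new_cancer, last_followup):
    """Return (time, event) for disease free survival.
    Arguments are years from surgery; None means the event did not occur.
    last_followup must always be given."""
    event_times = [t for t in (recurrence, death) if t is not None]
    censor_time = min(t for t in (new_cancer, last_followup) if t is not None)
    if event_times and min(event_times) <= censor_time:
        return min(event_times), 1       # first recurrence or death
    return censor_time, 0                # censored observation

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.
    events: 1 = event observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve

def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None                          # median not reached

# Hypothetical follow-up data (years) for six patients
times = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
print(curve, median_survival(curve))
```

A return value of None from median_survival corresponds to the situation reported for the validation cohort, where the median was not reached within the observation time.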

Conclusions
In conclusion, this validating study of nearly 500 patients from two independent cohorts with stage II and III CRC identified a genetic signature related to prognosis, based on genetic variations in the promoter and 5'UTR of the VEGF-A gene. Analyzing haplotype combinations, we were able to identify a group with a rather favorable prognosis and a group in which adjuvant chemotherapy seems indicated. The possible benefit from such a treatment, however, cannot be assessed from the present results. This unfavorable haplotype combination remained an independent prognostic marker. Future studies should focus more on haplotype analyses because of higher degrees of consistency between studies. The haplotype combination approach calls for further investigation.