The Role of Genotypes That Modify the Toxicity of Chemical Mutagens in the Risk for Myeloproliferative Neoplasms

Background: The etiology of myeloproliferative neoplasms (MPNs) (polycythemia vera; essential thrombocythemia; primary myelofibrosis) is unknown; however, they are associated with a somatic mutation, JAK2 V617F, suggesting a potential role for environmental mutagens. Methods: We conducted a population-based case-control study in three rural Pennsylvania counties of persons born 1921–1968 and residing in the area between 2000 and 2008. Twenty-seven MPN cases, identified through the state cancer registry and a prior cluster investigation, and 292 controls, recruited through random digit dialing, were included. Subjects were genotyped and odds ratios estimated for a select set of polymorphisms in environmentally sensitive genes that might implicate specific environmental mutagens if found to be associated with the disease. Results: The presence of the NAT2 slow acetylator genotype and of CYP1A2, GSTA1, and GSTM3 variants was associated with an average 3- to 5-fold increased risk. Conclusions: Exposures, such as to aromatic compounds, whose toxicity is modified by genotypes associated with outcome in our analysis may play a role in the environmental etiology of MPNs.


Introduction
Myeloproliferative neoplasms (MPNs), including polycythemia vera (PV), essential thrombocythemia (ET), and primary myelofibrosis (PMF), are rare cancers characterized by an overproduction of red blood cells and platelets [1]. Mandatory reporting of MPNs in the US began in 2001. The National Cancer Institute (NCI) estimated the incidence of PV, ET and PMF at 2.8, 1.5, and 0.4 cases per 100,000 person-years, respectively, for 2001-2004 [2]. In 2008 the World Health Organization (WHO) revised the diagnostic criteria to include a diagnostic algorithm that combines genetic information (JAK2 V617F) with clinical information [3]. There are no known causes of MPNs, although there have been a number of speculations about an environmental etiology [4].
The presence of the acquired JAK2 V617F somatic point mutation (a single-base substitution resulting in a change from valine to phenylalanine at position 617 of the protein, whose gene lies on chromosome 9p [5]) is suspected to be pathognomonic of PV. It is present in nearly all PV patients (>95%), about half of those with ET or PMF, and less frequently in other hematologic diseases, but not in any other cancers [6][7][8]. JAK2 is a protein that acts as an on-and-off switch regulating bone marrow activity [9]. In the presence of the mutation, a tyrosine kinase complex in bone marrow, normally responsible for regulating blood cell production, is activated to increase blood cell production [6]. The causes of the JAK2 V617F mutation are unknown [7].
There is strong evidence supporting familial risk for MPNs [4,10]. Inherited genetic factors, including potential germ-line mutations yet to be identified, are suspected to influence MPN phenotype and susceptibility [11]. Although functional genes that may modify the biological dose of a chemical mutagen have been studied for a wide range of cancers, no studies have investigated associations of these genotypes with MPNs [12]. Because of the role of the somatic JAK2 V617F mutation in the etiology of MPNs, genes that modify susceptibility to mutagenic chemicals are of particular interest.
If we assume that there are genes that are not sufficient independent causes of MPNs but act exclusively to increase susceptibility to prevalent environmental mutagens, we can evaluate the main effect associations of these genes with MPNs to efficiently explore the potential role of the exposures whose effect they modify (i.e., under these assumptions genotype associations imply the existence of gene-environment interactions) [13][14][15]. This approach is related to Mendelian randomization, which allows one to infer causation without directly measuring the exposure of interest by making the assumption, defensible in the given case, that genes are allocated independently of exposure and confounders [13]. The main objective of such analyses is not so much to quantify the extent of the gene-environment interaction as to use evidence of such interactions (assessed via genotype) to implicate specific exposures in the etiology of a disease.
The aim of the current analysis is thus to use this approach to identify potential classes of environmental exposures implicated in MPN etiology.

Methods
The study protocol was approved by the Drexel University and Geisinger Health Systems Institutional Review Boards. Participants received a $25 gift card for completing the phone survey phase and an additional $25 gift card if they gave a blood sample which was used in genotyping.

Selection of Cases and Controls
A case-control study was conducted in a tri-county area in Northeast Pennsylvania (Carbon, Luzerne, and Schuylkill counties) [16]. Cases and controls were eligible if they were born between 1921 and 1968 (i.e., 42-89 years old at the time of the study), resided in the tri-county area between 2000 and 2008, and completed a telephone survey. Cases were identified from the Pennsylvania Cancer Registry (PCR) using the International Classification of Diseases for Oncology (ICD-O) codes for MPNs (PV, M-9962/3; ET, M-9961/3; PMF, M-9931/3) as well as from a PV cluster investigation in 2009 that included cases diagnosed between 2001 and 2006 in the three-county area [17,18]. We included cases diagnosed up to 31 December 2010. We attempted to contact eligible cases who had relocated from the tri-county area. Controls were selected using a random digit dialing sample, stratified by age, sex and county to obtain distributions that reflected those of the source counties. Controls had to be free of MPNs during the time period in which cases were ascertained. Among persons who consented to an interview, 57% of the cases and 62% of controls also provided DNA for the current analysis. Medical records of all cases not previously confirmed by the Agency for Toxic Substances and Disease Registry (ATSDR) cluster investigation [18] were evaluated by one of two expert panels (one created by the Pennsylvania Department of Health and one used in case ascertainment for a related study done by the University of Pittsburgh) [19]. Both panels used a similar methodology and a two-thirds majority decision rule for case determination. Of the 31 cases genotyped and identified through the PCR and the aforementioned ATSDR cluster investigation, 27 were confirmed as consistent with clinical records. These 27 cases of MPNs and 292 controls were eligible for the current analysis.

Selection of Environmentally Sensitive Genes
The National Institute of Environmental Health Sciences' Environmental Genome Project (NIEHS EGP) has enumerated 647 known environmentally sensitive genes (ESG). We considered only genes identified by the NIEHS EGP that contained non-synonymous single nucleotide polymorphisms (nsSNPs) with a >5% minor allele frequency (MAF). Thus, 114 metabolism genes were given initial consideration, predominantly from the Phase I and Phase II detoxifying enzyme classes. The functional gene groups of interest regulate enzymes of two categories: metabolism or detoxification. Although other functional categories may be associated with the effect of chemicals of interest, they are outside the scope of this research. To facilitate the nsSNP selection, we used the Genome Variation Server (GVS) to obtain SNP information for nearly all EGP genes [20]. Only 82 of the 114 genes had coding SNP data in the GVS database. We reviewed these 82 genes with nsSNP data for relevance to chemicals acting through a mutagenic pathway and for the ability to be tested using the Illumina Bead Express platform. We ultimately included 21 genes using this platform for this project, in addition to the three genes without coding data, as shown in Table 1. In general, the variants included in this analysis modify enzyme function to magnify the mutagenic effect of the xenobiotic substrate [21] and include variants of the following genes: AHR, ARNT, CYP1A1, CYP1A2, CYP1B1, CYP2B6, CYP2C9, CYP2E1, CYP3A5, CYP4B1, EPHX1, GSTA1, GSTP1, GSTM1, GSTM3, GSTT1, GSTZ1, MPO, NAT2, NQO1, OGG1, TP53 and XRCC1 (Table 1) [20,22]. A tag SNP, rs1495741, was included for NAT2 and was used to infer the NAT2 slow acetylator phenotype [23].

Genotyping
DNA was extracted from white blood cells by a standard salting-out protocol [24]. DNA quality was assessed by absorption at 260 and 280 nm. Samples were aliquoted into 96-well plates for analysis. Genotyping for all selected SNPs, except rs1048943 and rs4646903, was carried out using the Illumina Bead Express platform, which employs VeraCode technology (Illumina, San Diego, CA, USA). Rs1048943 and rs4646903 were genotyped by TaqMan™ assays (Life Technologies/Applied Biosystems, Carlsbad, CA, USA) in a 384-well plate format using an Applied Biosystems 7900 PCR system. About 7% of samples were run in duplicate for both SNP genotyping assays. Deletions in GSTM1 and GSTT1 were determined using TaqMan Copy Number Assays™ with RNase P as the control gene. Samples were run in triplicate and CopyCaller™ Software was used to determine the copy number. The Bead Express genotyping was run by Dr. Robin J. Leach, Co-Director of the Genomics Resource Core of the University of Texas, Health Services Center, School of Medicine, San Antonio. The JAK2 V617F testing was completed by Dr. Mingjiang Xu at Mt. Sinai School of Medicine.

Statistical Analysis
Descriptive analysis was conducted on the characteristics of the study population. Logistic regression was used to estimate adjusted odds ratios (OR) and associated 95% confidence intervals (CI). All ORs were adjusted for the design variables used to stratify the population for selection of controls (sex, age, county). We used the most frequent homozygous genotype as the reference unless the literature indicated a different referent group (see Table A1). We evaluated Hardy-Weinberg equilibrium of the genetic variants in the control population. We applied "gene-only analysis" [13] to estimate the main effect of the gene of interest as a signal for gene-environment interaction. Each genotype was tested one at a time. As additional case categories, we also restricted the analysis to only confirmed PV cases and to only JAK2 V617F-positive cases. In addition, an analysis that considered the total number of deleterious SNPs was performed. Statistical analyses were conducted in SAS v. 9.2 (SAS Institute, Cary, NC, USA).

Table 1. Genes associated with a mutagenic chemical and minor allele frequency in the general population [19].

AHR
This gene encodes a ligand-activated transcription factor involved in the regulation of biological responses to planar aromatic hydrocarbons. 10

CYP2E1
Inactivates a number of drugs and xenobiotics and also bioactivates many xenobiotic substrates including ethanol to their hepatotoxic or carcinogenic forms.

CYP3A5
In liver microsomes, this enzyme is involved in a NADPH-dependent transport pathway. It oxidizes a variety of structurally related compounds such as xenobiotics, fatty acids and steroids.

31
Drug metabolism and synthesis of cholesterol, steroids and other lipids
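
The Hardy-Weinberg check on control genotypes described under Statistical Analysis can be illustrated with a short sketch. The text does not state which test was used, so the one-degree-of-freedom chi-square goodness-of-fit test below is an assumption, and the function name and the genotype counts are hypothetical (they are not study data).

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed counts of the two homozygous genotypes (n_aa, n_bb)
    and the heterozygous genotype (n_ab). Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped subject is required")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control genotype counts, for illustration only.
    chi2, p_value = hardy_weinberg_chi2(150, 118, 24)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value from such a test would give no evidence of departure from equilibrium; with rare variants and small expected counts, an exact test would be a more cautious choice than this approximation.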

Results
The majority of MPN cases were confirmed to be primary PV (24/27). Compared to controls, cases were older (median age = 71 vs. 63 years) and more often male (56% vs. 40%) but were otherwise demographically similar (Table 2). The study population was entirely Caucasian with only two Latino controls. None of the cases and only six controls reported Jewish ancestry. All examined genes were in Hardy-Weinberg equilibrium (details not shown). The prevalences of the CYP1A2, GSTA1, GSTM3, and NAT2 risk genotypes in controls were 7%, 18%, 9% and 57%, respectively, in agreement with the frequencies reported in the literature (Table 1). Results for associations of MPNs with the environmentally sensitive genes are summarized in Table 3. The crude estimates were very similar to the effect estimates adjusted for the design variables. It must be noted that all estimates of ORs had wide confidence intervals, and any interpretation of the magnitude of the effect and reliability of the hypothesis test should be approached with considerable caution. Having the most common homozygous CYP1A1 rs4646903, CYP1A2, EPHX1 rs2234922 and TP53 alleles increased the odds of MPNs by four- to five-fold, with point estimates of the odds ratios of 5.1, 4.1, 5.0, and 5.4, respectively. The CYP3A5 rs776746 AA genotype increased the risk of having an MPN on average 9-fold. The GSTA1 rs3957356 AA genotype was associated with an increase in effect estimates for any MPNs, with an average OR of 1.9. The GSTM3 rs7483 AA genotype was associated, on average, with elevated risk of any MPNs (OR = 3.9). The GSTM1 null and GSTZ1 rs7972 AA genotypes followed a similar trend of doubling the odds of having an MPN, with ORs of 2.4 and 2.8, respectively. We found that the NAT2 slow acetylator AA genotype (rs1495741) was, on average, three times more common than the wild-type genotype among cases (OR = 3.1 for any MPNs).
We were not able to test for all seven of the other NAT2 SNPs shown in the literature to determine slow acetylator phenotype, but we were able to test for four of them. We observed a 2-fold increase for CYP2E1 across all case definitions. A similar increase in risk was found for individuals with NQO1 rs1800566 AG (OR = 1.9), and a 3-fold increase was found for ARNT rs12410394 AG or GG genotypes (OR = 3.2).

Table 3. Association of myeloproliferative neoplasms with environmentally sensitive genes in case-control analysis: odds ratios (95% confidence intervals), adjusted for age, sex, and county 1.
Columns: Sequence Variant; All MPN Cases (n = 27); JAK2 V617F Cases (n = 22); PV Cases (n = 24)
1 Where results are missing, odds ratios could not be calculated because there were no cases or controls with those variants; specifically, the MPO rs2333277 CC genotype and the XRCC1 rs25489 AG genotype had no cases under any case definition.
2 Genes conferring susceptibility to mutagenic chemicals and single nucleotide polymorphisms (SNPs) with rs number.
All but one case (30/31, 97%) harbored at least two of the evaluated SNPs that signal association of the outcomes with exposure to xenobiotics (AHR, CYP2E1, GSTM1, TP53, GSTT1, GSTM3, CYP1A2, NAT2, or GSTA1), compared to 63% of controls (Table 4). None of the results, either for individual genotypes or for the presence of two or more of these SNPs, varied materially in analyses restricted to only PV cases or to cases with confirmed JAK2 somatic mutations.
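
The odds ratios in this section come from logistic regression adjusted for the design variables. As a rough illustration of how an unadjusted (crude) odds ratio and its interval are obtained from a 2x2 genotype-by-status table, the sketch below uses the standard log-odds (Woolf) approximation. It is a simplified stand-in rather than the adjusted model fitted in SAS, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald (Woolf) confidence interval.

    a: cases with the variant      b: cases without the variant
    c: controls with the variant   d: controls without the variant
    z: normal quantile (1.96 gives an approximate 95% interval)
    Returns (odds_ratio, lower, upper).
    """
    if min(a, b, c, d) <= 0:
        # An empty cell makes the estimate undefined, which is why some
        # genotypes with no cases cannot be given an odds ratio.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not study data).
    estimate, lower, upper = odds_ratio_wald(8, 19, 30, 262)
    print(f"OR = {estimate:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

With only a few cases in a cell, the standard error term is dominated by the smallest count, which is one way to see why intervals are wide when the case group is small.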

Discussion
After studying the main effects of 24 environmentally sensitive genes, we found that variants in NAT2, CYP1A2, GSTA1, and GSTM3 were statistically significantly associated with MPN risk, with ORs between 1.5 and 4. In addition, variants in CYP1A1, CYP2E1, CYP3A5, EPHX1, TP53, MPO, GSTZ1, ARNT, and NQO1 were associated with MPNs in this study, with ORs between 1.4 and 9. While these results do not confirm gene-environment interaction for any one specific chemical, the findings encourage further exploration of the interaction hypothesis with respect to biological pathways and chemical exposures implicated by these genes. These same genes appear to be associated with the presence of the JAK2 V617F mutation that is pathognomonic of MPNs.
To detect potential gene-environment interactions, we used the main effects of genes in this analysis. We did not have measures of the exposures of interest available and therefore could not estimate interactions or stratified effects directly. If we assume that genes alone do not cause MPNs, but can only act by modifying the toxicity of an environmental exposure, then testing the main effect of the genes is an efficient way to generate evidence supporting qualitative gene-environment interaction [13,14]. Based on this reasoning, we are essentially assuming the existence of gene-environment interaction without an independent gene effect. A main gene effect without exposure is unlikely, based on the knowledge about the pathway for these diseases [6,7]. Therefore, we have no reason to believe that MPNs are caused exclusively by the genotypes under investigation.
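
The reasoning above can be made concrete with a small numerical sketch. It assumes a binary exposure and a binary genotype, that genotype is independent of exposure, and that risk among the unexposed does not depend on genotype. Under those assumptions the marginal genotype odds ratio departs from 1 only when the exposure is present in the population and its effect differs by genotype. The function name, parameter names and risk values are hypothetical and chosen only for illustration.

```python
def marginal_genotype_or(p_exposed, risk_unexposed,
                         risk_exposed_ref, risk_exposed_variant):
    """Marginal (gene-only) odds ratio comparing variant to reference genotype.

    p_exposed:            prevalence of the exposure (independent of genotype)
    risk_unexposed:       disease risk without exposure (same for both genotypes)
    risk_exposed_ref:     disease risk in exposed carriers of the reference genotype
    risk_exposed_variant: disease risk in exposed carriers of the variant genotype
    """
    values = (p_exposed, risk_unexposed, risk_exposed_ref, risk_exposed_variant)
    if not all(0.0 <= v < 1.0 for v in values):
        raise ValueError("all inputs must be probabilities in [0, 1)")

    def odds(p):
        return p / (1.0 - p)

    # Average over exposure within each genotype group.
    risk_ref = (1 - p_exposed) * risk_unexposed + p_exposed * risk_exposed_ref
    risk_var = (1 - p_exposed) * risk_unexposed + p_exposed * risk_exposed_variant
    return odds(risk_var) / odds(risk_ref)


if __name__ == "__main__":
    # No interaction: exposure acts identically in both genotypes.
    print(marginal_genotype_or(0.3, 0.001, 0.002, 0.002))
    # Interaction: exposure is more harmful in variant carriers.
    print(marginal_genotype_or(0.3, 0.001, 0.002, 0.010))
```

In this simplified model, a genotype association is therefore evidence about the exposure even though the exposure is never measured, which is the logic the gene-only analysis relies on.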
In our study sample, we did not find any association of smoking or occupational exposure to polycyclic aromatic hydrocarbons with the risk of developing an MPN [16]. However, perhaps paradoxically, our findings suggest that specific genotypes that modify the toxicity of these exposures may play a role in MPNs. The lack of an association of smoking with MPNs is consistent with the limited prior literature [4]. However, the existing literature on occupational and chemical exposures and MPNs is neither specific nor consistent [25][26][27][28][29].
Moore et al. reported an association of bladder cancer with NAT2 slow acetylator genotype and smoking intensity [30]. Since aromatic amines are detoxified by NAT2, interaction is biologically plausible. Thus, if carcinogens in tobacco smoke were implicated in MPNs, as they are in bladder cancer, then we should have detected the main effect of NAT2 in our study. However, the absence of an effect from smoking per se suggests that compounds not important to the toxicity of tobacco smoke but affected by NAT2 should be scrutinized.
Functional SNPs associated with benzene exposure were also explored in this analysis: CYP2E1, GSTM1, AHR, and GSTT1 variants modify the biological dose of benzene. In addition, TP53 has been reported to be involved in benzene hematopoietic stem-cell toxicity in mice [31]. Effect estimates for AHR and GSTT1 were essentially null, but those for GSTM1 and TP53 were not. Work by Quiroga, Kaplan et al., and Mele et al. implicated benzene or petroleum products as risk factors [25,32,33]. Among analyses with the best available estimates of exposure to benzene, a pooled analysis of 29 cases of myelodysplastic syndrome (MDS) showed a monotonic dose-response relationship with cumulative exposure to benzene (odds ratio (OR) = 4.33, 95% CI: 1.31, 14.3) but not for other subtypes, including 30 myeloproliferative disease (MPD) cases (OR = 1.79, 95% CI: 0.68, 4.74) [29]. The work of Schnatter et al. implies that diagnostic heterogeneity may be complicating attribution of specific MPNs to environmental exposures [29]. Overall, our results are consistent with the prior body of literature suggesting that benzene may be implicated in some MPNs, although clearly the evidence is far from consistent or conclusive.
Our study was limited by the small number of cases. MPNs are rare hematological malignancies with only a limited number of cases available for recruitment, even under cluster outbreak conditions. Our study is most informative about PV cases only. We included a total of 24 PV cases, which is in the range of other studies of MPN etiology, with case groups ranging from 10 to 133 [32,33] (see also n = 53 [31] and n = 30 [29]). Another limitation could be an imbalance in the age and sex distribution of cases and controls. We attempted to control for this by including these factors in the logistic regression, but residual confounding cannot be ruled out.
Although MPNs are classified as malignant, they became a reportable disease in the US only recently, in 2001, and in Pennsylvania only hospitals were able to report until recently, so there is the possibility of under-reporting of these MPNs to the cancer registry. There were also changes in the diagnostic criteria by the WHO in 2001 and again in 2008 [34]. The 2008 diagnostic criteria included molecular as well as histological information for diagnosis, including the JAK2 V617F mutation [34]. Our ability to test for this mutation directly helped minimize outcome misclassification. Furthermore, by considering only individuals with the JAK2 mutation to be cases in some of our analyses, we reduced the possibility of outcome misclassification if one were to assume that persons with the mutation experienced exposures that had already placed them on a pathway towards developing clinical MPN.
The misclassification of genotypes was not a concern in this study because our call rate was high (>95%), and we had only one SNP (UGT1A) that consistently did not perform well [35,36]. Our study is vulnerable to false positive discovery due to "multiple comparisons" [37]. However, unlike GWAS, we started with 648 genes of interest and examined only genes that met our a priori "plausible candidate" criteria. Our selected genotypes were functional SNPs that directly affect the enzymatic activity of the gene product, influencing biological pathways involved in the metabolic activation or detoxification of mutagenic chemicals. We did not apply a correction factor in the analysis for the elevated type I error rate that arises from multiple comparisons.
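
No multiplicity correction was applied in this analysis. For readers who wish to see what one would involve, the sketch below implements the Holm step-down procedure, one common way to control the family-wise error rate. It is shown purely as an illustration of the concern raised above: the function name is an assumption, and the p-values are hypothetical, not values from this study.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure for family-wise error control.

    Returns a list of booleans, in the input order, indicating which
    hypotheses would be rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Once one test fails, all larger p-values also fail.
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for three variants, for illustration only.
    print(holm_reject([0.001, 0.04, 0.03]))
```

With many variants tested one at a time and few cases, a procedure of this kind would leave little power, which is part of the trade-off behind relying on a priori candidate selection instead.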

Conclusions
Our research made two pivotal assumptions. First, we assumed that genotype and environmental exposures in this study population are independent. Second, we assumed that disease risk does not vary with genotype for subjects without environmental exposure. From this, we exploit the most generic form of Mendelian randomization, namely that individuals receive a random allocation of alleles from their parents. Gustafson and Burstyn presented this method and concluded that when both of these assumptions are met, using data on genotype and disease jointly, with knowledge only of the prevalence of exposure and without individual-level data on exposure, can be a practical approach to investigating gene-environment interaction [14]. Through this approach, a signal of increased risk will result only if an environmental exposure is operating through gene-environment interaction [27]. Our findings therefore suggest that aromatic compounds and heterocyclic amines play a role in MPNs. Future research on the exposures (other than smoking) affected by the NAT2, GST, and CYP genes may be a fruitful avenue for better evaluating the environmental etiology of MPNs.