The Association between Gene-Environment Interactions and Diseases Involving the Human GST Superfamily with SNP Variants

Exposure to environmental hazards has been associated with diseases in humans. The identification of single nucleotide polymorphisms (SNPs) in human populations exposed to different environmental hazards is vital for detecting the genetic risks of some important human diseases. Several studies in this field have investigated the role of glutathione S-transferases (GSTs), a phase II detoxification superfamily, in the occurrence of diseases. Human GSTs consist of cytosolic and microsomal superfamilies that are further divided into subfamilies. Using scientific search engines and a review of the literature, we found a large number of published articles on human GST super- and subfamilies that greatly assisted our examination of their role in health and disease. Because of its polymorphic variation in relation to environmental hazards such as air pollutants, cigarette smoke, pesticides, heavy metals, carcinogens, pharmaceutical drugs, and xenobiotics, GST is considered a significant biomarker. This review examines studies of gene-environment interactions related to various diseases with respect to SNPs found in the GST superfamily. Overall, we conclude that interactions between GST genes and environmental factors play an important role in human diseases.


Introduction
Human health and disease development are strongly influenced by interactions between gene expression and the environment [1]. Glutathione S-transferase (GST), a dimeric, multifunctional protein superfamily present in all kingdoms [2,3], has sparked interest in the area of gene-environment interactions and disease. Recent findings have demonstrated the importance of the allelic frequencies of polymorphic genes, such as GSTs, for susceptibility to certain diseases [4]. Changes or variations in an individual's DNA, such as single nucleotide polymorphisms (SNPs), are thought to contribute to numerous diseases [5,6].
GSTs are key enzymes in the human Phase II detoxification pathway. They protect against toxins by catalyzing the conjugation of toxins with glutathione (GSH) or by passively binding various exogenous and endogenous toxic molecules, including environmental toxins, carcinogens, chemotherapeutic agents, and products of oxidative stress [3,7]. They also help prevent cellular mutations and contribute to antioxidant defense. In some instances, however, conjugation reactions can form compounds that are far more toxic than the original substrate, potentially causing disease [8]. Because several GST genes are polymorphic, there is considerable interest in the relationship between specific allelic variants and the risk of developing disease [9,10,11,12,13]. Our goal, therefore, is to examine studies of gene-environment interactions in disease with respect to the SNPs found in the GST subfamilies, as summarized in Table 1 [14].
Table 1. Major types of GST genes and their SNP distributions (based on [14]).
The mechanisms underlying genetic variation can be complex. Conventionally, mutations are classified as germinal or somatic [15,16]. Germinal mutations can be passed down to offspring and become inherited mutations or polymorphisms. Inherited polymorphisms, unlike somatic mutations, are congenital and not induced by environmental factors. Somatic mutations arise at some stage of a cell's lifespan as a result of exposure to environmental hazards. The focus of this article is on SNP polymorphisms caused by somatic mutations induced by environmental risk factors. Nonetheless, subtle inherited genetic differences among racial/ethnic populations living in different locations may confer different susceptibilities to somatic mutations induced by environmental factors (Table 2).

Glutathione S-transferases (GSTs)
The determination of the first GST structure in 1991 led to a wealth of structural data on GSTs from three distinct superfamilies: the cytosolic GSTs (the largest), the mitochondrial GSTs, and the microsomal GSTs, also known as membrane-associated proteins in eicosanoid and glutathione metabolism (MAPEG family) [17]. Several mammalian GST classes have been identified and characterized, forming eight distinct classes: alpha, mu, pi, omega, sigma, theta, zeta, and kappa, of which the first seven are cytosolic [18,19]. Alpha, mu, and pi are regarded as the major classes [20]; their SNP distributions are shown in Table 1 [14]. Theta has also been studied extensively [14,21], and its SNP distribution is likewise shown in Table 1. In some studies, GSTM1 (glutathione S-transferase mu 1), GSTT1 (glutathione S-transferase theta 1), and GSTP1 (glutathione S-transferase pi 1) are evaluated together. GSTs are present in virtually all tissues, but in humans the liver has the highest cytosolic GST activity. These major cellular detoxification enzymes are found mostly in the liver and kidney, as well as in the intestine [22].
The four widely studied GSTs (GSTA, GSTM, GSTP, and GSTT) belong to the cytosolic GST group and are involved in the detoxification of xenobiotics, carcinogens, and therapeutic drugs. Hence, they are considered informative biomarkers, as some of their polymorphic states can increase an individual's or a population's susceptibility to disease (e.g., cancer). Class alpha GST (GSTA), located on chromosome 6 and highly expressed in the liver, helps protect cells against peroxidation products and reactive oxygen species (ROS). It can serve as an indicator of liver injury, as it can be detected at lower levels in acute hepatic injury [23]. Class mu (GSTM), located on chromosome 1, is known to modify the toxicity and effectiveness of therapeutic drugs. Class pi (GSTP), located on chromosome 11, protects cells from cytotoxic and carcinogenic agents [24]. Lastly, class theta (GSTT), located on chromosome 22, includes GSTT1 and GSTT2, which share 55% amino acid sequence identity and may play a role in human carcinogenesis. Most GST functions are associated with detoxification or antioxidant processes. Therefore, structural changes caused by SNP variants (especially missense mutations) can be strongly associated with chronic diseases such as cancer.
The glutathione S-transferase alpha (GSTA) family consists of GSTA1, GSTA2, GSTA3, GSTA4, GSTA5, GSTA6P (pseudogene), and GSTA7P (pseudogene), all located on chromosome 6. This class encodes enzymes with glutathione peroxidase activity that function in the detoxification of lipid peroxidation products. The glutathione S-transferase mu (GSTM) family includes GSTM1, GSTM2 (muscle), GSTM3 (brain), GSTM4, and GSTM5, located on chromosome 1. The GSTM1 gene encodes a major phase II detoxification enzyme that helps detoxify various xenobiotics. GSTM1 deficiency results from homozygous deletion of the gene (GSTM1 null), which abolishes the corresponding enzymatic activity [8,25]. Glutathione S-transferase pi (GSTP1) has enzymatic activity that provides a "caretaker" function. Inactivation of the GSTP1 gene has often been observed in human neoplasia (prostate, breast, and liver cancer, as well as leukemia), and researchers highlight GSTP1 epigenetic modifications as biomarkers for early cancer diagnosis and as potential targets of preventive or therapeutic treatment [26,27]. The glutathione S-transferase theta (GSTT) family comprises GSTT1, GSTT2 (gene/pseudogene), and GSTT2B (gene/pseudogene) and is located on chromosome 22. GSTT1 is polymorphic in humans, and its null genotype results in the absence of enzyme function, which may alter the response to xenobiotics. In recent years, many studies have assessed associations of GSTT1 polymorphism with diabetes mellitus (DM) and type 2 diabetes mellitus (T2DM). Although no significant association was found in [28], the GSTM1-null and GSTT1-null genotypes have been proposed as biological markers of an individual's diabetes risk [4].
DNA mutations are associated with many human diseases and underlie much of the variation among individuals. Many polymorphisms have been reported in the DNA sequences of these GSTs, and many studies have associated them with different types of cancer [5,6,29,30]. In an analysis of the NCBI dbSNP database, Yadav et al. [14] found that the major GST genes contained 3193 SNPs, 237 of which were coding nsSNPs (Table 1). These nsSNPs deserve particular attention because of their potential effects on the structure, function, interactions, and other properties of the DNA and the expressed proteins. Intronic (noncoding) SNPs made up the largest share (84.87%) of the SNPs studied; they are not the focus of this review, although some of them may be involved in regulatory or splicing mechanisms. Significant coding nsSNPs are identified and highlighted in the next section.
Table 2. Population susceptibility for different diseases related to major types of GST.
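The proportions behind the dbSNP figures from Yadav et al. [14] can be checked with simple arithmetic. The sketch below uses only the counts and the percentage given in the paragraph above; the intronic count it derives is an approximation implied by the rounded percentage, not a number reported in [14].

```python
# Figures for the major GST genes as given in the text (from Yadav et al. [14]).
TOTAL_SNPS = 3193
CODING_NSSNPS = 237
INTRONIC_PERCENT = 84.87

# Share of all SNPs that are coding nonsynonymous SNPs.
nssnp_percent = 100 * CODING_NSSNPS / TOTAL_SNPS
# Approximate number of intronic SNPs implied by the rounded percentage.
intronic_count = round(TOTAL_SNPS * INTRONIC_PERCENT / 100)

print(f"coding nsSNPs: {nssnp_percent:.2f}% of all SNPs")  # coding nsSNPs: 7.42% of all SNPs
print(f"intronic SNPs: about {intronic_count}")  # intronic SNPs: about 2710
```

In other words, the candidate nsSNPs are a small fraction (roughly 7%) of the catalogued variation, which is why prioritizing them narrows the search considerably.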

GST Single Nucleotide Polymorphisms (SNPs)
GST genes are organized in chromosomal clusters, and most of them are polymorphic, mainly because of single nucleotide substitutions or variations (i.e., SNPs). Genetic variations can be classified as synonymous or nonsynonymous. A synonymous variation is a nucleotide change in the DNA that does not change the encoded amino acid and therefore leaves the protein sequence unaltered. In contrast, missense, nonsense, and frameshift changes are types of nonsynonymous mutations, all of which change the protein sequence. In recent years, a few researchers have suggested that deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs) of GSTs are associated with diseases. One study [14] identified five of 237 nsSNPs (GSTA3/R13W, GSTA3/Y147D, GSTM3/R191L, GSTM4/R18L, and GSTT1/W101R) as potential target mutations that induce structural changes and may alter the detoxification process, possibly leading to carcinogenesis (Table 3).
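The synonymous/nonsynonymous distinction can be made concrete with the standard genetic code. The sketch below assumes a single-base substitution inside one known codon of a coding sequence and reports whether the change is synonymous, missense, nonsense, or stop-loss; frameshifts arise from insertions or deletions and are outside its scope. The ATC to GTC example gives an isoleucine-to-valine change of the kind reported for GSTP1 Ile105Val, and the TGG to CGG codons in the usage note are hypothetical, chosen only to produce a W-to-R change like GSTT1/W101R. The function name `classify_snv` is illustrative, not taken from any annotation tool.

```python
from itertools import product

# Standard genetic code: amino acids for the 64 codons enumerated in T, C, A, G
# order for the first, second and third base ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # TNN codons
    "LLLLPPPPHHQQRRRR"  # CNN codons
    "IIIMTTTTNNKKSSRR"  # ANN codons
    "VVVVAAAADDEEGGGG"  # GNN codons
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base substitution within one codon of a coding sequence."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    # Raises KeyError for bases other than A, C, G, T.
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid despite the nucleotide change
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    if ref_aa == "*":
        return "stop-loss"  # stop codon replaced by an amino acid
    return f"missense ({ref_aa}>{alt_aa})"


print(classify_snv("ATC", "GTC"))  # missense (I>V)
print(classify_snv("CTG", "CTA"))  # synonymous (both codons encode leucine)
```

Likewise, `classify_snv("TGG", "CGG")` returns `"missense (W>R)"`, while `classify_snv("TGG", "TGA")` returns `"nonsense"`. Whether a missense change is actually deleterious still requires structural or predictive analysis, such as the tools discussed later in this review.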

Significant GST SNPs
Based on the literature reviewed, we compiled the most significant disease-related GST SNPs in Table 3. Information was gathered by searching for major GST-related, disease-associated SNPs in PubMed, a service that provides access to the biomedical literature, and in dbSNP, NCBI's database of genetic variation data. Table 3 includes significant or potential SNPs that meet the following criteria: (1) the SNP is located in or near a GST gene; (2) the SNP has been studied and described in at least one publication in PubMed; (3) the SNP has a dbSNP ID; (4) the formation of the SNP is susceptible to some environmental factor, including differences in ethnic group and living location; and (5) the SNP is associated with a disease or has the potential to cause one. Although some significant SNPs lie in noncoding regions (e.g., GSTM5/rs3754446, GSTA1/rs3957357, GSTP1/rs4147581, GSTP1/rs947895, GSTT1/rs4630, GSTO2/rs7085725, and GSTZ1/rs1468951 in Table 3), most significant GST SNPs are found in coding regions, where the resulting amino acid change can strongly affect the structure and function of the translated protein. Most of the SNPs in Table 3 are connected with environmental factors, including carcinogens, toxins, heavy metals, cigarette smoke, air pollutants, UV exposure, and other environmental hazards. The three GSTA2 variants P110S, S112T, and E210A are in linkage disequilibrium with one another (Table 3), suggesting that these variants may cause damage collectively [42]; however, it has only been shown that homozygosity for the GSTA2 S112T serine allele is a prognostic factor, associated with poorer survival and with increased any-time and 100-day transplant-related mortality [43].
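Linkage disequilibrium (LD) between two variants, such as the GSTA2 variants mentioned above, is commonly summarized by Lewontin's D' and by r², both computed from haplotype frequencies. The sketch below is a minimal illustration assuming two biallelic loci with phased haplotype frequencies already estimated; the function name and the frequencies in the usage line are hypothetical and are not GSTA2 data. In practice, tools such as Haploview estimate haplotype frequencies from genotypes before computing these statistics.

```python
import math


def linkage_disequilibrium(hap_freqs: dict[str, float]) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci from phased haplotype frequencies.

    hap_freqs maps the haplotypes "AB", "Ab", "aB" and "ab" to frequencies that sum
    to 1, where A/a are the alleles at the first locus and B/b those at the second.
    """
    if not math.isclose(sum(hap_freqs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    p_hap_ab = hap_freqs["AB"]
    p_a = p_hap_ab + hap_freqs["Ab"]  # frequency of allele A
    p_b = p_hap_ab + hap_freqs["aB"]  # frequency of allele B
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_hap_ab - p_a * p_b  # raw disequilibrium coefficient
    # D is scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # signed D'; |D'| = 1 means no recombinant haplotypes observed
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared allelic correlation
    return d, d_prime, r2


# Hypothetical haplotype frequencies for two variants.
_, d_prime, r2 = linkage_disequilibrium({"AB": 0.40, "Ab": 0.10, "aB": 0.10, "ab": 0.40})
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 0.60, r^2 = 0.36
```

D' and r² answer different questions: D' reflects historical recombination between the sites, whereas r² indicates how well one variant predicts the other, which is what matters when attributing an effect to a single SNP within an LD block.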
Some factors contributing to SNP formation may be associated with differences in susceptibility among ethnic groups, with locations affected by industrial waste toxins or heavy metals, and with other environmental factors. Our focus is on somatic variants associated with environmental factors that may induce disease, including the possibility that disease susceptibility depends on a specific ethnic or population group with particular living habits in a certain location (Table 2). In addition, most of the GST SNPs reviewed here are linked to carcinogenesis.

Databases for GST SNP Studies
Based on the literature published so far, the dbSNP, HapMap, and HGDP datasets are often used in GST SNP analyses (Table 4), and SNPedia and PharmGKB are useful for SNP annotation (Table 4). More recent projects, such as COSMIC, ICGC, and TCGA, provide additional SNP datasets focused on genetic changes in cancer that remain to be explored (Table 4).
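dbSNP records can also be retrieved programmatically. The sketch below is a minimal example using NCBI's Entrez Programming Utilities (E-utilities) esummary endpoint with only Python's standard library. It assumes network access for the download step, and the helper names (`rs_number`, `dbsnp_summary_url`, `fetch_dbsnp_summary`) are hypothetical. Because the fields inside a dbSNP summary record can change between database releases, the sketch returns the record unparsed rather than assuming specific field names. For repeated queries, NCBI asks users to limit their request rate and recommends registering an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities esummary endpoint; db=snp queries dbSNP by numeric rs number.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def rs_number(rs_id: str) -> str:
    """Strip the 'rs' prefix from a dbSNP identifier such as 'rs1695'."""
    number = rs_id.strip().lower().removeprefix("rs")
    if not number.isdigit():
        raise ValueError(f"not a dbSNP rs identifier: {rs_id!r}")
    return number


def dbsnp_summary_url(rs_id: str) -> str:
    """Build the esummary URL for one dbSNP record, requesting JSON output."""
    query = urlencode({"db": "snp", "id": rs_number(rs_id), "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"


def fetch_dbsnp_summary(rs_id: str) -> dict:
    """Download one summary record (requires network access)."""
    with urlopen(dbsnp_summary_url(rs_id), timeout=30) as response:
        payload = json.load(response)
    # Assumes the usual esummary JSON layout: {"result": {"uids": [...], "<uid>": {...}}}
    return payload["result"][rs_number(rs_id)]


# GSTP1 rs1695 (Ile105Val); only the URL is built here, nothing is downloaded.
print(dbsnp_summary_url("rs1695"))
```

Separating URL construction from the network call keeps the query logic testable offline; calling `fetch_dbsnp_summary("rs1695")` would perform the actual request.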

Programs for GST SNP Analyses
Many programs are available for SNP analysis, and the appropriate program depends on the stage of the analysis. For simplicity, we selected a few typical and popular programs for Table 5 to give an overview of the whole SNP data analysis process. For example, GATK [76] can be the first program used to process high-throughput raw data for SNP variant discovery and/or genotyping; PLINK [77] can be used to perform a genome-wide association study (GWAS); VEGAS (VEGAS2) [78,79] performs gene-based association tests using SNP-level GWAS results; Arlequin [69,80] can compare allele frequencies among populations and perform analysis of molecular variance (AMOVA); Haploview [69,81] can produce visualizations and plots of PLINK GWAS results; R/QTL [82] performs quantitative trait locus (QTL) analyses to pinpoint the causative SNP loci for a disease; and finally, Triton, SIFT, and PolyPhen-2 [14,83,84,85,86] can evaluate and predict whether an SNP is damaging (Table 5). These programs can serve as starting points for identifying similar or related tools.
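Association programs such as PLINK start from a per-SNP test. For a case-control design, one of the simplest is an allelic chi-square test on a 2x2 table of allele counts, comparable to the basic allelic test that PLINK reports. The sketch below is a minimal standard-library version under that assumption, not PLINK's implementation; the function name and the allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows are cases and controls; columns are alternate and reference allele counts.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each row and column must contain at least one allele")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for one SNP: 30 of 100 case alleles and 20 of 100 control alleles are alternate.
chi2, p = allelic_chi_square(30, 70, 20, 80)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

In a GWAS this test is repeated for every SNP, so the resulting p-values must be judged against a multiple-testing threshold; gene-based methods such as VEGAS then aggregate these SNP-level results per gene.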

Environmental Hazards and Diseases Associated with GST Gene-Environment Interaction
Living organisms are exposed to various toxins and toxicants, e.g., industrial chemicals, pesticides, herbicides, air pollutants, pharmaceuticals, and several naturally occurring substances, which can harm human health. These environmental exposures can induce changes in gene regulation associated with human diseases [7]. Because individuals differ in their DNA, exposure to the same environment does not necessarily produce the same effect in different people, whether within or across ethnic groups. For example, genome-wide association studies (GWAS) have identified a number of genetic variants associated with bladder cancer risk in populations of European descent [87].
Several studies have examined correlations between gene polymorphisms and environmental toxins, such as heavy metals (arsenic, lead, and platinum), air pollutants, and other factors listed in Table 6, to characterize potential risks [88]. One study of environmental lead exposure and blood lead levels in men carrying GST polymorphisms (deletions of GSTM1/GSTT1 and GSTP1 rs1695) reported adverse alterations in inflammatory response [9]. In addition, mothers exposed to environmental tobacco smoke are more likely to have infants with low birth weight [25]. Cancer incidence has been associated with various disorders of GSH-related enzyme function, especially alterations of glutathione S-transferases (GSTs) [89]. It has been suggested that GSTM1/T1 polymorphisms are related to many diseases, such as rheumatoid arthritis, age-related macular degeneration, oral leukoplakia, prostate cancer, lung cancer, and cervical neoplasia [14,45,52,53,90]. The GSTM1 null genotype and the GSTP1 Ile105Val polymorphism are associated with an increased risk of Alzheimer's disease [12]. The pi class GST (GSTP1) is often overexpressed in human tumors, including carcinomas of the colon, breast, lung, kidney, ovary, pancreas, esophagus, stomach, prostate, liver, and blood [19,27]. For prostate cancer, no consistent associations with GSTM1, GSTT1, or GSTP1 genotypes have been found [91], and studies published as recently as 2012 report the same. Although one study found no apparent interaction between GST gene variants and air pollution exposure with respect to hypertension [21], GST polymorphisms have been implicated, positively or negatively, in other cases.
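Associations such as those between the GSTM1 null genotype and disease risk are usually reported as odds ratios (ORs) from case-control data. The sketch below assumes a simple unmatched case-control design and computes a crude OR with a Woolf (log-based) 95% confidence interval; published analyses may instead use logistic regression adjusted for covariates. The function name and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_ci(carrier_cases: int, noncarrier_cases: int,
                  carrier_controls: int, noncarrier_controls: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Crude odds ratio for carrying a genotype (e.g. GSTM1 null) with a Woolf CI.

    z = 1.96 gives an approximate 95% confidence interval.
    Returns (odds ratio, lower bound, upper bound).
    """
    cells = (carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls)
    if min(cells) <= 0:
        raise ValueError("the Woolf interval needs every cell to be positive")
    odds_ratio = (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)
    se_log_or = math.sqrt(sum(1 / n for n in cells))  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical study: 60 of 100 cases and 40 of 100 controls carry the null genotype.
print(odds_ratio_ci(60, 40, 40, 60))
```

Here the OR is 2.25 and the interval excludes 1, which would be read as a nominally significant association; inconsistent findings across studies, as for prostate cancer above, often reflect intervals that straddle 1 or differ between populations.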
A GSTO1 (glutathione S-transferase omega 1) SNP related to arsenic (As) metabolism showed nominally significant interactions with well-water As in relation to cardiovascular disease (CVD), coronary heart disease (CHD), or stroke [92]; in addition, GSTT1 polymorphisms are a potential genetic factor in arsenic-induced skin cancer [93]. GSTP1 has been suggested to aid in the detoxification of arsenic [13]. GSTT1 also encodes enzymes involved in the metabolism and detoxification of polycyclic aromatic hydrocarbons (PAHs) and in protection against genotoxic damage from the ethylene oxide present in tobacco smoke [25].
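Gene-environment interactions such as the GSTO1 SNP by well-water arsenic finding can be inspected in a 2x4 table that stratifies cases and controls by genotype and exposure. The sketch below computes odds ratios for each stratum relative to the unexposed, low-risk-genotype group and the ratio OR_GE / (OR_G × OR_E), which equals 1 when the joint effect is exactly multiplicative. It is an illustration under that assumption, not the method used in [92], which may have relied on regression models with interaction terms; the names and counts are hypothetical, and no significance test is included.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    cases: int
    controls: int


def multiplicative_interaction(reference: Stratum, genotype_only: Stratum,
                               exposure_only: Stratum, both: Stratum) -> dict[str, float]:
    """Odds ratios from a 2x4 gene-environment table and the multiplicative interaction ratio.

    Each stratum is compared with the reference group (low-risk genotype, unexposed).
    """
    strata = (reference, genotype_only, exposure_only, both)
    if any(count <= 0 for stratum in strata for count in stratum):
        raise ValueError("all case and control counts must be positive")

    def odds_ratio(stratum: Stratum) -> float:
        return (stratum.cases * reference.controls) / (stratum.controls * reference.cases)

    or_g, or_e, or_ge = odds_ratio(genotype_only), odds_ratio(exposure_only), odds_ratio(both)
    return {
        "OR_G": or_g,
        "OR_E": or_e,
        "OR_GE": or_ge,
        # 1 means the joint effect equals the product of the separate effects;
        # values above or below 1 suggest positive or negative multiplicative interaction.
        "interaction_ratio": or_ge / (or_g * or_e),
    }


# Hypothetical counts (cases, controls) for a GST variant and well-water arsenic exposure.
result = multiplicative_interaction(Stratum(50, 100), Stratum(100, 100),
                                    Stratum(100, 100), Stratum(400, 100))
print(result["interaction_ratio"])  # 2.0
```

With these counts the variant and the exposure each double the odds on their own, but together they raise it eightfold rather than fourfold, which is the pattern a positive multiplicative interaction describes.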
As Table 6 illustrates, the diseases related to GST SNP variants can be classified into five categories, and most of the contributing factors are associated with environmental toxins or toxicants. Table 6 focuses on classifying the diseases most strongly related to SNPs found in GSTs. The five categories are: (1) Cancers; (2) Inflammatory or Immunological Disorders; (3) Neurological Disorders; (4) Aging-related or Metabolic Disorders; and (5) Reproductive Disorders. Environmental toxins are the most important contributing factors in all five categories and are implicated in all of the cancers, inflammatory or immunological disorders, and reproductive disorders listed.

Conclusions
The findings reviewed in this article illustrate the role of environmental factors and how they influence the genome and its regulation, suggesting that xenobiotics released into the environment by anthropogenic activities can promote disease by altering gene alleles. The study of gene-environment interactions is relevant to improving human health, as researchers seek to identify risk factors arising from environmental exposures that produce differences in gene sequences [97]. Such work would help clarify how disease is initiated and improve the prospects for protection against these diseases. Although the detoxifying activity of GSTs helps protect cells from certain diseases, GST genes are also vulnerable to environmental toxins and hazards that can change gene alleles, leading to some life-threatening diseases. Hence, we postulate that interactions between GST genes and environmental factors play an important role in adverse health effects among humans. In addition, recent international projects have made available cancer SNP datasets that have not yet been fully explored, and their detailed examination and analysis in the future may identify more GST SNPs related to various cancers. This could be a further step toward gene therapy (gene editing) via CRISPR/Cas9 [98,99,100] or disease treatment through personalized precision medicine [101].