Genome-Wide Association Study of Serum Selenium Concentrations

Selenium is an essential trace element, and circulating selenium concentrations have been associated with a wide range of diseases. Candidate gene studies suggest that circulating selenium concentrations may be influenced by genetic variation; however, no study has comprehensively investigated this hypothesis. We therefore conducted a two-stage genome-wide association study to identify genetic variants associated with serum selenium concentrations in 1203 participants of European descent from two cohorts: the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and the Women’s Health Initiative (WHI). We tested the association between 2,474,333 single nucleotide polymorphisms (SNPs) and serum selenium concentrations using linear regression models. In the first stage (PLCO), 41 SNPs clustered in 15 regions had p < 1 × 10−5. None of these 41 SNPs reached the significance threshold (p = 0.05/15 regions ≈ 0.003) in the second stage (WHI). Three SNPs had p < 0.05 in the second stage (rs1395479 and rs1506807 in 4q34.3/AGA-NEIL3, and rs891684 in 17q24.3/SLC39A11) and had p values between 2.62 × 10−7 and 4.04 × 10−7 in the combined analysis (PLCO + WHI). Additional studies are needed to replicate these findings. Identification of genetic variation that affects selenium concentrations may contribute to a better understanding of which genes regulate circulating selenium concentrations.

Blood selenium concentrations vary substantially [16] and are influenced both by exogenous factors, such as diet, supplement use, and smoking status, and by endogenous factors, such as selenium storage, transport, and excretion [1,17,18]. Identifying genetic variation that affects selenium concentrations may therefore help clarify which genes govern the endogenous factors that determine circulating selenium concentrations and, ultimately, lead to improved prevention of selenium-related health outcomes.
Glutathione peroxidase 1 (GPX1) is not only important for the antioxidative properties of selenium in the human body but also affects selenium storage [19]. Studies have shown that genetic variants in the GPX1 gene can change its enzyme activity [20][21][22], leading to changes in plasma selenium concentrations [23,24]. Genetic variation in GPX1 has also been reported to alter the correlation between GPX1 activity and selenium concentrations and to affect overall selenium concentrations after supplementation. These findings suggest that genetic variants in selenoproteins may influence circulating selenium concentrations. Selenoprotein P (SEPP1) is estimated to account for at least 40% of total plasma selenium [25] and, hence, has a central role in selenium transport [17].
Genetic variants in SEPP1 appear to affect the function and synthesis of SEPP1 and the activity of other selenoproteins (e.g., GPX1) [26][27][28], which might result in changes in selenium concentrations. To date, only a limited set of genetic variants has been investigated, and there has been no comprehensive evaluation of the impact of genetic variants across the genome on circulating selenium concentrations. In this study, we conducted a two-stage genome-wide association study (GWAS) using data from two cohorts to examine the effects of genetic variation on serum selenium concentrations.

Study Population
This study is based on two cohorts with measurements of serum selenium concentrations and genome-wide association study data: (1) the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), a large population-based randomized trial designed to determine the effects of screening on cancer-related mortality and secondary endpoints [29], and (2) the Women's Health Initiative (WHI) Observational Study, a long-term national health study focused on strategies for preventing heart disease, cancer, and osteoporotic fractures in postmenopausal women [30]. In PLCO, participants for this analysis had previously been selected for a nested case-control study of colorectal cancer, both for a GWAS and for serum selenium measurement. Five hundred and eighty-two PLCO participants with available genotyping data and serum selenium concentrations were included in this study. Similarly, WHI participants (n = 621) had both genotyping data and serum selenium measurements available as part of a nested case-control study of colorectal cancer. Both studies were restricted to participants of European descent because the number of participants of non-European descent was too small to allow a stratified analysis. All participants gave informed consent, and the studies were approved by the Institutional Review Boards of the respective institutions.
Because genotype imputation is established as standard practice in the analysis of genotype array data, all autosomal SNPs in both studies were imputed to the CEU population (Caucasian residents of European ancestry from Utah, USA) in HapMap II release 24 using MACH (Markov Chain Haplotyping algorithm) [32]. Imputed data were merged with genotype data such that directly genotyped data were preferred when a SNP had both types of data, unless the two sources differed in reference allele frequency (by >0.1) or position (by >100 base pairs), in which case the imputed data were used. We used R² as a measure of imputation accuracy. Imputed SNPs were restricted to those with minor allele frequency (MAF) > 1% and imputation accuracy R² > 0.3. In total, 2,474,333 SNPs (either directly genotyped or imputed) were included in the genome-wide analysis.
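The merge and filtering rules above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes a hypothetical per-SNP record (`SnpRecord`) carrying position, reference allele frequency, MAF, and the MACH R², and it assumes (our reading of the text) that the MAF and R² filters are applied to imputed SNPs only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpRecord:
    """Hypothetical per-SNP summary, used only for this illustration."""
    snp_id: str
    position: int                # base-pair position
    ref_allele_freq: float       # reference allele frequency
    maf: float                   # minor allele frequency
    r2: Optional[float] = None   # MACH imputation R^2; None for genotyped calls


def choose_source(genotyped: Optional[SnpRecord],
                  imputed: Optional[SnpRecord]) -> Optional[SnpRecord]:
    """Prefer genotyped data unless it disagrees with the imputed record."""
    if genotyped is None:
        return imputed
    if imputed is None:
        return genotyped
    freq_diff = abs(genotyped.ref_allele_freq - imputed.ref_allele_freq)
    pos_diff = abs(genotyped.position - imputed.position)
    # Disagreement in allele frequency (>0.1) or position (>100 bp): use imputed.
    if freq_diff > 0.1 or pos_diff > 100:
        return imputed
    return genotyped


def keep_snp(rec: SnpRecord) -> bool:
    """Apply the MAF > 1% and R^2 > 0.3 filters to imputed SNPs (assumption:
    genotyped SNPs are taken to have passed array QC described elsewhere)."""
    if rec.r2 is None:
        return True
    return rec.maf > 0.01 and rec.r2 > 0.3
```

For example, a genotyped record with reference allele frequency 0.30 and an imputed record at the same position with frequency 0.45 disagree by more than 0.1, so `choose_source` returns the imputed record.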
In PLCO, serum selenium concentrations were determined using inductively coupled plasma mass spectrometry [33]. Blinded quality control samples (15%) were randomly inserted within each batch and monitored throughout the analysis [34]. The coefficient of variation (CV), estimated from the blinded duplicates, was 9.4%. In WHI, serum selenium levels were measured using atomic absorption spectrometry (Perkin Elmer, Fremont, CA, USA) [35][36][37]. Blinded quality control samples (6%) were included in batches with study samples, and the mean CV for the blinded duplicates was 5.8% [38]. Each sample was run in duplicate and considered acceptable if the CV was less than 10%. Internal quality control samples were run before and after each batch to ensure the quality of the assay [38].
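The duplicate-based CV and the 10% acceptance rule can be illustrated with a short sketch. The text does not state which CV estimator was used; this example assumes one common convention (per-pair sample standard deviation divided by the pair mean, averaged over pairs), and the function names are hypothetical.

```python
import statistics


def pair_cv(a: float, b: float) -> float:
    """CV (%) of one duplicate pair: sample SD divided by the pair mean."""
    return 100.0 * statistics.stdev([a, b]) / statistics.mean([a, b])


def mean_cv(pairs) -> float:
    """Mean CV (%) over blinded duplicate pairs (assumed estimator)."""
    return statistics.mean(pair_cv(a, b) for a, b in pairs)


def duplicate_acceptable(a: float, b: float, limit: float = 10.0) -> bool:
    """Accept a duplicated measurement if its CV is below the limit (10%)."""
    return pair_cv(a, b) < limit
```

Other estimators (for instance, a root-mean-square CV) give somewhat different values, so this sketch should not be used to reproduce the reported 9.4% and 5.8% figures.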

Statistical Analysis
We used linear regression models to examine the associations between SNPs and serum selenium concentrations, assuming additive effects: genotyped SNPs were coded 0, 1, or 2 according to the number of variant alleles, and imputed SNPs were coded as the expected number of variant alleles (a dosage between 0 and 2). Serum selenium concentrations were natural log-transformed because of their positively skewed distribution. We used a two-stage design in which the most significant SNPs from the first-stage analysis in PLCO (those with p < 1 × 10−5) were carried forward to the second-stage analysis in WHI, which allows independent validation of first-stage findings. SNPs carried into the second stage were defined as statistically significant if they reached p < 0.05/15 (≈0.003), where 15 is the number of regions represented by the first-stage SNPs. We adjusted for the number of regions rather than the number of SNPs selected from the first stage to account for the correlation between SNPs within each region. As a secondary analysis, we conducted combined analyses across PLCO and WHI for the SNPs selected in the first stage. In both stages, linear regression models were adjusted for age, BMI, smoking status (ever vs. never smoker), cancer status in the nested case-control study (yes/no), and the first three principal components of ancestry. In the combined analyses, we pooled the two cohorts while adjusting for the same variables as in the two-stage analysis plus a cohort indicator variable. LocusZoom plots [39] were used to display the GWAS results within a given genomic region. All analyses were performed using R (version 2.14.0). All statistical tests were two-sided.
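A per-SNP test of this kind, together with the two-stage selection rule, can be sketched as below. The study used R; this Python version is only an illustration under stated assumptions: ordinary least squares on ln(selenium) with the dosage and a covariate matrix, and a two-sided p value from a large-sample normal approximation to the t distribution (reasonable for n ≈ 600, and chosen here to avoid a SciPy dependency). The function names are hypothetical.

```python
import math

import numpy as np


def snp_association(dosage, selenium, covariates):
    """Additive-model OLS of ln(serum selenium) on SNP dosage plus covariates.

    dosage: (n,) genotype counts 0/1/2 or imputed expected dosages in [0, 2]
    selenium: (n,) positive serum selenium concentrations
    covariates: (n, k) e.g. age, BMI, ever-smoking, case status, PC1-PC3
    Returns (beta, se, p) for the dosage term.
    """
    d = np.asarray(dosage, dtype=float)
    y = np.log(np.asarray(selenium, dtype=float))  # natural log transform
    c = np.asarray(covariates, dtype=float)
    X = np.column_stack([np.ones_like(d), d, c])   # intercept, dosage, covariates
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = float(resid @ resid) / (n - k)        # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # OLS covariance of estimates
    se = math.sqrt(cov[1, 1])
    z = coef[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))         # two-sided, normal approx.
    return float(coef[1]), se, p


def carried_forward(stage1_p, cutoff=1e-5):
    """Stage 1: SNPs (dict of id -> p) passed on to the second stage."""
    return {snp for snp, p in stage1_p.items() if p < cutoff}


def replicated(stage2_p, n_regions=15, alpha=0.05):
    """Stage 2: Bonferroni over regions, p < 0.05/15 (about 0.0033)."""
    threshold = alpha / n_regions
    return {snp for snp, p in stage2_p.items() if p < threshold}
```

For the combined analysis described above, the same function could be applied to the pooled data with a cohort indicator appended as an extra covariate column.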
We generated quantile-quantile (Q-Q) plots to assess whether the distribution of the p values was consistent with the null distribution (except for the extreme tail). We also calculated the genomic inflation factor (λ), a measure of over-dispersion of the association test statistics, by dividing the median of the squared Z statistics by 0.455, the median of a chi-squared distribution with 1 degree of freedom. Based on all SNPs, both directly genotyped and imputed, λ was 1.09, indicating relatively little evidence of residual population substructure, cryptic relatedness, or differential genotyping between cases and controls. This result was consistent with visual inspection of the Q-Q plot (Supplementary Figure S1).
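The inflation factor calculation described above can be written in a few lines. This sketch uses the unrounded chi-square median (about 0.4549, which the text rounds to 0.455) and, as an assumption for the p-value variant, converts two-sided p values back to |Z| with the inverse normal CDF; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom (~0.455).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_from_z(z_stats):
    """Genomic inflation factor: median(Z^2) / median of chi2(1)."""
    return median(z * z for z in z_stats) / CHI2_1DF_MEDIAN


def inflation_from_p(p_values):
    """Same quantity from two-sided p values in (0, 1]."""
    nd = NormalDist()
    return inflation_from_z([nd.inv_cdf(1.0 - p / 2.0) for p in p_values])
```

Under the null, the median two-sided p value is 0.5, whose |Z| squared equals the chi-square median, so λ is close to 1; values such as the reported 1.09 indicate mild excess dispersion.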

Results
The characteristics of the 582 PLCO and 621 WHI participants are shown in Table 1. Because both cohorts focused on chronic disease, participants tended to be older (mean age 64 years in PLCO and 67 years in WHI). BMI was very similar across cohorts, whereas the proportion of smokers was slightly higher in PLCO than in WHI, as expected given that only men were included from PLCO and men tend to smoke more than women. Average serum selenium concentrations were similar and relatively high in both cohorts.  Table S1). Among these, the SNP rs119902616 in 2q23.3 (Table 2; Supplementary Table S1). For all three SNPs with combined p < 5 × 10−7, the beta estimates for serum selenium were in the same direction in both cohorts, with slightly weaker effects in WHI than in PLCO (Table 2). To show the associations of surrounding SNPs in both regions (4q34.3 and 17q24.3), we used LocusZoom plots (Figure 2a,b). In the 4q34.3 region, rs1395479 and rs1506807 are highly correlated with each other (r² = 1.0) and show similarly low p values for the association with selenium concentrations. In the 17q24.3 region, rs891684 has the smallest p value; this SNP was in high linkage disequilibrium (LD; r² > 0.8) with three SNPs (rs9899648, rs2567504, and rs16977351) that also showed low p values for an association with selenium concentrations.
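The LD values quoted above come from the LocusZoom plots, which draw on a reference panel. As a rough illustration only, the sketch below approximates pairwise r² as the squared Pearson correlation of genotype dosages in a sample and lists the partners of an index SNP above r² = 0.8; both the estimator and the function names are assumptions, not the method behind Figure 2.

```python
from statistics import mean


def dosage_r2(x, y):
    """Squared Pearson correlation of two dosage vectors (LD r^2 proxy).
    Both vectors must vary, otherwise the denominator is zero."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def high_ld_partners(index_dosage, others, min_r2=0.8):
    """Names of SNPs (dict of name -> dosages) with r^2 > min_r2 to the index SNP."""
    return sorted(name for name, d in others.items()
                  if dosage_r2(index_dosage, d) > min_r2)
```

An r² of 1.0, as reported for rs1395479 and rs1506807, means the two SNPs carry essentially the same information, which is why their p values are nearly identical.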

Discussion
In this study, we used a two-stage design to take advantage of GWAS data from two independent populations to examine the association between genetic variants and serum selenium concentrations. In the first stage, we observed 15 regions associated with serum selenium concentrations at p < 1 × 10−5. However, none of these regions reached the significance threshold in the second stage. Only two regions (4q34.3 and 17q24.3) had p < 0.05 in the second stage, and in the joint analysis these two regions had p < 5 × 10−7 but did not reach the conventional genome-wide significance level of p < 5 × 10−8. However, as previously shown [40], a large fraction of SNPs with borderline genome-wide significant associations were replicated when results from additional studies were added, suggesting that further follow-up of these two regions is warranted.
Interestingly, the most significant SNP, rs1395479 in 4q34.3, was also found to be associated with heart rate traits (p = 6.9 × 10−6) in a genome-wide scan conducted within the Framingham Heart Study [41]. rs1395479 is located in an intergenic region between the aspartylglucosaminidase (AGA) gene (~33 kb downstream) and the NEIL3 gene (~35 kb downstream). AGA is involved in the lysosomal breakdown of glycoproteins [42]. Glycoproteins occur in the cytosol, in the cell membrane, and in the extracellular space [43]. They have an extremely wide range of functions, serving as transport molecules, hormones, enzymes (oxidoreductases, transferases, and hydrolases), and receptors (adhering cells to cells and cells to substratum) [44][45][46][47][48]. Owing to these diverse functions, glycoproteins appear in nearly every biological process studied [44][45][46][47][48]. Notably, SEPP1, which plays a central role in selenium transport, is a highly glycosylated protein, and AGA might influence selenium concentrations through effects on glycosylation that could alter SEPP1 secretion from the endoplasmic reticulum, its interaction with chaperones, or its catabolism [49]. If this finding is replicated, it may provide evidence that AGA affects serum selenium concentrations through its regulation of glycoproteins. NEIL3 belongs to a class of DNA glycosylases and is involved in DNA repair by cleaving damaged bases [50]. Whether it is related to cancer is still unknown, and no association between selenium metabolism and NEIL3 has been reported. Thus, it remains unclear at this point whether genetic variation in NEIL3 affects selenium concentrations.
In 17q24.3, the most significant SNP, rs891684, is located in an intron of SLC39A11. The function of SLC39A11 is not well described, but in previous GWAS it was associated with survival in amyotrophic lateral sclerosis [51] and with visceral adipose tissue [52]. Additionally, the 17q24.3 locus has been associated with prostate cancer [53]. Interestingly, animal studies suggest that adipose tissue may be involved in selenium storage, and selenium levels are associated with obesity [54,55], which may provide further support for an association between genetic variation in SLC39A11 and circulating selenium concentrations.
Selenoproteins, which incorporate selenium into their active centers, play an important role in selenium metabolism in the human body. For instance, GPX1 is important for selenium storage [19] and SEPP1 plays a central role in selenium transport [17]. We found that SNPs within or upstream/downstream of GPX1 and SEPP1 were nominally associated with serum selenium concentrations (p = 0.01–0.05) and that the directions of the effect estimates for these SNPs were consistent between the two cohorts. Given the usually modest effects of common genetic variants, our study may have had limited power to detect associations with SNPs in GPX1 and SEPP1.
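The power argument can be made concrete with a standard approximation for a 1-df test of a quantitative trait: a SNP explaining a fraction q² of trait variance gives a noncentrality parameter of about n·q²/(1 − q²), and power follows from a normal approximation. The sketch below is illustrative only; the variance fractions one plugs in are hypothetical and not estimates from this study.

```python
import math
from statistics import NormalDist


def approx_power(n: int, r2_snp: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df association test.

    n: sample size; r2_snp: fraction of trait variance explained by the SNP
    (hypothetical input); alpha: significance level.
    """
    nd = NormalDist()
    ncp = n * r2_snp / (1.0 - r2_snp)      # noncentrality of the chi2(1) test
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(ncp)                 # Z ~ N(sqrt(ncp), 1) under H1
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
```

For example, with the 1203 participants of this study, a SNP explaining 1% of the variance in log selenium would have low power at the genome-wide level of 5 × 10−8, which is consistent with the concern raised above.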
Our study has several strengths. To the best of our knowledge, this is the first genome-wide scan of circulating selenium concentrations. We used a two-stage design followed by a joint analysis of two cohorts to reduce the likelihood of false-positive results. The three SNPs with p < 5 × 10−7 in the joint analysis showed no evidence of heterogeneity in the SNP-selenium association between the two cohorts. However, our study also has limitations. In PLCO and WHI, average serum selenium concentrations were relatively high, and few participants had low (depleted) selenium concentrations. If the impact of genetic variation is largest in subjects with low selenium concentrations, our power to identify selenium-related genetic variants would likely have been greater had we included more subjects with low serum selenium concentrations. About 94% of the WHI participants were cancer cases, and their selenium levels may not represent those of women without cancer, although we did not observe differences between cancer cases and matched controls [38,56]. Also, there was no difference in circulating selenium concentrations (natural log scale) between individuals with and without cancer in our study (t-test p = 0.16). Furthermore, we controlled for cancer status in our analyses to reduce the chance of false-positive results. Some genomic regions may have sex-specific associations with serum selenium concentrations, in which case the combined analysis of participants in PLCO (all men) and WHI (all women) may lead to false-negative results. However, the vast majority of GWAS findings have been observed in both men and women [57]. Given the differences between the study populations, one may argue for separate analyses of the two studies; however, this reduces overall power and does not provide an opportunity to replicate findings. Therefore, we decided to conduct a combined analysis, although more homogeneous study populations might yield additional findings. This study focused on circulating selenium concentrations and did not examine levels of the major plasma selenoproteins such as SEPP1 and GPX3. We identified two possible regions associated with serum selenium concentrations; if these are replicated, further work is needed to uncover the biological mechanisms underlying the associations. Our most significant SNPs, with p < 5 × 10−7, did not reach the conventional genome-wide significance threshold (p < 5 × 10−8). Therefore, further studies are required to replicate our findings.
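The text reports no evidence of heterogeneity between cohorts but does not name the test. One common choice for two cohorts is a z test on the difference of the cohort-specific betas, which is equivalent to Cochran's Q with 1 degree of freedom. The sketch below assumes that approach; the function names are hypothetical.

```python
import math


def heterogeneity_p(beta1, se1, beta2, se2):
    """Two-sided p for H0: beta1 == beta2 (independent cohorts)."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))


def cochran_q(betas, ses):
    """Cochran's Q with inverse-variance weights; for two cohorts, Q == z^2."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))


def same_direction(beta1, beta2):
    """True if the cohort-specific effect estimates share a sign."""
    return beta1 * beta2 > 0
```

A large heterogeneity p value, together with same-sign estimates, matches the pattern described for the three top SNPs, whose effects were in the same direction but slightly weaker in WHI.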

Conclusions
In conclusion, we performed a genome-wide association study of circulating selenium concentrations in two cohort studies and identified two potential regions, 4q34.3/AGA-NEIL3 and 17q24.3/SLC39A11. Further investigations are needed to replicate the observed associations and to reveal the biological mechanisms by which these variants may influence selenium levels. In addition, larger genome-wide studies are needed to discover additional regions associated with circulating selenium concentrations.

32122, 42107-26, 42129-32, and 44221).
For PLCO, we thank Christine Berg and Philip Prorok, Division of Cancer Prevention, National Cancer Institute; the Screening Center investigators and staff of the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial; Tom Riley and staff, Information Management Services, Inc.; Barbara O'Brien and staff, Westat, Inc.; and Bill Kopp, Wen Shao, and staff, SAIC-Frederick. Most importantly, we acknowledge the study participants for their contributions to making this study possible. Funding for this work was provided through the National Institutes of Health, Genes, Environment and Health Initiative [NIH GEI] (Z01 CP 010200). The human subjects participating in the GWAS are derived from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, and the study is supported by intramural resources of the National Cancer Institute. Assistance with genotype cleaning, as well as with general study coordination, was provided by the Gene Environment Association Studies, GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01 HG 004438). The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000093.