Risk Allele Frequency Analysis of Single-Nucleotide Polymorphisms for Vitamin D Concentrations in Different Ethnic Group

The prevalence of vitamin D deficiency varies from 20.8% to 61.6% among populations of different ethnicities, suggesting the existence of a genetic component. The purpose of this study was to provide insights into the genetic causes of vitamin D concentration differences among individuals of diverse ancestry. We collected 320 single-nucleotide polymorphisms (SNPs) associated with vitamin D concentrations from a genome-wide association studies catalog. Their population-level allele frequencies were derived based on the 1000 Genomes Project and Korean Reference Genome Database. We used Fisher’s exact tests to assess the significance of the enrichment or depletion of the effect allele at a given SNP in the database. In addition, we calculated the SNP-based genetic risk score (GRS) and performed correlation analysis with vitamin D concentration that included latitude. European, American, and South Asian populations showed similar heatmap patterns, whereas African, East Asian, and Korean populations had distinct ones. The GRS calculated from allele frequencies of vitamin D concentration was highest among Europeans, followed by East Asians and Africans. In addition, the difference in vitamin D concentration was highly correlated with genetic factors rather than latitude effects.


Introduction
Vitamin D, a fat-soluble vitamin, plays an essential role in bone mineralization and calcium homeostasis. Its deficiency is closely related to metabolic bone disease [1] and non-skeletal conditions, such as cardiovascular, infectious, and autoimmune diseases, as well as malignancies and diabetes [2][3][4][5]. Vitamin D is produced in the skin from 7-dehydrocholesterol by UV irradiation. Serum 25-hydroxy-vitamin D (25(OH)D3), the major circulating biomarker of vitamin D status, is converted to active vitamin D, 1,25(OH)2D, primarily in the kidney and, to a lesser extent, in extra-renal tissues [6]. Serum vitamin D levels are strongly influenced by numerous factors, including age, obesity, skin color, dietary intake, exposure to ultraviolet B (UVB) sunlight, geographical latitude, and dietary supplements [7]. Studies have estimated that 1 billion people worldwide have vitamin D deficiency or insufficiency [1,8], which is a significant public health concern [7].
According to a global overview, the prevalence of vitamin D deficiency and the average vitamin D concentration vary according to ethnicity. For instance, with the serum 25(OH)D3 cutoff point set at 20 ng/mL for adults, 54% of patients with African ancestry (average 25(OH)D3 concentration: 21.0 ± 10.4 ng/mL) fall below this level,

Ethical Considerations
This study was approved by the Institutional Review Board (IRB) of the Veterans Health Service Medical Center, Korea (IRB No. 2019-07-008 and IRB No. 2020-01-053). The need for informed consent was waived because de-identified data were used.

Comparison of Vitamin D-Related SNPs among the Global Population and East Asia
The most commonly used cut-points for serum 25(OH)D3 levels in adults are 11-19 ng/mL and ≤10 ng/mL for deficiency and severe deficiency, respectively. However, we used the average vitamin D concentration of each cohort instead of the prevalence of vitamin D deficiency for two reasons: first, data on African and South Asian populations are limited [9]; second, the prevalence of vitamin D deficiency and the average 25(OH)D3 concentration are related. We searched the NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/home, accessed on 30 December 2020) for SNPs associated with vitamin D measurements (EFO 0004631). The catalog included 13 studies and 546 associations. After eliminating repeated SNPs and removing those not found in the 1000 Genomes Project database, 320 SNPs from the GWAS catalog were used for the analysis of allele frequencies associated with vitamin D concentrations.
The details and advantages of our method have been described elsewhere [19][20][21]. In brief, the population-level allele frequencies of SNPs were derived from the 1000 Genomes Project phase 3 and the Korean Reference Genome Database (KRGDB) produced by the Korea National Institute of Health in 2016. The former surveyed genetic variations among 2504 individuals from 26 worldwide populations grouped into African, East Asian, European, South Asian, and American categories based on their geographical locations and ancestry [22]. These data were downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ (last accessed: 15 January 2020). The variant coordinates were based on the human genome assembly GRCh37. The latter included data on 1722 individuals from the Korean population, which is not covered by the 1000 Genomes Project [23]. Data on the population frequencies of the SNPs were downloaded from the web-based database (http://152.99.75.168:9090/KRGDB/menuPages/download.jsp/, last accessed: 15 January 2020). To compare the distributions of risk alleles in the Korean population, individual genotyping results from the second phase of KRGDB were obtained for 1099 individuals from the National Human Resource Bank of Korea. After statistical analysis, we performed expression quantitative trait locus (eQTL) analysis for the significant SNPs using the Genotype-Tissue Expression (GTEx) portal (https://www.gtexportal.org/ accessed on 30 December 2020) to examine the significance of vitamin D SNP enrichment. Gene and transcript expression on the GTEx portal are shown in Transcripts Per Million (TPM), calculated as

$$\mathrm{TPM}_t = 10^6 \times \frac{n_t / \tilde{l}_t}{\sum_{t' \in T} n_{t'} / \tilde{l}_{t'}}$$

where $n_t$ refers to the number of reads for transcript/gene $t$, $\tilde{l}_t$ to the normalized transcript/gene length, and $T$ is the set of all transcripts or genes, depending on whether the quantification is at the transcript or gene level.
The normalized expression (norm expression) values were calculated with edgeR (https://gtexportal.org/home/documentationPage-#staticTextAnalysisMethods accessed on 30 December 2020).
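The TPM normalization described above can be illustrated with a minimal Python sketch: each read count is divided by its transcript/gene length, and the resulting rates are rescaled so that they sum to one million. The function name `tpm` and the example counts and lengths are hypothetical and are not taken from GTEx code or data.

```python
def tpm(read_counts, lengths):
    """Convert read counts to Transcripts Per Million (TPM).

    read_counts: reads assigned to each transcript/gene
    lengths:     normalized length of each transcript/gene (same order)
    """
    if len(read_counts) != len(lengths):
        raise ValueError("read_counts and lengths must have the same size")
    # Length-normalized read rate for every transcript/gene.
    rates = [n / length for n, length in zip(read_counts, lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all read counts are zero")
    # Rescale so the values sum to one million.
    return [1e6 * rate / total for rate in rates]


# Hypothetical example: three transcripts with made-up counts and lengths.
example_tpm = tpm([100, 200, 300], [1000, 2000, 1000])
```

Because TPM values always sum to one million within a sample, they describe relative abundance; comparing raw TPM across tissues is a different question from the edgeR-based normalization mentioned in the text.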

Calculation of Genetic Risk Scores Using SNPs Related to Vitamin D Concentration
To compare the composite genetic risk of vitamin D deficiency, we adopted the following equation provided by Mao et al. [21]:

$$\mathrm{GRS} = \frac{\sum_{i=1}^{I} X_i}{2I}$$

where $I$ refers to the number of vitamin D concentration-related SNPs, and $X_i$ to the number of copies of the risk allele ($X_i \in \{0, 1, 2\}$) at the $i$th SNP. Thus, if a person had two copies of the risk allele at each vitamin D concentration-related SNP, their risk score was 1. In contrast, if a person had no copies, their risk score was 0. A person with a composite genetic risk score (GRS) of 1 has the highest possible genetic risk of higher vitamin D concentration, whereas a person with a score of 0 has the lowest. If copies of effect alleles (0/1/2) were randomly assigned to each SNP, the expected value of the risk score would be 0.5. SNPs with frequency differences of more than 10% between the total (n = 1722) and second-phase (n = 1099) data of KRGDB were excluded from the GRS calculation. We used the average composite GRS to determine correlations with population vitamin D concentration data from similar geographical latitudes (51°) and fitted a curve of vitamin D concentration for each original endogenous population vs. GRS [9]. In addition, the correlation analysis of the difference in vitamin D concentrations included both latitude and GRS as factors, since a previous study showed the impact of these on vitamin D concentration in patients of African and European ancestry [24].
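The composite score described above (total risk allele copies divided by twice the number of SNPs) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the "more than 10%" exclusion rule is interpreted here as an absolute frequency difference above 0.10, and the population-average score is derived from the fact that the expected number of copies at a SNP equals twice the effect allele frequency.

```python
def composite_grs(risk_allele_copies):
    """Unweighted GRS for one person: sum of copies / (2 * number of SNPs)."""
    if not risk_allele_copies:
        raise ValueError("at least one SNP is required")
    for copies in risk_allele_copies:
        if copies not in (0, 1, 2):
            raise ValueError("each SNP must carry 0, 1, or 2 risk allele copies")
    return sum(risk_allele_copies) / (2 * len(risk_allele_copies))


def population_mean_grs(effect_allele_freqs):
    """Average GRS of a population from effect allele frequencies.

    The expected number of copies at a SNP is 2 * frequency, so the
    average score reduces to the mean effect allele frequency.
    """
    if not effect_allele_freqs:
        raise ValueError("at least one SNP is required")
    return sum(effect_allele_freqs) / len(effect_allele_freqs)


def filter_consistent_snps(freq_total, freq_phase2, max_diff=0.10):
    """Keep SNPs whose frequency differs by at most max_diff between two datasets.

    Assumption: the 10% threshold is an absolute difference in allele frequency.
    """
    return [
        snp
        for snp, freq in freq_total.items()
        if snp in freq_phase2 and abs(freq - freq_phase2[snp]) <= max_diff
    ]
```

For example, `composite_grs([2, 2, 2])` gives 1.0 and `composite_grs([0, 0, 0])` gives 0.0, matching the two extremes described in the text.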

Statistical Analyses
We used Fisher's exact test to assess whether the frequency of the effect allele at a given SNP was significantly higher or lower in a population than the global population frequency in the 1000 Genomes Project database, and the p-values were log10-transformed. In the heatmap generated to visualize allele patterns in different populations, red and blue colors were used to indicate higher and lower frequencies, respectively, compared to the global average. If the effect allele was enriched in a population, then the negative log10 of the p-value (a positive number) was used to represent the SNP in that population in the heatmap. In contrast, if it was depleted, then the log10 of the p-value (a negative number) was used. Statistical analyses were performed using R software version 4.0.1 (R Foundation, Vienna, Austria), and statistical significance was set at p < 0.05.
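The signed log10 score used for the heatmap can be sketched as follows. The study itself used R; this is a standard-library Python illustration of a two-sided Fisher's exact test computed in log space, so that very small p-values do not underflow. The function names and the layout of the 2×2 table (effect and other allele counts in a population vs. a reference group) are assumptions made for this example.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_log10_p(a, b, c, d):
    """log10 of the two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    log_obs = log_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Two-sided p: sum over all tables no more probable than the observed one.
    terms = [lp for lp in map(log_prob, range(lo, hi + 1)) if lp <= log_obs + 1e-7]
    peak = max(terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return min(0.0, log_p / math.log(10))


def signed_log10_score(pop_effect, pop_other, ref_effect, ref_other):
    """Positive (-log10 p) if the effect allele is enriched, negative (log10 p) if depleted."""
    log10_p = fisher_log10_p(pop_effect, pop_other, ref_effect, ref_other)
    pop_freq = pop_effect / (pop_effect + pop_other)
    ref_freq = ref_effect / (ref_effect + ref_other)
    return -log10_p if pop_freq > ref_freq else log10_p
```

With the same allele frequencies, larger allele counts yield a more extreme score (for instance, counts of 800/200 vs. 500/500 give a smaller p-value than 80/20 vs. 50/50), which is worth keeping in mind when comparing cohorts of different sizes.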

Vitamin D Concentration-Related SNPs in the Global Population
We collected 320 vitamin D concentration-associated SNPs from 13 GWASs using the NHGRI-EBI catalog. We determined the effect allele frequencies (EAFs) for each of the continental groups and the Korean population based on the information from the 1000 Genomes Project and KRGDB (Supplementary Table S1). The heatmap shows how significantly the effect allele was enriched or depleted across these populations (Supplementary Figure S1). In the Korean population, 106 vitamin D-related SNPs were significantly enriched, 120 were depleted, and 94 were comparable to the global EAF. The hierarchical clustering tree showed the differences among the populations, with Europeans, Americans, and South Asians in one cluster and Africans, East Asians, and Koreans in another. In addition, SNPs with significantly different frequencies in the Korean population (log-adjusted p-value of Fisher's exact test in Koreans >100 or <−100) are summarized in Table 1 and Figure 1.
From the data, rs10818769 and rs9409266 were found to be depleted in Koreans, East Asians, and Africans but enriched in Europeans. These SNPs (rs10818769 and rs9409266) are located in an intronic region of the RABGAP1 gene, which encodes a guanosine triphosphatase-activating protein of RAB6A, and have alleles C > G and G > A, respectively. The major allele was detected in 85% of Europeans and 26% of Koreans. Although the RABGAP1 gene is known to be related to body height and birth weight, these SNPs may be related to the modulation of an evolution-related gene for the ethnic component of vitamin D concentration. The box plots of the eQTL analysis of RABGAP1 in skin tissues from both sun-exposed and non-exposed areas show significantly different expression according to the alleles of rs10818769 and rs9409266 in the GTEx data (Figure 2). Comparisons of the allele frequencies of major vitamin D-related genes, such as GC, NADSYN1/DHCR7, CYP2R1, and CYP24A1, are summarized in Supplementary Table S2.

Genetic Risk Scores Calculated Using SNPs Related to Vitamin D Levels
We calculated the composite GRS based on the number of copies of effect alleles at the 320 vitamin D-associated SNPs, assuming that allelic associations from most GWAS-identified variants could be replicated in non-European populations. The GRS of vitamin D concentration was highest among Europeans, followed by Americans, South Asians, East Asians, and Africans (Figure 3).

A strong correlation was observed between the vitamin D concentrations reported in several studies [25][26][27][28][29][30] and the GRS at a similar geographic latitude (51°, R2 = 0.59), shown as the grey dashed line (Figure 4). In addition, the vitamin D concentration of each original endogenous population vs. GRS fitted a U-shaped curve, shown as the black line (Figure 4).

Correlation Plot of the Difference between Vitamin D Concentration and GRS Using Related SNPs or Latitude
Vitamin D concentration was more strongly correlated with the average GRS than with latitude: the difference in vitamin D concentration vs. GRS gave an R2 value of 0.9996 (A), whereas the latitude difference gave an R2 value of 0.6438 (B) (Figure 5).
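For readers who want to see how an R2 value of this kind is obtained, the following Python sketch computes the coefficient of determination for a simple linear fit between two variables. The function name `r_squared` and the input values in the example are placeholders and do not reproduce the study data; the U-shaped curve fit mentioned in the text is not covered here.

```python
def r_squared(xs, ys):
    """Coefficient of determination (R^2) of a simple linear regression of ys on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain at least two distinct values")
    # For a one-predictor linear fit, R^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Placeholder values only, e.g. GRS differences vs. concentration differences.
example_r2 = r_squared([0, 1, 2, 3], [1, 3, 2, 4])
```

With only a handful of population-level points, an R2 close to 1 can arise easily, so such values are best read alongside the number of points in the fit.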

Discussion
Vitamin D deficiency is associated with unfavorable bone conditions and chronic diseases such as cancer and diabetes [31]. Thus, in this study, we aimed to assess the different risk alleles of ethnic groups that may reflect vitamin D concentrations. We found that allele frequencies differed by ethnic group and that the SNP-based genetic score correlated strongly with real-world data on vitamin D levels.
Previously conducted GWASs revealed several significant loci, including GC, NADSYN1/DHCR7, CYP2R1, and CYP24A1 [11][12][13][14][15][16][17], that play an important role in vitamin D concentrations. Subsequently, using these significant loci (46 SNPs), Jones et al. showed variations in vitamin D levels among European, East Asian, and African populations by UVB exposure and ancestry [24]. We hypothesized that the differing allele frequencies among ethnic groups, including Koreans, might point to significant loci underlying the evolution of different vitamin D concentrations, regardless of environmental factors. In our study, rs200641845 and rs7041, which are related to GC, the gene encoding the vitamin D-binding carrier protein, were highly depleted in Koreans and Africans. The rs3829251 and rs11233933 SNPs associated with NADSYN1 were depleted in Africans, whereas they were enriched in Koreans. This could be one piece of evidence that Koreans and Africans have different mechanisms underlying low vitamin D levels, as NADSYN1/DHCR7 is involved in UVB-induced vitamin D metabolism in the skin.
Additionally, we found some SNPs [rs10818769 (RABGAP1), rs9409266 (RABGAP1), rs12881545 (DLK1), rs10070734 (LINC00461), and rs17765311 (AC007950.2)] that were highly depleted in East Asians (including Koreans) and Africans, while they were enriched in Europeans. The eQTL analysis showed that rs10818769 and rs9409266 affected RABGAP1 expression in the skin regardless of sun exposure. Pigmentation-associated allele evolution has been shown to involve SLC24A5 [32] and RABGAP1 in a previous study [33], and RABGAP1 was the signature gene for vitamin D deficiency and skin pigmentation. It was found to be underexpressed among East Asians (including Koreans) and Africans, while highly expressed among Europeans in our study. This gene may provide a possible link between skin pigmentation and vitamin D concentration; however, further experimental studies are needed to confirm this. This result is consistent with the nutrigenomics of vitamin D in that the main evolutionary driver of decreased skin pigmentation was the need for sufficient endogenous vitamin D production [34]. Skin color and genetic variation may explain vitamin D deficiency and adaptation to life at different latitudes [35].
The GRS was the highest among Europeans, followed by Americans, South Asians, East Asians, and Africans, and was correlated with vitamin D concentrations. This result is consistent with the estimates of 25(OH)D3 levels <20 ng/mL that have been reported as 24% in the USA, 37% in Canada, and 40% in Europe [36,37]. European Caucasians have been shown to have lower rates of vitamin D deficiency compared with nonwhite individuals [36,38]. In addition to genetics, environmental factors such as nutrition and sunlight exposure are important determinants of vitamin D concentration, and latitude was one of the factors considered in this study. Moreover, vitamin D deficiency is common in non-western immigrants due to low sunshine exposure, pigmented skin, and low calcium intake [39]. In this regard, the comparison of multi-ethnic group data in a single country would be desirable. According to the study by Van der Meer et al., the mean vitamin D concentration was 26.8 ng/mL among the Dutch and 13.2 ng/mL among Africans in the Netherlands at a latitude of 51° [40]. These results are consistent with those of another study showing that African Americans had lower levels of vitamin D than European Americans [41]. The pooled prevalence of low vitamin D status in Africa was 33.22% (26.22-43.68%) with a cutoff of serum 25(OH)D3 concentration of <20 ng/mL and an overall mean of 26.8 ng/mL [42]. Furthermore, vitamin D concentrations were more strongly correlated with GRS than with latitude when examining differences in vitamin D concentration against GRS instead of against latitudinal differences. Thus, latitude factors should be considered for vitamin D concentration assessment in genetic models, as was performed in our study.
A major strength of our study was the inclusion of the Korean whole-genome dataset of 1722 individuals that reflected the allele frequency of SNPs related to vitamin D deficiency. Moreover, we computed the risk model using a large number of alleles (n = 320) related to vitamin D, compared to a previous study that used only the major loci [24]. Additionally, we did not systematically organize a new vitamin D cohort and analyze the effect; instead, we compared the data from the 1000 Genomes Project with the vitamin D-related SNP data from the GWAS catalog. Despite these strengths, there are some limitations to this study. First, the GWAS catalog contained data where the risk allele was not clearly defined according to the minor allele frequency (MAF). We did not exclude these from our study because the majority of minor alleles were likely to be risk alleles. Therefore, inaccurate subgroup analysis could have arisen. To address this issue, risk allele curation is necessary for the GWAS catalog based on the results of additional large population studies using cohorts in whom vitamin D was measured. Second, the statistical significance of EAF in the Korean population was high and should be interpreted with caution, since Fisher's test can decrease the p-value as the number of subjects increases, even with the same odds ratios. Third, only latitude and genomic modeling were used for vitamin D analysis; other environmental factors (such as nutrition) were not considered. A previous study on a multi-ethnic population of Norway has shown that there are many modifiable risk factors related to 25(OH)D3 levels [43]. Finally, we used the composite GRS instead of a polygenic risk score for two reasons: the weighted odds ratios of vitamin D concentrations varied according to the ethnic group even for the same SNP, and there were inaccuracies in the weighted odds ratios due to insufficient study data among the African and American populations. In the future, a polygenic risk score with effect size-weighted odds ratios should be evaluated.

Conclusions
Our study found a substantial population difference in terms of allele frequencies in vitamin D-related SNPs. The GRS for vitamin D concentrations was higher in Europeans than in East Asians and Africans, and these scores were highly correlated with actual data. In addition, vitamin D concentration was more strongly correlated with the average GRS than with the latitude effect. From the public health perspective of vitamin D deficiency, genetic variants associated with vitamin D, as well as environmental factors (latitude, UVB exposure), should be considered. Further studies are needed to identify variant SNPs in genes such as RABGAP1 (rs10818769, rs9409266) that reflect vitamin D deficiency in East Asians or Africans and to assess their modifiable roles for evolutionary differences.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12101530/s1, Figure S1: Entire differences of single-nucleotide polymorphisms related to the vitamin D concentration in the global population, Table S1: Effect allele frequencies (EAFs) of vitamin D concentrations related single nucleotide polymorphisms in continental groups, including Koreans, Table S2: Effect allele frequencies (EAFs) of vitamin D concentrations related single nucleotide polymorphisms in continental groups, including Koreans.
Informed Consent Statement: Patient consent was waived because the analysis was retrospective and used de-identified data; the Institutional Review Board of the Veterans Health Service Medical Center approved the waiver of informed consent.

Data Availability Statement:
The raw datasets generated and analyzed during the current study are not publicly available, since whole-genome sequencing data are considered personal property under the Korea Bioethics law. However, the raw whole-genome sequencing data are available for research upon reasonable request with the permission of the National Biobank of Korea (contact: [http://nih.go.kr/biobank/cmm/main/mainPage.do?/, accessed on 15 January 2020]; e-mail: [biobank@korea.kr]). The allele frequencies of the Korean Reference Genome Database (KRGDB) are available at [http://152.99.75.168:9090/KRGDBDN/dnKRGinput.jsp, accessed on 15 January 2020]; the files required are all three of 'the totally merged sets' of common variants, rare variants, and indels. The 1000 Genomes data are available; all the files from the following folder were downloaded: [ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/] (last accessed: 15 January 2020). The genome-wide association study (GWAS) catalog data are available from NHGRI-EBI ([https://www.ebi.ac.uk/gwas/docs/file-downloads, accessed on 15 January 2020], "All associations v1.0.2-with added ontology annotations, GWAS Catalog study accession numbers and genotyping technology", December 2020).