A Genome-Wide Association Study of Genetic Variants of Apolipoprotein A1 Levels and Their Association with Vitamin D in Korean Cohorts

Dyslipidemia is an important independent risk factor for cardiovascular disease (CVD). Specifically, apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), and the ApoB/A1 ratio have been linked to CVD. We conducted a genome-wide association study meta-analysis of two Korean cohorts containing a total of 12,924 patients to identify novel single nucleotide polymorphisms (SNPs) associated with ApoA1 and ApoB levels and the ApoB/A1 ratio. Additionally, an expression quantitative trait locus (eQTL) and differentially expressed genes (DEGs) analysis were performed. The statistically significant eQTL, DEG, and Gene Ontology (GO) results were used to explore the predicted interaction networks and retrieve the interacting genes and proteins. We identified three novel SNPs (rs11066280, p = 3.46 × 10−21; rs1227162, p = 2.98 × 10−15; rs73216931, p = 5.62 × 10−9) associated with ApoA1. SNP rs73216931 was an eQTL for KMT5A in the pancreas and whole blood. The network analysis revealed that HECTD4 and MYL2:LINC1405 are associated with AKT1. Our in silico analysis of ApoA1 genetic variants revealed heart muscle-related signals. ApoA1 also correlated positively with vitamin D, and genes associated with ApoA1 and vitamin D were found. Our data imply that more research into ApoA1 is needed to understand the links between dyslipidemia and CVD and vitamin D and CVD.


Introduction
Globally, cardiovascular disease (CVD) is the leading cause of mortality, claiming an estimated 17.9 million lives annually [1]. It is widely accepted that dyslipidemia contributes to CVD [2,3], and lipoprotein levels are an independent risk factor for arteriosclerosis and CVD [4]. Apolipoprotein A1 (ApoA1) is a crucial functional component of high-density lipoprotein (HDL) particles, which reverse cholesterol transport from peripheral tissue to the liver [5], and ApoA1 decreases blood cholesterol levels and prevents atherosclerosis formation [6]. Apolipoprotein B (ApoB) is the main lipoprotein in low-density lipoprotein (LDL) cholesterol. Therefore, ApoA1 and ApoB analytical investigations are valuable in studying disorders such as CVDs and metabolic disease. In addition, the ApoB/ApoA1 ratio is now widely used to predict the occurrence of CVDs and metabolic syndrome [7] because it is a composite index that comprehensively reflects the lipid metabolism balance [8][9][10].

Study Participants
Schematic plots of the analytical study design are provided in Figure 1. Data were obtained from two independent cohorts: The Korean Association Resource from Ansan and Ansung (KARE) cohort (n = 5918) and the Cardiovascular Disease Association Study (CAVAS, n = 8105) cohort. Each cohort has its own distinct characteristics. The KARE cohort is a representative cohort for genome research in Korea. It is a longitudinal cohort of people from the Ansan and Ansung communities in Korea gathered to study complex diseases, including diabetes. The CAVAS cohort is an elderly cohort of people from rural areas that includes many patients with CVDs. The exclusion criteria for this study were as follows: Missing genotype and phenotype data for ApoA1 and ApoB. Among the 14,023 participants in the 2 cohorts, 1099 were excluded, and 12,924 were enrolled ( Figure 1). The institutional review boards of the Veterans Health Service Medical Center approved this study protocol (IRB No. 2020-03-009 and IRB No. 2019-08-014) with an informed consent waiver because our analyses were retrospective. This study was conducted in compliance with the Helsinki Declaration. The committee of the National Biobank of Korea approved the use of bioresources for this study (KBN-2019-054 and KBN-2020-071).

Biochemical Measurements
Serum levels of ApoA1 and ApoB were measured by immunoturbidimetric assay using a Roche-Hitachi Cobas 8000 c702 (Roche Diagnostics System, Switzerland) for both the KARE and CAVAS cohorts. Serum 25(OH)D levels were measured by a chemiluminescent microparticle immunoassay using an Architect i2000SR system (Abbott, Singapore) for the KARE and CAVAS cohorts.

Genotyping
We used a genome-wide set of variants genotyped with the Affymetrix Genome-Wide Human SNP Array 5.0 for the KARE cohort (n = 5918) and the Korea Biobank arrays (KoreanChip) for the CAVAS cohort (n = 8105). The KoreanChip was developed by the Center for Genome Science at the Korea National Institutes of Health using the Affymetrix Axiom ® Array (Affymetrix, Santa Clara, CA, USA). The KARE and CAVAS cohorts have already been described in further detail [27,28]. We used SHAPEIT2 (with the default setting of 2 Mb windows and 100 states) and IMPUTE2 (with chunk sizes of 5 Mb and the command option "-Ne 20000") for haplotype phasing and imputation, respectively [29,30]. For imputation, the 1000 Genomes Phase III data were used as a reference panel. Any variant with genotype call rates under 95%, minor allele frequency (MAF) values below 0.05, or in violation of the Hardy-Weinberg equilibrium (p < 1 × 10 −6 ) was removed.
Only SNPs with quality scores greater than 0.5 were kept, resulting in 3,417,851 SNPs for the KARE cohort and 3,505,163 SNPs for the CAVAS cohort ( Figure 1). The National Center for Biotechnology Information Human Genome Build 37 (hg19) was used to confirm gene locations.

Biochemical Measurements
Serum levels of ApoA1 and ApoB were measured by immunoturbidimetric assay using a Roche-Hitachi Cobas 8000 c702 (Roche Diagnostics System, Switzerland) for both the KARE and CAVAS cohorts. Serum 25(OH)D levels were measured by a chemiluminescent microparticle immunoassay using an Architect i2000SR system (Abbott, Singapore) for the KARE and CAVAS cohorts.

Genotyping
We used a genome-wide set of variants genotyped with the Affymetrix Genome-Wide Human SNP Array 5.0 for the KARE cohort (n = 5918) and the Korea Biobank arrays (KoreanChip) for the CAVAS cohort (n = 8105). The KoreanChip was developed by the Center for Genome Science at the Korea National Institutes of Health using the Affymetrix Axiom ® Array (Affymetrix, Santa Clara, CA, USA). The KARE and CAVAS cohorts have already been described in further detail [27,28]. We used SHAPEIT2 (with the default setting of 2 Mb windows and 100 states) and IMPUTE2 (with chunk sizes of 5 Mb and the command option "-Ne 20000") for haplotype phasing and imputation, respectively [29,30]. For imputation, the 1000 Genomes Phase III data were used as a reference panel. Any variant with genotype call rates under 95%, minor allele frequency (MAF) values below 0.05, or in violation of the Hardy-Weinberg equilibrium (p < 1 × 10 −6 ) was removed. Only SNPs with quality scores greater than 0.5 were kept, resulting in 3,417,851 SNPs for the KARE cohort and 3,505,163 SNPs for the CAVAS cohort ( Figure 1). The National Center for Biotechnology Information Human Genome Build 37 (hg19) was used to confirm gene locations.

Statistical Analyses
The baseline characteristics of the study population are presented as means with standard deviations for continuous variables and numbers with proportions for categorical variables. Association analysis between vitamin D and apolipoproteins (ApoA1, ApoB, ApoB/ApoA1) was conducted because vitamin D is associated with CVDs [24]. In addition, gene loci in the GWAS Catalog (https://www.ebi.ac.uk/gwas/ (accessed on 27 June 2022)) [31] associated with both vitamin D levels (including the Korean cohorts study [25]) and apolipoprotein levels were investigated and identified in our study.
Genome-wide association analyses for serum ApoA1, ApoB, and ApoB/ApoA1 were conducted using linear regression with an additive model. PLINK 1.9 was used within each cohort. Age, sex, body mass index, and ten principal component scores were included as covariates. Meta-analyses of the KARE and CAVAS cohorts were performed using METAL software (http://csg.sph.umich.edu/abecasis/meta (accessed on 19 April 2022)). Cochran's Q-test for heterogeneity was conducted. Its p-value is marked with 'HetPVal' [32], where HetPVal < 0.05 indicates heterogeneity between the two datasets [33]. The genomic inflation factor, λ, was calculated on the whole genome level by dividing the observed chi-squared test statistics by the expected chi-squared distribution median to ensure that no confounding was caused by population stratification in this study, close to 1 indicates no genetic inflation [34]. The regional plot of significant genetic variation was created using LocusZoom software [35]. The threshold for statistical significance in this model was p < 5.0 × 10 −8 , which is conventionally considered to reflect genome-wide significance.

Functional Annotation Analyses
Expression quantitative trait locus (eQTL) studies were performed using the Genotype-Tissue Expression dataset (https://gtexportal.org/home/ (accessed on 27 April 2022)), which contains a variety of densely genotyped data from donors to assess genetic variations within their genomes. Detailed methods, including normalization techniques, can be found at https://www.gtexportal.org/home/methods (accessed on 27 April 2022). The associated genes were further investigated for differentially expressed genes (DEGs) in human coronary artery cells treated with ApoA1 and phosphate-buffered saline controls using data from the Gene Expression Omnibus dataset (GSE53201) [36]. All array data are accessible from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/ (accessed on 3 May 2022)) with the GSE53201 accession code. The GEO also provides GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/ (accessed on 3 May 2022)), an R-based web application that helps users analyze GEO data. Data after quantile normalization were used, and the Benjamini-Hochberg false discovery rate-adjusted 0.05 significance level was applied for multiple hypothesis test corrections [37]. Moreover, we annotated genes by constructing gene interaction networks with the STRING v.11 online platform (https://string-db.org/ (accessed on 29 June 2022)). The results for the corresponding biological processes, cellular components, and molecular functions from the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses [38] were also identified from the STRING database.

Characteristics of the Study Participants
A total of 14,023 eligible subjects were included in this study (5918 subjects from the KARE cohort and 8105 subjects from CAVAS). One thousand and ninety-nine subjects were excluded due to the exclusion criteria ( Figure 1). Therefore, 12,924 participants (4938 subjects from the KARE cohort and 7986 from the CAVAS cohort) were analyzed in this study.

GWAS Meta-Analysis of ApoA1, ApoB, and ApoB/ApoA1
Overall, 3,417,851 SNPs from the KARE cohort and 3,505,163 SNPs from the CAVAS cohort were used. A total of 2,374,443 overlapping SNPs were selected as the final genetic markers for the GWAS meta-analysis of ApoA1, ApoB, and ApoB/ApoA1. Quantile-quantile (Q-Q) and Manhattan plots for ApoA1 are shown in Figure 2. The Q-Q plot revealed no evidence that the test statistics were inflated (λ = 1.024). The 16 genome-wide-significant variants for ApoA1 are listed in Table 2 Figure 3. The Q-Q plot revealed no evidence of any inflation of the test statistics (λ = 1.015). The GWAS meta-analysis for ApoB revealed 8 genome-wide-significant loci, all of which were previously reported ( Table 3). The Q-Q and Manhattan plots for the ApoB/ApoA1 ratio are shown in Figure S1. The Q-Q plot revealed no evidence of test statistic inflation (λ = 1.016). The GWAS meta-analysis for ApoB/ApoA1 revealed 9 genome-wide-significant loci (Table S1).

Regional Analysis and Functional Annotation
For the genome-wide-significant novel phenotype variants of ApoA1, the regional plots with the lead SNPs are displayed in Figure 4. The eQTL analysis showed KMT5A expression in the pancreas and whole blood and high MPHOSPH9 expression in the heart ventricles and atria (Table S2). In the DEG analysis with GSE53201, several genes, including CCL20 (adjusted p = 3.70 × 10 −4 , p = 1.29 × 10 −8 ), PTGS2 (adjusted p = 1.42 × 10 −2 , p = 9.83 × 10 −7 ), and TNIP3 (adjusted p = 1.90 × 10 −2 , p = 1.97 × 10 −6 ) were expressed more in the ApoA1 treatment group than in the control group ( Table 4). All of the 11 genes that were significant in ApoA1 treatment were also significant in another data set (naive HDL treatment and control) from GSE53201 (Table S3). The network analysis of HECTD4 and MYL2 revealed that the pathways related to AKT1 were also related to the ApoA1 level (  (Table 5). In addition, four KEGG pathways were related to cardiomyopathies (hsa05410, hsa05414, hsa04261, hsa04260, Table 5). With the exception of the novel loci identified in our investigation, subsequent GO and KEGG pathway analyses of the remaining significant loci related to ApoA1 and ApoB/ApoA1 showed a dyslipidemia-related function ( Figure S2 and Table S4). However, GO analysis of ApoB-related genes, which was significant in our GWAS, revealed no significant GO terms.

Association between Vitamin D and ApoA1, ApoB, and ApoB/ApoA1
Vitamin D was positively associated with ApoA1 in the KARE cohort (β = 0.235, p < 0.001), CAVAS cohort (β = 0.447, p < 0.001), and combined set (β = 0.387, p < 0.001, Table 6). In addition, the ApoB/ApoA1 ratio was negatively associated with vitamin D levels in the KARE cohort (β = −0.002, p < 0.001), CAVAS cohort (β = −0.001, p < 0.001), and combined set (β = −0.002, p < 0.001). However, we found no clear evidence of an association between ApoB and vitamin D levels. Genetic variants associated with the ApoA1 and vitamin D levels had similar directions, implying that the ApoA1 and vitamin D levels might share common risk factors for CVDs (Table S5).         Log FC = estimate of the log2-fold-change corresponding to the effect or contrast; AveExpr = average log2-expression for the probe over all arrays and channels; B = log-odds that the gene is differentially expressed.  It is the ratio between the number of proteins in the network that are annotated with a term and the number of proteins that could be expected to be annotated with this term in a random network of the same size.
3, x FOR PEER REVIEW 11 of 17 Log FC = estimate of the log2-fold-change corresponding to the effect or contrast; AveExpr = average log2-expression for the probe over all arrays and channels; B = log-odds that the gene is differentially expressed.

Discussion
Our study has demonstrated that three novel SNPs (rs11066280, rs1227162, and rs73216931) are linked to ApoA1 with GWAS significance. The functional analysis suggests that HECTD4, MYL2, and KMT5A are potential genes involved in the pathophysiology of ApoA1-related CVDs. Our GO and KEGG analyses revealed that HECTD4 and MYL2 are associated with cardiomyopathy, suggesting that the connection between AKT1 and the ApoA1 level has a direct connection to CVDs. Furthermore, the findings of the study, with the exception of the novel loci, demonstrated the dyslipidemia-related functions of ApoA1 and ApoB/ApoA1-related genes. In addition, using regression analyses, we found that the vitamin D concentration was positively linked to the ApoA1 concentration, and we have summarized the genes connected to both vitamin D and ApoA1. Thus, a genomic analysis of the combined effects of ApoA1 and vitamin D on CVDs could uncover meaningful pathological mechanisms.
ApoA1 is a major component of HDL, which drives the reverse transport of cholesterol from extrahepatic tissue to the liver and plays an essential role in protecting the arteries [39]. Because ApoB is a risk factor for CVDs, we expected the ApoB/ApoA1 to provide cardiovascular information in the GWAS results. In this study, we confirmed several loci, including three novel genes related to ApoA1. Additionally, although no novel loci were identified for ApoB or ApoB/ApoA1, SNPs associated with them were found.
The functional analysis of our data for two new loci (HECTD4 and MYL2) shows that they are associated with cardiomyopathy, suggesting that the association between AKT1 and the ApoA1 level has a direct connection to CVDs. AKT1 (protein kinase B) is a serin/threonine kinase, and ApoA1 enhances insulin-dependent and insulin-independent glucose uptake by skeletal muscle [40]. By activating downstream effectors and controlling the cell cycle transition, growth, and proliferation, PI3K/Akt signaling plays a vital role in regulating several physiological activities. This pathway has a role in the pathophysiology of various human ailments, including heart disease, by modulating cardiomyocyte growth and survival, angiogenic processes, and inflammatory responses [41]. Although a previous study showed that the PI3K/Akt pathway was not the main driver of HDL-mediated cell protection [42], our results indicate that it could nonetheless be involved. From this perspective, it is crucial to examine the causal relationships between CVDs and apolipoproteins. A previous observation analysis of ApoA1 and CVDs showed that ApoA1 was associated with a lower risk of CVDs. However, it was unrelated to the risk of CVD in that same study's MR analysis [43]. Those results suggest that genetic evidence does not support a cardioprotective role for ApoA1. Several investigations of the connection between vitamin D and CVDs have been conducted [44,45], but few causal results are available. The observational and MR analyses of participants with a vitamin D deficiency (<25 nmol/L) in a recent study provided strong evidence for an inverse association between vitamin D deficiency and all-cause mortality during follow-up (odd ratio (OR):0.69 95CI:0.59-0.80, p < 0.001) and a non-significant inverse association between vitamin D deficiency and CVDs (OR 0.89 [0.76-1.04], p = 0.14) [46]. Due to the existence of many risk factors, such as hypertension and atherosclerosis, the ability to analyze the importance of a single metabolic phenotype relationship with CVDs is limited.
Our study found novel loci for ApoA1 (HECTD4, MYL2, and KMT5A) in a Korean cohort, and our results will carry more weight if replication studies in other groups of people support them. In this regard, we conducted phenome-wide association studies using the Common Metabolic Disease Knowledge Portal (https://hugeamp.org (accessed on 29 June 2022)). We found that SNPs such as rs11066280, rs1227162, and rs73216931 are related to metabolic conditions such as waist-hip ratio, lipid metabolism (HDL cholesterol and LDL cholesterol), type 2 diabetes, and body fat percentage in the European population (Table S6).
A major strength of our study is the inclusion of a relatively large study sample drawn from two Korean cohorts. However, this study also has a few limitations. First, we did not conduct an MR analysis for ApoA1 and CVDs, and because the KARE and CAVAS cohorts are both community-based cohorts, the number of patients with CVDs is small. To compensate for that limitation, additional research focusing on a heart-disease cohort is needed. Second, we could not perform DEGs analysis for ApoB or ApoB/ApoA1 because we could not find available data for differentially expressed genes (DEGs) analysis of ApoB or ApoB/ApoA1 in the Gene Expression Omnibus database. Third, due to a large number of missing values, we were unable to consider various confounding factors such as HOMA-IR and HOMA-beta cells in our regression analysis. ApoB and vitamin D correlated negatively in the KARE cohort but positively in the CAVAS cohort. Although we do not know the exact reason for this, it is possible that the pattern may be different, as the KARE is a cohort of chronic disease studies, and the CAVAS is a cohort of CVD studies. The relationship between vitamin D level and lipid metabolism warrants additional investigation. Fourth, we did not include vitamin D GWAS results in this study because a previous study had already conducted a vitamin D GWAS analysis in the KARE cohort [25]. Although our regression analysis revealed relationships between vitamin D and ApoA1 levels, along with genetic connections, causation was not demonstrated. To establish causation, experimental research using the genomic biomarkers identified in our work will be needed.

Conclusions
This study used a GWAS meta-analysis to identify three novel loci (rs11066280, rs1227162, and rs73216931) related to ApoA1. Furthermore, GO and KEGG analyses revealed that HECTD4 and MYL2 are associated with cardiomyopathy, suggesting that the association between AKT1 and the ApoA1 level has a direct connection with CVDs. The analyses of associations between vitamin D and ApoA1 and vitamin D and ApoB/ApoA1 showed a real association in data from two Korean cohorts. In addition, shared genetic variants between ApoA1 and vitamin D levels could suggest that vitamin D plays an additive role in CVDs through an ApoA1-related pathway. Further study is needed to support the pathological mechanisms of ApoA1 in CVDs suggested by this genomic analysis.  Figure S2: Network analysis with a list of replicated genes (genes related to ApoA1, ApoB, and ApoB/ApoA1, that have been reported in other studies and were also found to be significant in this study). (A) ApoA1, (B) ApoB, and (C) ApoB/ApoA1. Table S1: Results of GWAS meta-analysis for ApoB/ApoA1 (leading SNPs). Table S2: cis-eQTLs of SNPs associated with ApoA1. Table S3: Differentially expressed genes for naive HDL in the Gene Expression Omnibus GSE53201 database (11 genes that were significantly differentially expressed for ApoA1). Table S4: Gene Ontology and KEGG pathway analyses using genes for ApoA1, ApoB and ApoB/ApoA1 are significant in our study and in previously reported studies. Table S5: List of SNPs Associated with Serum 25-Hydroxyvitamin D Concentration and Apolipoprotein A1 in Genome-Wide Analyses. Table S6: Links to phenome-wide association study (pheWAS) results using the "Common Metabolic Diseases Knowledge Portal" (https://hugeamp.org/ (accessed on 29 June 2022)).