Comprehensive Statistical and Bioinformatics Analysis in the Deciphering of Putative Mechanisms by Which Lipid-Associated GWAS Loci Contribute to Coronary Artery Disease

The study was designed to evaluate putative mechanisms by which lipid-associated loci identified by genome-wide association studies (GWAS) are involved in the molecular pathogenesis of coronary artery disease (CAD) using a comprehensive statistical and bioinformatics analysis. A total of 1700 unrelated individuals of Slavic origin from the Central Russia, including 991 CAD patients and 709 healthy controls were examined. Sixteen lipid-associated GWAS loci were selected from European studies and genotyped using the MassArray-4 system. The polymorphisms were associated with plasma lipids such as total cholesterol (rs12328675, rs4846914, rs55730499, and rs838880), LDL-cholesterol (rs3764261, rs55730499, rs1689800, and rs838880), HDL-cholesterol (rs3764261) as well as carotid intima-media thickness/CIMT (rs12328675, rs11220463, and rs1689800). Polymorphisms such as rs4420638 of APOC1 (p = 0.009), rs55730499 of LPA (p = 0.0007), rs3136441 of F2 (p < 0.0001), and rs6065906 of PLTP (p = 0.002) showed significant associations with the risk of CAD, regardless of sex, age, and body mass index. A majority of the observed associations were successfully replicated in large independent cohorts. Bioinformatics analysis allowed establishing (1) phenotype-specific and shared epistatic gene–gene and gene–smoking interactions contributing to all studied cardiovascular phenotypes; (2) lipid-associated GWAS loci might be allele-specific binding sites for transcription factors from gene regulatory networks controlling multifaceted molecular mechanisms of atherosclerosis.


Introduction
Cardiovascular diseases (CVD) were defined by the World Health Organization as the leading cause of death worldwide with an estimated 17.5 million of these cases occurring in 2017, and 42.3% of them were attributed to coronary artery disease (CAD) [1]. CAD is one of the most common cardiovascular disorders responsible for, in addition to high mortality rates, an increased disability as well as a decreased quality of life in the world [1,2]. It is well known that atherosclerosis of coronary arteries is the major cause of CAD, a pathological process affecting elastic and muscular-elastic type of arteries. Disorders of lipid metabolism are characterized by the deposition of cholesterol and atherogenic lipoproteins in the vascular wall, thereby contributing to the development of coronary atherosclerosis [3]. Epidemiological studies have shown that the severity of CAD correlates closely with the levels of plasma cholesterol and low-density lipoproteins [4,5]. Carotid intima-media thickness (CIMT) along with atherogenic lipid metabolism disorders represents well-characterized diagnostic markers for assessing the initial stages of vascular wall modifications underlying atherosclerosis [6].
Coronary artery disease is a typical multifactorial disorder determined by complex interactions between genetic and environmental factors [7]. A wide variety of genomic regions linked to CAD susceptibility has been identified; however, polymorphic genes involved in the regulation of lipid metabolism have attracted greater attention [8,9]. In recent years, genome-wide association studies (GWAS) have discovered a wide range of single nucleotide polymorphisms (SNP) associated with coronary artery disease in various populations of the world, and these data have been deposited into the GWAS catalogue [10]. In particular, several large-scale GWAS have been done and over 15,000 SNP-CAD associations have been identified in 2015, and hundreds of these loci showed substantial effects on the metabolism of lipids and lipoproteins [11,12].
Despite considerable progress in GWAS towards discovering dozens of gene polymorphisms associated with lipid traits, the molecular mechanisms underlying these relationships in the context of CAD pathogenesis remain understudied. The problem is often is exacerbated by the non-replication of associations between GWAS loci and cardiovascular phenotypes in independent populations, which might be attributed to both ethnic specific genetic background and environmental factors [13][14][15]. It is important to note that polymorphic loci associated with lipids are predominantly located in intronic or intergenic regions, thereby complicating pathophysiological interpretation of SNP-phenotype correlation and limiting the clinical use of such markers. In addition, a majority of lipid-associated GWAS loci have not been annotated for their functional role for CAD pathogenesis [9,16], neither by the use of bioinformatics SNP annotation tools nor by assessing with intermediate cardiovascular phenotypes such as carotid intima-media thickness. In this regard, there is a need for comprehensive investigation that, on the one hand, integrates cardiovascular phenotypes such as plasma lipids, CIMT, and CAD, on the other hand, uses modern bioinformatics approaches to deciphering the molecular mechanisms by which lipid-associated GWAS loci are linked to coronary atherosclerosis. Hence, the present study was designed to evaluate putative mechanisms by which lipid-associated GWAS loci are involved in the molecular pathogenesis of coronary artery disease using a comprehensive statistical and bioinformatics analysis.

Study Participants
The study protocol was approved by the Regional Ethics Committee of Kursk State Medical University. All participants gave their informed consent for participating in this study. DNA samples were obtained from the biobank of Research Institute for Genetic and Molecular Epidemiology of Kursk State Medical University. The samples were collected previously in the frames of genetic studies of cardiovascular and other chronic diseases for the period from 2003 to 2018 [17][18][19][20][21][22][23]. A total number of participants comprises 1700 unrelated individuals of Slavic origin, inhabitants of the Kursk region, and included 991 patients with a diagnosis of coronary artery disease (633 men and 358 women, medium age 59.9 ± 8.8 years) who were hospitalized in Cardiology Divisions of Kursk Emergency Hospital and Vascular Surgery Division of Regional Clinical Hospital (Kursk). The following criteria were used to include patients in the case group: (1) diagnosis of coronary artery disease verified by experienced cardiologists and confirmed by clinical and instrumental methods (electrocardiography monitoring, angiography of coronary arteries); (2) CAD patients did not take lipid-lowering drugs; (3) Slavic origin (Russian, Ukrainian, or Belarusian); and (4) informed consent for participation in the study. The control group included 709 relatively healthy individuals (452 men and 257 women, mean age 60.4 ± 8.1 years) without cardiovascular and other chronic diseases. The control group was recruited for the same period of time among patients admitted at annual medical examinations, as well as among staff of medical and educational organizations. The inclusion criteria for the control group were: (1) absence of any chronic diseases, (2) Slavic origin, and (3) informed consent of a patient to participate in the study. Table 1 summarizes the demographic, clinical, and biochemical characteristics (plasma lipids) of the study participants. The case and control groups were matched according to both sex and age (p > 0.05). Body mass index was greater in CAD patients then in healthy subjects (p < 0.001). The majority of CAD patients (94.3%) suffered from hypertension, and approximately one-fifth had diabetes. The number of smokers was lower in the case group then in healthy controls (p = 0.002).

Clinical Examination of Patients
Patients with congenital heart and vascular defects, cardiomyopathies, cancer, connective tissue diseases, and chronic inflammatory diseases were not included in the case study. All study participants were interviewed for cardiovascular risk factors with a questionnaire used in our previous studies [18][19][20]. Body mass index (BMI) was calculated as the ratio of body weight in kilograms to the height in square meters (kg/m 2 ). BMI of patients with coronary artery disease was 29.8 ± 5.4, that significantly exceeded (p < 0.0001) BMI in the control group (27.0 ± 4.5). CAD patients underwent an ultrasound examination of the brachiocephalic arteries to assess carotid intima-media thickness. CIMT was measured (in mm) using the ultrasound diagnostic system MyLab™ 40 in B-mode with an ultra-high resolution linear transducer (Esaote Europe B.V., Maastricht, The Netherlands) bilaterally in distal third of the common carotid artery at a distance of 1-1.5 cm proximal to the posterior wall bifurcation.

Biochemical Investigations
Venous blood samples for biochemical investigation have been taken in the morning after a 12 h fast. The following parameters of lipid metabolism in plasma were assessed: total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG). The level of TC and TG was determined with a direct enzymatic method on an automatic analyzer Vitalab Flexor E (Vital Scientific N.V., Spankeren, The Netherlands) using reagents Analitycon (Analyticon Biotechnologies AG, Lichtenfels, Germany). The levels of HDL-C was determined an automatic biochemical analyzer "Cobas c 311" (Roche Diagnostics GmbH, Mannheim, Germany) using a kit of reagents Roche Diagnostics. The levels of total cholesterol, LDL-C, HDL-C, and TG were expressed in mmol/L. The level of LDL-C was calculated using the Friedwald's formula and expressed in mmol/L.

SNP Genotyping
Whole blood samples with 0.5M EDTA were used to isolate genomic DNA by standard phenol-chloroform extraction and ethanol precipitation. Single nucleotide polymorphisms at genes involved in the lipid metabolism were selected using public internet resources such as GWAS Catalog (www.ebi.ac.uk/gwas, accessed date 12 August 2019), PubMed (www. ncbi.nlm.nih.gov/pubmed, accessed date 4 March 2019), and HuGE Literature Finder (https://phgkb.cdc.gov/PHGKB/startPagePubLit.action, accessed date 18 April 2019), as described previously [24]. The criteria for SNP selection included: (1) SNP is associated with one or more plasma lipids such as TC, LDL-C, HDL-C, and TG at a genome-wide significance level (p ≤ 5 × 10 −8 ), (2) association of SNP and lipid(s) was observed in European population, and (3) minor allele frequency (MAF) is ≥5%. A total 16 SNPs such as rs1883025 of ABCA1, rs4420638 of APOC1, rs3764261 of CETP, rs12328675 of COBLL1, rs3136441 of F2, rs4846914 of GALNT2, rs386000 of LILRA3, rs55730499 of LPA, rs21740 of NPC1L1, rs6065906 of PLTP, rs16942887 of PSKH1, rs11220463 of ST3GAL4, rs881844 of STARD3, rs1689800 of ZNF648, rs838880 of SCARB1, and rs9987289 of PPP1R3B (Table 2) were selected from a few dozens of lipid-associated GWAS loci. The selection of the above SNPs satisfied for their simultaneous (multiplex) detection using the iPLEX MALDI-TOF technology implemented in the MassARRAY-4 system (Agena Bioscience, San Diego, CA, USA). MassARRAY Assay Design Suite software (https://agenacx.com, accessed date 14 September 2019) was used to design the genotyping panel. Primers for genotyping were synthesized by the Evrogen company (Moscow, Russia). Primer sequences are available upon request. The quality of genotyping was assessed in 95 randomly selected DNA samples blindly to the case and control status. Repeated multiplex genotyping of these samples using the MassARRAY-4 system reached concordant results.

Statistical Analysis
The distribution of genotype frequencies according to the Hardy-Weinberg equilibrium (HWE) and the comparison of allele and genotype frequencies between the study groups were evaluated with Fisher's exact test. The Kolmogorov-Smirnov test was used to assess the studied quantitative traits (i.e., plasma lipids, CIMT) for the normal distribution. Associations of SNPs with the risk of CAD were assessed by the odds ratio (OR) and 95% confidence intervals (95% CI) using multiple logistic regression with adjustments for sex, age, and BMI. Since blood lipids showed a deviation from the normal distribution (p < 0.05), their relationship with the polymorphisms was assessed by the Kruskal-Wallis test using software STATISTICA v13.0 (Statsoft, Tulsa, OK, USA). Plasma lipids and CIMT were expressed as medians (Me) and interquartile ranges (Q1/Q3). SNPstats statistical software [28] was used to assess the phenotypic effects of SNPs on plasma lipids and CIMT after rank-based inverse normal transformation of the traits. The impact of paired SNP combinations (diplotypes) on the normalized plasma lipids, carotid intima-media thickness, and the risk of coronary artery disease was analyzed by the likelihood-ratio test (LRT) with SNPassoc package for R [29] and adjusted for sex, age, and BMI. Replication for associations between SNPs and cardiovascular phenotypes such as coronary artery disease, TC, LDL-C, HDL-C, and TG was performed using by whole-genome genotype datasets from the Cardiovascular Knowledge Portal (https://cvd.hugeamp.org, accessed date 15 January 2022).

Bioinformatics Analysis
The model based multifactor dimensionality reduction (mbmdr) method [30,31] was used to evaluate gene-gene (G × G) and gene-environment (G × E) interactions associated with plasma lipids, CIMT and the susceptibility to coronary artery disease. Mbmdr is an extended variant of classic method Multifactor Dimensionality Reduction proposed by Ritchie and Hahn [32,33] for nonparametric assessment of gene-gene and gene-environment interactions underlying complex traits. Online bioinformatics tools and resources such as FuncPred (https://snpinfo.niehs.nih.gov accessed date 4 June 2019), GTEx portal (https: //www.gtexportal.org, accessed date 7 June 2019), rSNPBase (http://rsnp.psych.ac.cn, accessed date 15 June 2019), and atSNP (atsnp.biostat.wisc.edu, accessed date 26 June 2019), were used for a comprehensive functional annotation of the studied polymorphisms. A web bioinformatics resource atSNP Search was used to statistically evaluate influence of human genetic variation on transcription factor binding [34]. A threshold significant p-value ≤ 0.01 for the SNP impact (indicates that the SNP is candidate for potential gainor loss-of-function) was selected to assign the tested DNA motif covering a SNP area to the potential target for binding to a specific transcription factor. The molecular effects of in silico predicted transcription factors on gene expression were annotated with Gene Ontology (GO) terms (http://geneontology.org, accessed date 14 June 2019) and Uniprot database (https://www.uniprot.org, accessed date 29 June 2019). The sets of such transcription factors were enriched with GO terms to explore molecular functions, biological processes, and pathways in which these transcription factors are involved. Enrichr bioinformatics resource (https://maayanlab.cloud/Enrichr/, accessed date 2 July 2019) was used for the enrichment analysis.

Association of Gene Polymorphisms with the Risk of Coronary Artery Disease, Plasma Lipids, and Carotid Intima-Media Thickness
Genotype distribution for a majority of SNPs was in Hardy-Weinberg equilibrium. However, genotype frequencies of SNPs rs3764261 of CETP, rs12328675 of COBLL1, and rs838880 of SCARB1 showed significant deviations from those expected for HWE (p < 0.05). The allele frequencies of 16 lipid-associated GWAS loci in populations of Central Russia and Europe are shown in Table 3. Significant inter-population differences in MAF were found for two SNPs such as rs4420638 of APOC1 (p = 0.02) and rs3136441 of F2 (p < 0.0001). The frequency of allele rs3136441-C of the F2 gene was higher in the Central Russian population than in Europeans. Table 4 shows associations of the studied polymorphisms with the risk of coronary artery disease in a population of Central Russia. As can be seen form Table 4, four of sixteen polymorphisms showed significant associations with the risk of coronary artery disease regardless of sex, age, and body mass index. In particular, SNP rs4420638 of APOC1 was associated with an increased risk of CAD (OR = 1.49, 95% CI 1.10-2.03, p = 0.009, overdominant effect). Polymorphism rs55730499 of the LPA gene was associated with increased disease risk (OR = 1.92, 95% CI 1.30-2.83, p = 0.0007, recessive effect). SNPs rs3136441 of F2 (OR = 0.49, 95% CI 0.37-0.64, p < 0.0001, dominant effect) and rs6065906 of PLTP (OR = 0.66, 95% CI 0.50-0.86, p = 0.002, recessive effect) showed associations with decreased disease risk.
Next, we analyzed the relationship between the studied polymorphisms and plasma lipids in CAD patients ( Table 5). The normalized values of plasma lipids were used for the statistical analysis. It was found that the levels of total cholesterol are associated with SNPs such as rs12328675 of COBLL1 (p = 0.02, overdominant effect), rs4846914 of GALNT2 (p = 0.02, overdominant effect), rs55730499 of LPA (p = 0.03, overdominant effect), and rs838880 of SCARB1 (p = 0.035, recessive effect). The levels of LDL-C were found to be associated with polymorphisms rs3764261 of CETP (p = 0.042, dominant effect), rs55730499 of LPA (p = 0.0007, recessive effect), rs1689800 of ZNF648 (p = 0.026, recessive effect), and rs838880 of SCARB1 (p = 0.043, overdominant effect). The levels of HDL-C showed an association with rs3764261 of the CETP gene (p = 0.005, overdominant effect). The levels of triglycerides were associated with polymorphisms rs217406 of NPC1L1 (p = 0.02, recessive effect) and rs6065906 of PLTP (p = 0.03, dominant effect).    Table 6 shows associations of the studied SNPs on carotid intima-media thickness in patients with CAD. The analysis was performed on both non-transformed (Kruskal-Wallis test) and normalized (linear regression analysis) values of carotid intima-media thickness after adjustment for cofactors such as sex, age, and body mass index. As a result, three polymorphisms were established to be associated with the CIMT index in patients with coronary artery disease. In particular, SNP rs12328675 of the COBLL1 gene was associated with a decrease in CIMT (p = 0.009, additive effect). The rs11220463 polymorphism of the ST3GAL4 gene also showed association with decreased carotid intima-media thickness (p = 0.01, overdominant effect). However, SNP rs1689800 of ZNF648 was associated increased CIMT in CAD patients (p = 0.02, dominant effect) No significant associations were found between CIMT and other gene polymorphisms.

Smoking as a Triggering Factor Modifying the Genetic Effects on Plasma Lipids, CIMT, and CAD Risk
The multifactorial nature of atherosclerosis and coronary artery disease means that genetic and environmental factors are jointly involved in the mechanisms underlying the disease [35,36]. In this regard, a joint analysis of genetic and environmental factors may provide deeper insights into the disease mechanisms. Hence, we analyzed associations of the studied gene polymorphisms with the risk of coronary artery disease, plasma lipids and carotid intima-media thickness in the groups stratified by smoking habit, one of the most important risk factors for cardiovascular disease. Linear regression analysis adjusted for cofactors (sex, age, and body mass index) was used to evaluate effects of the SNPs on the normalized quantitative values of plasma lipids and CIMT. The association analysis results are summarized in Table 7. Gene polymorphisms such as rs12328675 of COBLL1 (p = 0.03), rs3136441 of F2 (p = 0.02), and rs16942887 of PSKH1 (p = 0.049) were associated with increased levels of total cholesterol in plasma exclusively in smokers. A similar trend in the associations was observed for increased levels of low-density lipoprotein cholesterol and polymorphisms rs1883025 of the ABCA1 gene (p = 0.003), rs55730499 of the LPA gene (p = 0.0001), and rs1689800 of the ZNF648 gene (p = 0.02) in smokers, while these associations were not established in non-smokers (p > 0.05).   Notably, the levels of LDL-C were increased more than two times in the carriers of genotype rs55730499-T/T of LPA (5.8 mmol/L) then in the carriers of genotypes C/C and C/T (2.5 mmol/L). SNP rs3764261 of the CETP gene was associated with increased levels of high-density lipoprotein cholesterol only in non-smokers (p = 0.006), while this antiatherogenic effect was not seen in smokers. A relationship between a decrease in plasma triglycerides and polymorphisms rs4846914 of GALNT2 (p = 0.008) and rs386000 of LILRA3 (p = 0.03) was found exclusively in non-smoking subjects. However, SNP rs4846914 of GALNT2 was associated with increased CIMT in smokers (p = 0.05). On the contrary, SNPs rs217406 of NPC1L1 (p = 0.02) and rs881844 of STARD3 (p = 0.03) were associated with a lower CIMT, but only in non-smoker individuals. An interesting finding that SNP rs3136441 of the F2 gene was associated with decreased risk of coronary artery disease in both smokers (p = 0.004) and non-smokers (p = 0.02). SNP rs4420638 of the APOC1 gene was associated with increased risk of CAD in non-smokers (p = 0.009), while the association of this SNP in smokers showed the similar trend, but did not reach the statistical significance level (p = 0.10). Unexpected findings that polymorphisms rs12328675 of COBLL1 (p = 0.02) and rs55730499 of LPA (p = 0.01) were associated with an increased risk of coronary artery disease in non-smokers, while no significant associations of these SNPs were found in smokers. SNP rs217406 of the NPC1L1 gene showed significant association with an increased risk of coronary artery disease exclusively in smokers (p = 0.01). Table 7. Smoking-stratified analysis for associations of SNPs and cardiovascular phenotypes.

Gene (SNP ID)
Smoking Habit Cardiovascular Phenotypes p-values for associations were analyzed by linear (normalized TC, LDL-C, HDL-C, TG, and CIMT values) and logistic (CAD) regression analyses adjusted for sex, age and BMI. R-recessive model, D-dominant model, OD-overdominance model, AD-log-additive model. The coloured cells mean: atherogenic (pink) and antiatherogenic (green) SNPs effects.

Replication for Associations between SNPs and Cardiovacsular Phenotypes in Independent Populations
Whole-genome genotype datasets from the Cardiovascular Knowledge Portal were utilized to perform a replication study for associations between SNPs and the cardiovascular phenotypes such as coronary artery disease, TC, LDL-C, HDL-C, and TG in independent population cohorts of CAD patients and controls. Results of replication analysis were interpreted only for those genotype-phenotype associations that reached a significance p-value of 0.01 or lesser in our study. Table 8 shows associations of the studied polymorphisms with CAD and plasma lipids. Single nucleotide polymorphisms such as rs4420638 of APOC1 (p = 1.15 × 10 −203 ), rs55730499 of LPA (p = 1.45 × 10 −174 ), and rs6065906 of PLTP (p = 0.01) were successfully replicated in large population cohorts as susceptibility markers for coronary artery disease. Only the rs3136441 SNP of F2 was not associated with the risk of CAD in the entire population cohort. In contrast, rs3764261 of CETP (p = 3.57 × 10 −10 ), rs4846914 of GALNT2 (p = 1.14 × 10 −8 ) were significantly associated with the risk of CAD in independent cohorts, whereas we did not find such association in our population. Furthermore, several gene polymorphisms were successfully replicated as markers associated with plasma lipids: rs3764261 of CETP with HDL-C (p = 7.08 × 10 −36 ), rs4846914 of GALNT2 with TC (p = 0.049), rs55730499 of LPA (p = 4.16 × 10 −298 ), and rs1689800 of ZNF648 (p = 6.84 × 10 −8 ) with LDL-C. In contrast to our findings, polymorphisms such as, for instance, rs1883025 of ABCA1, rs4420638 of APOC1, and rs9987289 of PPP1R3B showed significant associations with all studied plasma lipids in the independent populations.

Analysis of Pairwise SNP-SNP Interactions Contributing to the Studied Cardiovascular Phenotypes
Polygenic mechanisms underlying atherosclerosis substantiate a need in investigating the role of gene-gene and gene-environment interactions contributing to the pathophysiol-ogy of coronary artery disease. Pursuing this interest, we analyzed joint effects of the gene polymorphisms on the cardiovascular phenotypes including plasma lipids, CIMT, as well as the risk of coronary artery disease. Initially, the LRT was used to evaluate the effects of SNP-SNP combinations on the normalized phenotypic values. The LRT analysis allowed identifying a lot of paired interactions between genes associated with plasma lipids, CIMT, and coronary artery disease. Figure 1 shows interactome networks indicating a complexity of epistatic interactions between studied gene polymorphisms significantly associated with the studied cardiovascular phenotypes. In particular, epistatic interactions between 13 gene polymorphisms influence the levels of total plasma cholesterol in CAD patients with the most strong gene-gene interactions comprising polymorphisms of ABCA1 and ST3GAL4, as well as GALNT2 and LILRA3. The interactome network of total plasma cholesterol shows that the GALNT2 and PSKH1 genes represent the nodes, each interacting with a network of other genes, and these nodes are bridged to each other through the LILRA3 and ZNF648 polymorphisms. In contrast, the joint effect of the STARD3 and PPP1R3B SNPs on the total cholesterol levels was found regardless of other studied genes. Epistatic interactions with the most substantial effects on the plasma TC levels were found for combinations of LILRA3 and GALNT2 (p = 0.005) as well as ABCA1 and ST3GAL4 (p = 0.02). Interactions between 14 polymorphic genes such as ABCA1, APOC1, CETP, F2, GALNT2, LILRA3, LPA, NPC1L1, PLTP, PSKH1, ST3GAL4, STARD3, ZNF648, and SCARB1 showed the joint effects on cholesterol levels of atherogenic lipoproteins (i.e., LDL). Individual effects on LDL-C were established for polymorphic variants of CETP (p = 5 × 10 −5 ), LPA (p = 0.002), ZNF648 (p = 0.03) and SCARB1 (p = 0.02). Interestingly, gene-gene interactions representing the "chain" spanned LPA × APOC1 × CETP × ABCA1 × ST3GAL4 loci were strongest (p-values for the interactions varied from ≤0.0009 to 0.0001). In addition, highly significant interactions influencing C-LDL were also found between ST3GAL4 and ABCA1, SCARB1, PLTP, PSKH1, and NPC1L1 (p ≤ 0.009-0.001). Epistatic interactions between twelve genes such as ABCA1, APOC1, CETP, F2, GALNT2, NPC1L1, PLTP, ST3GAL4, STARD3, ZNF648, SCARB1, and PPP1R3B were established to impact the levels of high density lipoprotein cholesterol in plasma of CAD patients. In general, epistatic interactions involving the F2 and GALNT2 loci (p = 0.002) as well as SCARB1 and ST3GAL4 (p = 0.0003) that associated with the levels of HDL were strong. Notably, the F2 and APOC1 gene polymorphisms showed interactions with multiple studies SNPs that jointly affect HDL levels in plasma. Epistatic interactions between 15 polymorphic genes (ABCA1, APOC1, CETP, F2, GALNT2, LILRA3, LPA, NPC1L1, PLTP, PSKH1, ST3GAL4, STARD3, ZNF648, SCARB1, and PPP1R3B) were characterized by significant effects on plasma triglycerides. Individual effects of SNPs on triglycerides in CAD patients were also observed for polymorphisms of APOC1 (p = 0.04) and of NPC1L1 (p = 0.03). As can be seen from Figure 1, two groups of interacting genes were found to be associated with TG level in plasma: the first group included PLTP, APOC1, NPC1L1, ST3GAL4, and ABCA, whereas the second group included F2, PSKH1, CETP, LPA, SCARB1, ZNF648, LILRA3, GALNT2, and PPP1R3B. It is noteworthy that these groups interacted to each other through the STARD3 polymorphism. Interactions between thirteen loci such as ABCA1, APOC1, CETP, F2, GALNT2, LILRA3, LPA, NPC1L1, PLTP, ST3GAL4, ZNF648, SCARB1, and PPP1R3B contributed to carotid intima-media thickness in patients with coronary artery disease. Polymorphic variants of the APOC1 (p = 0.04), ST3GAL4 (p = 0.01), ZNF648 (p = 0.01), and PPP1R3B (p = 0.01) genes showed individual effects on CIMT. The epistatic interactions were observed between LPA and LILRA3 (p = 0.009), PLTP and ST3GAL4 (p = 0.001), as well as between ZNF648 and SCARB1 (p = 0.002). As can be seen from the interactome network of CIMT (Figure 1), three independent "chains" of interacting genes such as F2 × NPC1L1 × PPP1R3B × CETP × LILRA3, F2 × PLTP × ST3GAL4 × LILRA3 and F2 × SCARB1 × ZNF648 × GALNT2 × LPA × LILRA3 were associated with carotid intima-media thickness in CAD patients. Moreover, we found that the genetic susceptibility to CAD is associated with complex interactions between 14 polymorphic genes including ABCA1, APOC1, CETP, F2, GALNT2, LPA, NPC1L1, PLTP, PSKH1, ST3GAL4, STARD3, ZNF648, SCARB1, and PPP1R3B. Polymorphisms of APOC1 (p = 0.01), F2 (p = 8 × 10 −8 ), LPA (p = 0.001), and PLTP (p = 0.0009) showed significant individual effects on the predisposition to coronary artery disease. The tight gene-gene interactions underlying CAD susceptibility comprised GALNT2, ST3GAL4, ABCA1, ZNF648, STARD3, PPP1R3B, and PLTP as well as APOC1 and CETP. In particular, PPP1R3B interacted with SCARB1 (p = 0.008) and PLTP (p = 0.005) as well as PSKH1 (p = 0.02). ST3GAL4 was found to be associated with SCARB1 (p = 0.01), LPA (p = 0.02), and PLTP (p = 0.01), whereas the F2 locus was associated with CETP (p = 0.03), ZNF648 (p = 0.03), and NPC1L1 (p = 0.04) gene polymorphisms. Thus, despite the fact that plasma lipids and CIMT are associated with coronary artery disease, the structure of interactions between genes that determine the studied cardiovascular phenotypes differed significantly.

Modeling for Gene-Gene and Gene-Environment Interactions Determining the Cardiovascular Phenotypes
Multifactor Dimensionality Reduction (MDR) is one of the most popular bioinformatics methods for nonparametric analysis of gene-gene and gene-environment interactions, allowing simultaneously assess to numerous variables, including gene polymorphisms, qualitative and quantitative phenotypes by reducing the dimension of the number of calculated parameters, thereby facilitating the detection of non-linear or non-additive interactions among the assessed attributes [32,33]. In the present study, we used the modelbased multifactor dimensionality reduction (mbmdr) method allowing to detect multiple sets of significant gene-gene (G × G) and/or gene-environment (G × E) interactions in relation to a trait of interest [30,31]. Two-, three-, and four-order G × G and G × E models, including combinations of SNPs and smoking habits, were analyzed using the statistical package mbmdr for R, version 3.5.3 to identify associations with the levels of plasma lipids, carotid intima-media thickness, and the risk of coronary artery disease were analyzed. Six two-order, 39 three-order, and 147 four-level statistically significant MDR models (P perm < 0.05) associated with the normalized blood cholesterol levels were established. The best n-order mbmdr models associated with TC, C-LDL, and CAD risk are summarized in Tables 9-11, respectively. To prioritize the attributes associated with each cardiovascular phenotype, we counted the number of n-models in which each attribute was involved, and the resulted value was considered as a measure of the contribution of an attribute (a variable such as SNP or smoking status) to the polygenic background, as estimated by the mbmdr method. Figure 2 presents diagrams illustrating the contribution (%) of each attribute to the cardiovascular phenotypes (attributes are presented in descending order of the number of models in which attributes are involved). In particular, the leading SNPs contributing to the levels of total cholesterol in plasma of CAD patients were GALNT2 (16%), LPA (11%), SCARB1 (10%), APOC1 (8%), LILRA3 (8%), ST3GAL4 (6%), COBLL1 (6%), and PSKH1 (5%). Thus, 70% of the identified G × G mbmdr models associating with plasma level of TC comprised the above eight polymorphisms. It is noteworthy that the number of G × E models (SNP-smoking interactions) associated with TC was relatively small (1.7% for three-order and 4.9% for four-order mbmdr models). A total of 39 two-order, 281 three-order, and 1704 four-order GxG and G × E models were found to affect the normalized levels of LDL-C. As can be seen from Figure 2, the total proportion of models involving smoking status (12%) and ABCA1 (8%), CETP (7%), NPC1L1 (7%), and PLTP (6 %) loci was 39%. Notably, the quantity of mbmdr models comprising smoking status was 11% for 4-order models, 13% for 3-order models and 21% for 2-order models suggesting synergic effects tobacco smoking and these SNPs on the atherogenic fraction of lipoproteins. In total, 37 two-order, 183 three-order, and 624 four-order G × G and G × E models significantly associated with normalized HDL-C levels were identified. A significant proportion of mbmdr models included smoking habit: 16% for 4-order models, 15% for 3-order models, and 30% for 2-order models. In addition, 55% of the GxG and G × E models involved interactions between CETP, PLTP, ST3GAL4, NPC1L1, PSKH1, F2, ABCA1, PPP1R3B, and APOC1 gene polymorphisms. A majority of mbmdr models associated with normalized levels of plasma triglycerides comprised interactions between ABCA1 (9%), LPA (8%), STARD3 (7%), PLTP (7%), PSKH1 (7%), SCARB1 (6%), NPC1L1 (6%), APOC1 (6%), and ST3GAL4 (5%) loci. The mbmdr method allowed establishing 14 two-order, 45 three-order, and 73 four-order GxG and G × E models associated with normalized CIMT. About 70% of G × G and G × E models associated with CIMT included interactions between polymorphisms of ST3GAL4 (15%), GALNT2 (10%), COBLL1 (9%), PSKH1 (9%), APOC1 (8%), PPP1R3B (8%), LILRA3 (6%), and PLTP (6%) genes. Smoking status was present in 11% of the two-order, 6% of the three-order, and 7% of the four-order models. As can be seen from Table 11, the best models of G × G and G × E interactions associated with the risk of coronary artery disease included rs3136441 of F2, rs55730499 of LPA, rs881844 of STARD3, rs838880 of SCARB1, and rs12328675 of COBLL1 gene polymorphisms. Sixteen SNPs such as F2 (9%), LPA (8%), STARD3 (7%), SCARB1 (6%), PLTP (6 %), PPP1R3B (6%), ABCA1 (6%), APOC1 (6%), GALNT2 (6%), COBLL1 (6%), CETP (5%), ZNF648 (5%), LILRA3 (5%), ST3GAL4 (5%), PSKH1 (5%), and NPC1L1 (4%) showed a comparable contribution to CAD susceptibility. Figure 2 shows the contribution of the studied lipid-associated GWAS loci and smoking to the polygenic mechanisms underlying cardiovascular phenotypes such as plasma lipids, CIMT, and coronary artery disease. A percentage of contribution for each factor was estimated as a proportion of mbmdr models in which a particular risk factor is involved.
A comparative contribution of each SNP and smoking status to the cardiovascular phenotypes is summarized in Figure 3. A substantial portion of associated mbmdr models was found for CAD. LDL-C came in the second place in terms of the number of identified mbmdr models. LDL-C is followed by high-density lipoprotein cholesterol and triglyceride levels. The smallest number of models was found regarding associations with carotid intimamedia thickness. These findings may indicate the fact that the studied gene polymorphisms discovered by GWAS as lipid-associated loci are tightly linked to the development of coronary artery disease itself by atherogenic changes in low-density lipoprotein cholesterol rather than through other studied cardiovascular phenotypes.  NH-number of interacting genotypes / high-risk environmental factors; β-H-regression coefficient for high-risk interactions identified in step 2 of the analysis. WH-Wald statistics for high-risk interactions; NL is the number of interacting genotypes / low risk environmental factors; β-L-regression coefficient for low-risk interactions identified in step 2 of the analysis; WL-Wald statistics for low-risk interactions. P perm -permutation significance levels for the models (all models are adjusted for sex, age, and BMI). Mbmdr, model based multifactor dimensionality reduction method [30,31]. Statistically significant p-values are bolded. NH-number of interacting genotypes/high-risk environmental factors; β-H-regression coefficient for high-risk interactions identified in step 2 of the analysis. WH-Wald statistics for high-risk interactions; NL is the number of interacting genotypes/low risk environmental factors; β-L-regression coefficient for low-risk interactions identified in step 2 of the analysis; WL-Wald statistics for low-risk interactions. P perm -permutation significance levels for the models (all models are adjusted for sex, age, and BMI). Mbmdr, model based multifactor dimensionality reduction method [30,31]. Statistically significant p-values are bolded. NH-number of interacting genotypes/high-risk environmental factors; β-H-regression coefficient for high-risk interactions identified in step 2 of the analysis. WH-Wald statistics for high-risk interactions; NL is the number of interacting genotypes/low risk environmental factors; β-L-regression coefficient for low-risk interactions identified in step 2 of the analysis; WL-Wald statistics for low-risk interactions. P perm -permutation significance levels for the models (all models are adjusted for sex, age, and BMI). Mbmdr, model based multifactor dimensionality reduction method [30,31]. Statistically significant p-values are bolded.

Functional Annotation of the Studied Gene Polymorphisms
Numerous bioinformatics tools for functional SNP annotation have been developed to investigate putative molecular mechanisms by which gene polymorphisms are involved into the pathogenesis of common diseases like coronary artery disease [37,38]. In the present study, a variety of bioinformatics tools and internet resources were utilized for functional annotation of lipid-associated GWAS loci. Summary of data on the regulatory potential of the studied gene polymorphisms is shown in Table 12. As can be seen from Table 12, a majority of the studied SNPs possess a regulatory potential and represents expression quantitative trait loci (eQTLs). In particular, the regulatory potential was identified for ABCA1, GALNT2, NPC1L1, STARD3, and PPP1R3B polymorphisms, as in silico assessed by the FuncPred tool. Nine SNPs are associated with the levels of gene expression in coronary arteries and/or aorta. In particular, alternative alleles of CETP (p = 0.0063), F2 (p = 0.0019), LILRA3 (p = 4.9 × 10 −6 ), NPC1L1 (p = 0.0007), PLTP (p = 0.005), PSKH1 (p = 0.029), and STARD3 (p = 0.002) gene polymorphisms are associated with a decrease in gene expression, while the rs11220463-T allele at (p = 0.036) is associated with an increase in expression of the ST3GAL4 gene. According to the rSNPbase database, many studied polymorphisms (except APOC1, LILRA3, LPA, and ZNF648) are found to be regulatory single nucleotide polymorphisms (rSNPs). Moreover, all the polymorphisms are in the linkage disequilibrium with at least one rSNPs in the genome. Notably, PSKH1, STARD3, PPP1R3B, F2, ST3GAL4, GALNT2, PLTP, LILRA3, ABCA1, and NPC1L1 polymorphisms are linked to at least 10 rSNPs. The bioinformatics tool atSNP search predicted allele-specific effects of SNPs on the binding affinity of numerous transcription factors. Predicted transcription factors (TF) and their molecular effects (activator and repressor) are summarized in Supplementary Table S1. It is predicted that the alternative alleles of all SNPs create binding sites for numerous TFs (TFBS, transcription factor binding sites) that may modulate the transcriptional activity of genes. For instance, allele rs55730499-T associated with increased level of LDL-C and the risk of coronary artery disease was predicted to create binding sites for transcription factors acting as activators (NR1H, AHR::ARNT, IRF) or repressors (HAND1) of gene expression. In addition, several SNPs showed epigenetic regulatory potential. As it can be assessed with the regulatory score of RegulomeDB, the PLTP and STARD3 loci have a score of 2b (TFBS, DNA-protein interaction site, and DNAse hypersensitivity), GALNT2, NPC1L1, PSKH1 ABCA1, APOC1, F2, LILRA3, SCARB1, and PPP1R3B loci have scores 4 or 5 (TFBS and DNAse hypersensitivity site). The analysis of ENCODE data showed that chromatin remodeling, in particular, chemical modification of histone H3K4me3 can regulate the access of DNA for transcriptional regulation of PSKH1 in the aorta and ZNF648 in adipose tissue genes through their nucleotide sequences surrounding SNPs rs16942887 and rs1689800, respectively. Histone modification H3K27ac might be associated to the transcriptional activity of the CETP gene in the subcutaneous adipose tissue, as well as the PSKH1, ZNF648, and SCARB1 genes in the aorta and subcutaneous adipose tissue. Interestingly, both histone marks are found to be active enhancers regulating gene expression [39,40]. Analysis of the epigenomic data from the Roadmap Epigenomics Project allowed identifying histone modifications may affect transcriptional activity of genes in the aorta and vascular endothelium through the polymorphisms of COBLL1, STARD3, and PPP1R3B.

Enrichment Analysis of Regulatory Gene Networks in Which the Studied Gene Polymorphism Might Be Involved
Characterization of regulatory gene networks controlling the expression of genes that were identified to increase the risk of coronary artery disease has a great importance for understanding the polygenic mechanisms of the disease pathogenesis. Pursuing this interest, we first analyzed whether the sets of transcription factors that were in silico predicted to bind DNA surrounding the gene polymorphisms represent the pathways that are known to be involved in the pathogenesis of atherosclerosis. , and adipogenesis (WP236, FDR = 0.008). Importantly, some sets of SNP-related TFs enriched with the pathways that are involved in the regulation of lipid metabolism as well as in the pathogenesis of atherosclerosis, along with the terms indicating the impact on the transcriptional activity of a gene. In particular, the rs1689800-G allele of ZNF648 that was predicted to create binding sites for TFs enriched with a term nuclear receptors in lipid metabolism and toxicity (WP299, FDR = 0.02), whereas the rs9987289-G allele of PPP1R3B was associated with TFs that enriched with a term MAPK targets/nuclear events mediated by MAP kinases (R-HSA-450282, FDR = 0.04). Thus, the enrichment analysis showed that transcription factors that were in silico predicted to bind DNA motifs in the presence of CAD-associated alleles represent as transcriptional regulators capable to enhance expression of target genes as well as are involved in the regulation of lipid metabolism, inflammation and other pathways related with disease pathogenesis. Bioinformatics tools implemented in the STRING online resource (https://string-db.org, accessed on 2 July 2019) were used to visualize regulatory networks that may operate by transcription factors whose binding sites fall into the regions of the studied SNPs in the presence of alleles associated with cardiovascular phenotypes. Figure 4 shows interactomic networks of transcription factors associated with the risk alleles at SNP rs4420638 of APOC1 ( Figure 4A

Summary of the Study Findings and Their Comparison with Literature
The present study was designed to perform a comprehensive statistical and bioinformatics analysis to assessing the relationship between 16 single nucleotide polymorphisms that were established as markers associated with plasma lipids by genome-wide association studies (lipid-related GWAS loci) and intermediate cardiovascular phenotypes such as carotid intima-media thickness, atherogenic lipid profile as well as susceptibility to coronary artery disease. Our interest is justified by the fact that a limited number of GWAS studies have been done to assess the contribution of lipid-related loci to the risk of coronary artery disease, and few studies have performed the functional annotation of these polymorphisms, thereby complicating our understanding the mechanisms by which these loci might be involved in the pathogenesis of coronary atherosclerosis. We found that four polymorphisms such as rs4420638 of the APOC1 gene, rs3136441 of the F2 gene, rs55730499 of the LPA gene, and rs6065906 of the PLTP gene were significantly associated with the risk of coronary artery disease regardless of sex, age, and body mass index. Previously, several large studies found that SNP rs4420638 APOC1 is associated with a variety of cardiometabolic and other phenotypes such as an increase in low-density lipoprotein cholesterol [11,25], increased activity of lipoprotein-associated phospholipase A2 [41], triglyceride levels [42,43], total cholesterol [44], decreased HDL cholesterol [43], increased risk of type 2 diabetes [45], Alzheimer's disease [46,47], and age-related macular degeneration [48]. Moreover, allele rs4420638-G of APOC1 was found to be associated with an increased risk of CAD Europeans [43], and then it was confirmed in a large metaanalysis [9]. Thus, the present study confirmed the association of SNP rs4420638 APOC1 with an increased risk of CAD in a population of Central Russia.
It is known form the literature, allele rs3136441-C of the F2 gene is associated with an increase in HDL-C [8]. We found that the SNP is associated with plasma levels of total cholesterol exclusively in smokers. Our study is actually the first to show the relationship between SNP rs3136441 of the F2 gene and the development of coronary artery disease, as well as the level of total cholesterol, LDL-C and HDL-C. Allele rs55730499-T of the LPA gene is known to be associated with increased levels of lipoprotein (a) [26], coronary artery disease [9,49], and a short life expectancy [50,51]. We also confirmed the association of allele rs55730499-T of LPA with an increased risk of CAD in the Russian population. In addition, significant associations of this SNP with an increased level of TC and LDL-C and with a decreased level of HDL-C and TG have been identified in our study for the first time. It is known from the literature that SNP rs6065906 at PLTP is associated with the levels of triglycerides and HDL-C [8,44]. We also confirmed an association of this SNP with the level of triglycerides. Furthermore, our study revealed, for the first time, that the rs6065906-T of PLTP is associated with the increased risk of coronary artery disease. According to the summary data of GWAS catalog (https://www.ebi.ac.uk/gwas/, accessed on 2 July 2019), the rs3764261-A allele located near the CETP and HERPUD1 genes is associated with an increase in the HDL cholesterol, a decrease in LDL-C and TG [8], and an increase in total cholesterol in plasma [52]. We also observed the association of allele rs3764261-A with increased levels of total blood cholesterol, but this finding occurred only in smokers. However, in contrast to previously published data, the rs3764261-A allele is associated with an increased level of LDL-C and a decreased level of HDL-C, indicting a discrepancy between ours and other studies. As can be seen from the literature, rs4846914-G allele of the GALNT2 gene is associated with a decrease in HDL cholesterol levels and an increase in blood TG [25,44], whereas we observed an association of this SNP with an increase in total blood cholesterol levels. We also observed that the rs9987289-G allele of PPP1R3B is associated with increased levels of blood TC, a finding that is consistent with the results obtained by Willer with co-workers [8]. In addition, SNP rs9987289 at PPP1R3B is known to be associated with LDL-C and HDL-C levels [8]. We found that polymorphism rs16942887 of the PSKH1 gene is associated with increased levels of total cholesterol in plasma exclusively in smokers, and this association is found for the first time. It is known that rs16942887 of the PSKH1 gene is associated with HDL-C and TG levels [44,53]. We also observed a relationship between the rs838880-C allele of the SCARB1 gene and increased levels of TC and LDL-C, the findings that are partially concordant with previously published papers. In particular, it is known that allele rs838880-C is associated with an increase in TC [54] and HDL cholesterol in plasma [8]. The present study also showed that SNP rs1883025 of the ABCA1 gene is associated with an increase in LDL-C levels in smokers. According to the study of Hoffmann with co-authors [44], polymorphism rs1883025 SNP is also associated with LDL-C, but the association has not been investigated in subgroups of patients stratified by smoking status. SNP rs1883025 of the ABCA1 gene is known to be associated with the levels of TC [12] and HDL-C [8]. In addition, the present study was the first to investigate relationship between the lipid-related GWAS loci and carotid intima-media thickness in patients with coronary artery disease. In particular, we found that the rs1689800-G allele of the ZNF648 gene (locus LINC01344) is associated with an increase in LDL-C levels in smokers and increased CIMT. Interestingly, allele rs1689800-G was found to be associated with decreased levels of HDL cholesterol [8,12]. Meantime, the present study showed that polymorphisms such as rs217406 NPC1L1 and rs881844 STARD3 are associated with a decreased CIMT, but exclusively in non-smoker patients. The rs217406-G allele of NPC1L1 was also linked to a decrease in blood TG levels in our study and this association was also revealed in a study of Willer with coworkers [8]. Polymorphism rs881844 at STARD3 associated with CIMT was also associated with decreased levels of HDL-C [44,55]. We also found, for the first time, SNPs rs12328675 of COBLL1 and rs11220463 of ST3GAL4 are associated with a decreased CIMT. SNP rs12328675 at COBLL1 is known to be associated with the levels of HDL-C [8] and triglycerides [44].

The Contribution of Gene-Gene Interactions to the Studied Cardiovascular Phenotypes
The present study showed for the first time that the interactions between lipid-related GWAS loci contribute to the risk of coronary artery disease through the characteristic changes in the lipid profile and carotid intima-media thickness. An importance of SNP-SNP interactions contributing to cardiovascular phenotypes was clearly demonstrated by the mbmdr method allowed identifying numerous significant models of gene-gene interactions associated with plasma lipid parameters, CIMT and coronary artery disease risk. These findings indicate the existence of epistatic interactions between lipid-associated GWAS loci, a situation when the effect of one gene may not be disclosed if the effect of another gene is not considered [56,57]. The most illustrative example for epistasis may be shown by polymorphism rs386000 of the LILRA3 gene: this SNP was associated with none of the cardiovascular traits alone; however, this locus was present in a majority of GxG and GxE mbmdr models associated with the level of total blood cholesterol. Epistatic interactions between ABCA1, APOC1, CETP, F2, GALNT2, LILRA3, LPA, PSKH1, ST3GAL4, STARD3, ZNF648, SCARB1, and PPP1R3 contributed to the level of total blood cholesterol level. At the same time, interactions between fourteen polymorphisms at genes such as ABCA1, APOC1, CETP, F2, GALNT2, LILRA3, LPA, NPC1L1, PLTP, PSKH1, ST3GAL4, STARD3, ZNF648, and SCARB1 were associated with atherogenic changes in the levels of cholesterol of low density lipoproteins. Epistatic interactions between thirteen loci such as ABCA1, APOC1, CETP, F2, GALNT2, LILRA3, LPA, NPC1L1, PLTP, ST3GAL4, ZNF648, SCARB1, and PPP1R3B contribute to the carotid intima-media thickness in coronary artery disease patients. Although the above sets of genes were quite similar across the cardiovascular phenotypes, there were substantial differences in the structure and strength of interactions between the loci concerning particular phenotypes. The biological interpretation of such complex interactions between the genes is quite difficult and may reflect different biological phenomena, such as physical interactions between proteins, linkage disequilibrium between loci, interactions at the level of metabolism, co-expression of genes in a certain tissue, as well as shared gene regulatory networks by which transcription factors may modulate expression of genes. Nevertheless, bioinformatics methods allowed prioritizing SNP combinations whose joint effects may particularly explain the molecular mechanisms by which the lipid-associated GWAS loci are involved in the pathogenesis of coronary atherosclerosis. About 70% of G × G and G × E interaction models associated with the level of total blood cholesterol in CAD patients comprised interactions between eight loci such as GALNT2, LPA, SCARB1, APOC1, LILRA3, ST3GAL4, COBLL1, and PSKH1 and certain literature data confirms the roles of these genes in the regulation of cholesterol metabolism. In particular, several studies have shown that genes such as LPA [44], SCARB1 [54], APOC1 [8], and ST3GAL4 [44] may affect blood cholesterol levels. In addition, it is known that GALNT2, LPA, SCARB1, and APOC1 are genes responsible for a hereditary form of hypercholesterolemia [58]. Interestingly, interactions between ABCA1, CETP, NPC1L1, PLTP, and cigarette smoking were associated with the levels of LDL-C. The STRING tools allowed revealing that these four proteins whose interactions were associated with the levels of LDL-C are enriched with Gene Ontologies such lipid transport (GO: 0006869), lipid metabolic process (GO: 0006629). Moreover, ABCA1, CETP, and PLTP were enriched with terms such as regulation of cholesterol efflux (GO: 0010874), plasma lipoprotein particle organization (GO: 0071827) and regulation of plasma lipoprotein particle levels (GO: 0097006). ABCA1, CETP, and NPC1L1 were enriched with GO terms such as cholesterol transport (GO: 0030301) and cholesterol metabolic process (GO: 0008203). Interactions between genes such as ABCA1, CETP, NPC1L1, PLTP, ST3GAL4 PSKH1, APOC1, and smoking habit were found to influence the level of HDL-C in CAD patients. Apparently, the link of APOC1 to these set of genes is important for the regulation of HDL-C levels since the antiatherogenic effect of this gene is attributed to the suppression of CETP and activation of esterified lecithin cholesterol, which plays an important role in the metabolism of esterified cholesterol between lipoproteins and cholesterol transport from peripheral tissues [59].
We observed that 70% of GxG and G × E interaction models associated with CIMT comprised polymorphic variants of the ST3GAL4, GALNT2, COBLL1, PSKH1, APOC1, PPP1R3B, LILRA3, and PLTP genes. rs12040273, another SNP of the GALNT2 gene and the activity of plasma PLTP, have been found to be associated CIMT [60,61]. Apparently, the joint effect of GALNT2 and PLTP loci on CIMT might be explained by shared pathways, in particular those regulating lipid metabolism and transport, lipid deposition in the intima of arteries, and vascular inflammation, i.e., processes that are known to influence carotid intima-media thickness [62,63]. This assumption may be supported by the finding of pleiotropic effects of some lipid-associated GWAS loci (i.e., rs11220463 ST3GAL4, rs4420638 APOC1, rs9987289 PPP1R3B and rs6065906 PLTP) on the levels of plasma lipids (i.e., TC, LDL-C, HDL-C, and TG), as well as on the levels of C-reactive protein as an inflammation marker of atherosclerosis [64]. It is important to note that the interactions between F2 rs3136441, LPA rs55730499, STARD3 rs881844, SCARB1 rs838880, and COBLL1 rs12328675 showed a strong contribution to the risk of coronary artery disease. It is quite obvious that each of these genes can determine independently the predisposition to coronary artery disease. For example, it is well known that polymorphic variants at the F2 gene (coagulation factor II or thrombin) are associated with thrombosis [65,66] as well as with changes in the level of HDL-C [8,44]. LPA is a known independent risk factor for coronary artery disease [67], and its polymorphisms have been associated with atherogenic changes of lipid metabolism. In addition, SCARB1 may play a role as a receptor for LPA, which was found to mediate the binding and excretion of lipoprotein (a) [68]. It is known that polymorphisms of the STARD3 gene are associated with the level of HDL-C [44,55], triglycerides [69], and the increased expression of STARD3 is responsible for the antiatherogenic lipid phenotype [70]. SNPs of the COBLL1 gene are known to be associated with various lipid-associated phenotypes, including BMI [71], waist-hip ratio [72], HDL-C levels [8], TG levels [55], and each of them is considered to be a risk factor for CAD.

Functional Effects of the Studied SNPs and Their Link to the Pathogenesis of Coronary Artery Disease
The preset study was the first to perform a comprehensive statistical and bioinformatics analysis aiming to better understand the mechanisms by which the established lipid-related GWAS loci are involved in the molecular mechanisms of coronary artery disease. We identified that almost all the studied SNPs, despite being located in non-coding sequences or intergenic spacers, represent functional regions of the genome that may affect gene expression in a tissue-specific manner through various molecular mechanisms. The studied lipid-associated GWAS polymorphisms may possess by own functional effects or their effects might be attributed to rSNPs that are being in the close linkage disequilibrium. Notably, a total of 393 rSNPs have been identified as to be linked to the lipid-associated GWAS loci, and these regulatory polymorphisms are located in genes that play a role in the pathogenesis of atherosclerosis. In particular, SNP rs4420638 of APOC1 is linked to PVRL2 (nectin cell adhesion molecule 2), TOMM40 (translocase of the outer mitochondrial membrane 40), and APOC4 (apolipoprotein C4). SNP rs3136441 of the F2 gene is in a complete linkage disequilibrium with genes such as ARHGAP1 (Rho GTPase activating protein 1), LRP4 (protein related to LDL receptor 4), ATG13 (protein associated with autophagy 13), CKAP5 (protein associated with cytoskeleton 5), ZNF408 (zinc finger protein 408), and SNORD67 (small nucleolar RNA, C/D box 67). SNP rs6065906 of the PLTP gene is linked to genes such as MMP9 (matrix metallopeptidase 9), CTSA (cathepsin A), ZNF335 (zinc finger protein 335), PCIF1 (C-terminal inhibitory factor 1 PDX1), ZSWIM1 (SWIM-type zinc finger protein 1), SPATA25 (protein associated with spermatogenesis 25), NEURL2 (neuralized U3 protein ubiquitin ligase 2), ZSWIM3 (SWIM-type 3 zinc finger protein), FTLP (ferritin light chain 1 pseudogene), and ACOT8 (acyl-CoA thioesterase 8).
We identified that a half of the studied loci are cisor trans-eQTLs, which are genome loci affecting gene expression [73]. Nine SNPs such as rs3764261 of CETP, rs12328675 of COBLL1, rs3136441 of F2, rs386000 of LILRA3, rs217406 of NPC1L1, rs6065906 of PLTP, rs16942887 of PSKH1, and rs11220463 of ST3GAL4 are associated changes in gene expression in arteries and/or aorta, thereby demonstrating their putative pathogenetic involvement in atherosclerosis. The presence of trans-eQTLs for SNPs rs3764261 of CETP, rs12328675 of COBLL1, rs3136441 of F2, rs55730499 of LPA, rs217406 of NPC1L1, rs6065906 of PLTP, rs16942887 of PSKH1, and rs881844 of STARD3 (Supplementary Table S2) may suggest that these genes are co-expressed with genes located at other regions of the genome presumably due to the shared gene regulatory networks operating by common transcription factors.
It has been in silico predicted that many of lipid-associated GWAS loci represent the binding sites for a wide range of transcription factors capable to modulate gene expression in an allele-specific manner. Putative TFs for SNPs rs55730499 of LPA, rs4420638 of APOC1, and rs3136441 of F2 were subjects of great interest because these loci were associated with coronary artery disease. As an example, Figure 5 summarizes the transcription factors whose binding sites are predicted at the intron region surrounding SNP rs55730499 of the LPA gene. An importance of this SNP for disease pathogenesis was demonstrated by two studies that observed that allele rs55730499-T of LPA is associated with an increased level of lipoprotein (a) and cholesterol of low-density lipoproteins, as well as the risk of coronary artery disease [10] and decreased life expectancy [51]. We predicted that there exist at least three transcriptional factors such as NR1H, AHR:ARNT, and IRF whose binding sites might be created by the rs55730499-T allele and enhance the expression of the LPA gene. Interestingly, these TFs are known as regulators of lipid metabolism, including the activation of lipogenesis, the formation and release of HDL-C by the liver, β-oxidation of fatty acids, the release of the remnants of LDL-C and LDL-C by the liver, the formation of VLDL-C, and activation of lipoprotein lipase [74,75]. Furthermore, an in silico predicted transcription factor NR1H4 is known to impact the expression of cytokine genes underlying inflammatory mechanisms of atherosclerosis [76]. The molecular complex AHR (UniProtKB-P35869):ARNT (UniProtKB-P27540) might act as another transcriptional activator affecting expression of the LPA gene. The AHR:ARNT complex is known to be involved in the ligand-induced cascade activation of genes encoding phase I and II biotransformation enzymes [77,78] as a result of environmental exposure to polycyclic and halogenated aromatic hydrocarbons that are chemical pollutants. Notably, it has been found by numerous studies [79][80][81] that chronic exposure to such chemicals is associated with the prevalence of atherosclerosis [82]. We suggest that the atherogenic up-regulation of LPA gene expression might be due to the binding of complex AHR:ARNT with a region surrounding SNP rs55730499 in the presence of allele-T associated with CAD. Transcription factor IRF3 (UniProtKB-Q14653) is involved in the transcriptional regulation of interferon α and β genes controlling the innate immune response to DNA and RNA viruses [83]. Transcriptional repressor HAND1 was predicted to bind and suppress expression of the LPA gene through the binding with SNP rs55730499 in the presence of the T allele. HAND1 is known to be involved in the transcriptional regulation of trophoblast cell differentiation, cardiac morphogenesis, and angiogenesis (UniProtKB-O96004). Allele of rs55730499-C that showed protective effect against CAD susceptibility was predicted to increase an affinity to binding to six TFs such as androgen receptor AR (UniProtKB-P10275), involved in transcriptional regulation of gene expression by suppressing androgen-induced signal transduction and cell proliferation, E2F2 (UniProtKB-Q14209) activating of genes involved into the cell cycle and DNA replication, and ETF3 (UniProtKB-O00716) also activating cell cycle and DNA replication genes as well as suppressing adipogenesis. Other predicted TFs included GCM2 (UniProtKB-O75603), a chorion-specific transcriptional activator important for the development of parathyroid glands, and NR3C1 (UniProtKB-P04150), a glucocorticoid hormone receptor acting as a transcription factor and modulator of gene expression that affects inflammatory response, chromatin remodeling, and suppresses adipogenesis through the regulation of lipolysis and lipogenesis. Transcriptional repressor REST was in silico predicted to bind with SNP rs55730499 in the presence of allele C. REST is known to be a key repressor of gene expression in hypoxia conditions by chromatin modifications that influence ischemia-induced neuronal death (UniProtKB-Q13127). We suppose that a loss of the binding site for REST due to the C > T nucleotide substitution removes the repressive "antiatherogenic" effect of this TF on expression of the LPA gene. Notably, the transcriptome data of the Framingham study [84] showed that the levels of REST expression in coronary arteries were decreased in patients with coronary artery disease, than in healthy individuals. This finding suggests the loss of binding site for REST at SNP rs55730499 may explain a loss of the "antiatherogenic" effect of this TF on the expression of the LPA gene. CTCF (UniProtKB-P49711) is another predicted repressor at SNP rs55730499, a chromatin-binding transcription factor regulating expression of the APOA1/C3/A4/A5 gene cluster and also HLA genes of class II [85]. Finally, transcription factor THAP1 (UniProtKB-Q9NVV9) that was also in silico predicted to bind the rs55730499 SNP region is known to regulate endothelial cell proliferation and G1/S cell cycle progression, biological processes playing a role in angiogenesis [86].
It is known that APOC1 is an important player in the metabolism of high density lipoproteins and very low-density lipoproteins (VLDL-C). The main function of APOC1 is an inhibition of the cholesterol ester transfer protein (CETP). APOC1 is also responsible for the activation of esterified cholesterol lecithin, which is responsible for exchanging of cholesterol between lipoproteins and its removal from peripheral tissues [87]. "Atherogenic allele" rs4420638-G of APOC1 is recognized to be linked to the increased levels of lowdensity lipoprotein cholesterol [8,25], total cholesterol, and triglycerides [8,11,88], activity of lipoprotein-associated phospholipase A2 [41], as well as with life expectancy [89]. We predicted that allele rs4420638-G may create DNA binding sites for eleven transcriptional activators and two transcriptional repressors of APOC1. The predicted activators included EGR1, ETS1, NFE2L2, E2F1, and ETS2. EGR1 (UniProtKB-P18146) plays an important role in regulating a response of the target genes to growth factors, DNA damage, and ischemia. It also participates in the regulation of cell survival and proliferation as well as is found to mediate cellular response to ischemia and hypoxia via the regulation of expression of IL1B and CXCL2, thereby contributing to tissue inflammation and postischemic damage. ETS1 (UniProtKB-P14921) is a transcription factor directly controlling the expression of genes encoding cytokines and chemokines, and regulating angiogenesis through the control for migration and invasion of endothelial cells. NFE2L2 (UniProtKB-Q16236) is a transcriptional activator binding to DNA motifs (so-called ARE-elements) at the promoter of target genes involved in antioxidant defense of cells against oxidative stimuli. In fact, NFE2L2 is required for the coordinated activation of genes in the response to oxidative stress as well as for maintaining cellular redox homeostasis. E2F1 (UniProtKB-Q01094) is an activator of genes involved in the regulation of cell cycle and DNA replication, cell proliferation and TP53-dependent apoptosis. E2F1 blocks adipocyte differentiation by binding to specific promoters and inhibiting the binding of CEBPA to its target genes. ETS2 (UniProtKB-P15036) is a transcription factor that regulates expression of genes involved in human development and apoptosis. Experimental studies show that ETS2 determines the inflammatory state of endothelial cells in the severe atherosclerotic lesions of arteries [90].
The F2 gene encodes coagulation factor II or thrombin, converting fibrinogen to fibrin and activates factors V, VII, VIII, XIII, and, in complex with thrombomodulin [91]. Moreover, F2 is a powerful vasoconstrictor and mitogen, which is the main factor contributing to vascular spasm. Thrombin is characterized by strong pro-inflammatory actions that are important for the development and progression of atherosclerosis. Acting through membrane receptors such as PAR-1, PAR-3, and PAR-4 expressed in the arterial wall, thrombin has atherogenic effects, including modulation of inflammation, migration of leukocytes into atherosclerotic plaque, increased oxidative stress, proliferation of vascular SMCs, apoptosis and angiogenesis [92,93]. The rs3136441-C allele the F2 gene, which was found to be protective against coronary artery disease, showed association with increased levels of high-density lipoprotein cholesterol [8], and decreased expression of the F2 gene. Two transcription repressors such as BHLHE23 and CBX5 were in silico predicted at SNP rs3136441 in the presence of allele-C. Interestingly, expression of the BHLHE23 gene can be suppressed by environmental chemicals, including those found in the tobacco smoke [94], polycyclic aromatic hydrocarbons [95], biphenol A [96], acrylamide [97], dietary fats [98], and many of them are well known risk factors for atherosclerosis [99]. Furthermore, the atSNP tool allowed predicting that five transcription activators such as E2F2 (UniProtKB-Q14209), ETS1 (UniProtKB-P14921), ETS2 (UniProtKB-P15036) and HNF4BA (P15036) and HNF4BA (P15036), and HNF4BA (UniProt35) have an affinity for binding to a DNA motif in the presence of the rs3136441-C allele of the F2 gene. ETS1 is a transcription factor that controlling the expression of cytokine and chemokine genes and regulating angiogenesis by the activation of genes involved in migration and invasion of endothelial cells [100]. Three transcriptional regulators such as RXRA, SOX7, and FOXA2, that have been predicted to bind with SNP rs3136441 in the presence of allele C, may have the pathogenetic role for atherosclerosis. RXRA (UniProtKB-P19793) is a retinoic acid receptor that serves as a common partner for numerous nuclear receptors. In particular, the heterodimeric complex in RXRA/PPARA is necessary to ensure the transcriptional activity of PPARA (peroxisome proliferator-activated receptor alpha) in regulating expression of genes for β-oxidation of fatty acids (ACOX1), and cytochromes P450 generating free radicals leading to oxidative stress, which plays a key role in the pathogenesis of atherosclerosis. SOX7 (UniProtKB-Q9BT81) binds and activates promoter of the CDH5 gene, playing a role in transcriptional regulation of genes expressed in vascular endothelium [101]. FOXA2 (UniProtKB-Q9Y261) is a transcription factor involved in the embryonic development and epigenetic control of gene expression by opening chromatin and accessing regulatory proteins to interact with enhancers or promoters. It is also involved in the regulation of glucose homeostasis and lipid metabolism. Interestingly, FOXA2 is also capable to bind to the promoter of fibrinogen gene (FGB), an important regulator of blood coagulation, as a result of IL6-induced activation of its transcription [102].
Enrichment analysis has identified biological processes that were enriched with several sets of transcription factors in silico predicted to bind with CAD-associated alleles at the studied GWAS loci. The enriched biological processes included adipogenesis (for TFs binding to the C-rs1883025 allele of ABCA1), aryl hydrocarbon receptor (for TFs binding to the G-rs4420638 allele of APOC1), WNT signaling (for TFs binding to the T-rs3136441 allele of F2), TGF-β receptor signaling, apoptosis, angiogenesis, differentiation of white and brown adipocytes, adipogenesis (for TFs binding to the T-rs55730499 of LPA), positive regulation of the macromolecular metabolic process and energy metabolism (for TFs binding to the C-rs881844 allele of STARD3), nuclear receptors in lipid metabolism and toxicity (for TFs binding to the G-rs1689800 allele of ZNF648) negative regulation of fat cell differentiation (for TFs that bind to the T-rs838880 allele of SCARB1), MAPK targets/ Nuclear events mediated by MAP kinases (for TFs binding to the G-rs9987289 allele of PPP1R3B). Thus, bioinformatics methods allowed establishing that lipid-related GWAS loci (1) have a regulatory potential and/or are linked to a number of regulatory polymorphisms; (2) are associated with tissue-specific gene expression; (3) are objects of epigenetic regulation of gene expression through chemical modification histones; (4) are the binding sites for transcription factors modulating gene expression. Taking together the study findings show that the phenotypic effects of the lipid-associated GWAS loci are not limited by the impact on lipid metabolism, but, most likely, are characterized by multiple or pleiotropic effects on various biological processes linking to the development of atherosclerosis. The overlapping effects of the studied genes on the cardiovascular phenotypes indicate their involvement in the polygenic mechanisms of coronary artery disease. These effects seem to be mediated by shared transcription factors representing gene regulatory networks related with the pathogenesis of atherosclerosis.
The study has some limitations. A relatively small number of participants in this study did not allow presenting more reliable estimates of the relationships between lipidassociated GWAS loci and cardiovascular phenotypes of high statistical power, as can be achieved by large-scale genome-wide association studies that we used for validating the observed genotype-phenotype correlations. Lacking some clinical data such as blood pressure, body mass index, and alcohol intake in the study patients gave us no possibility for assessing their effects as possible confounders on plasma lipid levels, carotid intimamedia thickness, and the risk of coronary artery disease. Moreover, lacking data on lipid profile and CIMT in the control group did not allow replicating genotype-phenotype correlations observed in CAD patients. Because bioinformatics tools were used in this study to predict the potential effects of loci and putative disease-associated pathways, involved in the molecular mechanisms of atherosclerosis, the study findings should be considered as hypothesis-driving, and hence should be confirmed by experimental studies, including those integrating omics technologies such as genomics, transcriptomics, metabolomics, proteomics, and bioinformatics. Interpreting inter-population differences in the associations of genetic markers with cardiovascular phenotypes, we should take into account the fact that the studied polymorphisms are a priori characterized by weak or moderate phenotypic effects that are quite difficult to reproduce in independent populations, given a strong genetic heterogeneity of human populations, both in minor allele frequencies and linkage disequilibrium between the loci [103][104][105]. Finally, since the studied loci are located in noncoding regions of the human genome or intergenic spacers, their phenotypic effects in the relation to functions of neighboring genes should be interpreted with caution, since our knowledge on the mechanisms of genome function and the regulation of gene expression is still limited.

Conclusions
In the present study, we performed a comprehensive statistical and bioinformatics analysis including (a) the model-based multifactor dimensionality reduction method for stochastic modeling gene-gene and gene-environment interactions constituting the polygenic mechanisms of coronary artery disease and contributing to the intermediate cardiovascular phenotypes such as plasma lipids, lipoproteins, and carotid intima-media thickness, (b) functional SNP annotation for assessing their regulatory potential and impact on gene expression in a tissue specific manner, and (c) in silico prediction of allele-specific binding sites for transcription factors at the lipid-associated GWAS loci to identify pathways and gene regulatory networks controlling the molecular pathogenesis of CAD. The studied polymorphisms were predicted to be have significant but unequal effects on each studied phenotypes such as plasma lipids, carotid intima-media thickness, and coronary artery disease. Putative mechanisms by which lipid-associated GWAS loci contribute to CAD susceptibility involve well-recognized atherogenic changes in the plasma lipid profile, such as increased levels of total cholesterol and low-density lipoprotein cholesterol, as well as thickening of the arterial wall, as assessed through measuring CIMT. The phenotypic effects of lipid-associated GWAS polymorphisms might not be limited only by atherogenic changes in the lipid metabolism, and might involve other biological processes and metabolic pathways that play a role in atherosclerosis such as hemostasis (F2, coagulation factor II and ST3GAL4, ST3 beta-galactoside alpha-2,3-sialyltransferase 4), immune response and inflammation (LILRA3, leukocyte immunoglobulin like receptor A3), carbohydrate metabolic process (PPP1R3B, protein phosphatase 1 regulatory subunit 3B), and apoptosis (SCARB1, Scavenger receptor class B member 1).
The present study is the first to show that lipid-associated GWAS loci such as rs4420638 of APOC1, rs3136441 of F2, rs55730499 of LPA and rs6065906 of PLTP are susceptibility markers for coronary artery disease regardless of sex, age, and body mass index. The lipid-related GWAS loci are in tightly epistatic interactions to each other, as well as with other candidate genes contributing to atherosclerosis, thereby representing an important part of the polygenic predisposition to coronary artery disease. Finally, the methodology and approaches implemented in the present study can be used for further research focusing on deciphering the molecular mechanisms by which gene polymorphisms established by genome-wide association studies contribute to common multifactorial diseases.