Examining the Association between Mitochondrial Genome Variation and Coronary Artery Disease

Large-scale genome-wide association studies have identified hundreds of single-nucleotide variants (SNVs) significantly associated with coronary artery disease (CAD). However, collectively, these explain <20% of the heritability. Hypothesis: Here, we hypothesize that mitochondrial (MT)-SNVs might present one potential source of this “missing heritability”. Methods: We analyzed 265 MT-SNVs in ~500,000 UK Biobank individuals, exploring two different CAD definitions: a more stringent (myocardial infarction and/or revascularization; HARD = 20,405), and a more inclusive (angina and chronic ischemic heart disease; SOFT = 34,782). Results: In HARD cases, the most significant (p < 0.05) associations were for m.295C>T (control region) and m.12612A>G (ND5), found more frequently in cases (OR = 1.05), potentially related to reduced cardiorespiratory fitness in response to exercise, as well as for m.12372G>A (ND5) and m.11467A>G (ND4), present more frequently in controls (OR = 0.97), previously associated with lower ROS production rate. In SOFT cases, four MT-SNVs survived multiple testing corrections (at FDR < 5%), all potentially conferring increased CAD risk. Of those, m.11251A>G (ND4) and m.15452C>A (CYB) have previously shown significant associations with body height. In line with this, we observed that CAD cases were slightly less physically active, and their average body height was ~2.00 cm lower compared to controls; both traits are known to be related to increased CAD risk. Gene-based tests identified CO2 associated with HARD/SOFT CAD, whereas ND3 and CYB associated with SOFT cases (p < 0.05), dysfunction of which has been related to MT oxidative stress, obesity/T2D (CO2), BMI (ND3), and angina/exercise intolerance (CYB). Finally, we observed that macro-haplogroup I was significantly (p < 0.05) more frequent in HARD cases vs. controls (3.35% vs. 3.08%), potentially associated with response to exercise. Conclusions: We found only spurious associations between MT genome variation and HARD/SOFT CAD and conclude that more MT-SNV data in even larger study cohorts may be needed to conclusively determine the role of MT DNA in CAD.


Study Population, Disease Phenotypes and Quality Filtering
The UK Biobank [42] is a large population-based prospective cohort study from the United Kingdom with genetic and deep phenotypic (~7221 phenotypes http://www. nealelab.is/uk-biobank, accessed on 22 February 2022) data on ca. 500,000 individuals aged 40 to 69. We downloaded these data (application ID 61684) and used a similar CAD case definition, as previously described by [15], for UK Biobank. HARD CAD cases included individuals with fatal or non-fatal myocardial infarction (MI), percutaneous transluminal coronary angioplasty (PTCA), or coronary artery bypass grafting (CABG). SOFT CAD included individuals meeting the HARD CAD definition as well as those with chronic ischemic heart disease (IHD) and angina ( Figure 1). In HESIN hospital episodes data and death registry data from diagnosis and operation (primary and secondary causes), MI was defined as hospital admission or cause of death due to ICD9 410-412, ICD10 I21-I24, I25.2; PTCA was defined as hospital admission for PTCA (OPCS-4 K49, K50.1, K75); CABG was defined as hospital admission for CABG (OPCS-4 K40-K46); and angina or chronic IHD was defined as hospital admission or death due to ICD9 413, 414.0, 414.8, 414.9, ICD10 I20, I25.1, I25.5-I25. 9. In UK Biobank self-reported data, cases were defined as having "vascular/heart problems diagnosed by the doctor" or "non-cancer illnesses that self-reported as angina or heart attack". Self-reported surgery included PTCA, CABG, or triple heart bypass. All participants not defined as CAD cases using the SOFT definition were considered as controls in the analysis. For a complete list of definition codes, see Supplementary Table S1. We subsequently performed individual-level filtering ( Figure 1) by removing missingness or heterozygosity outliers, participants with self-reported vs. genetically inferred sex mismatches or putative sex chromosome aneuploidy, individuals that were not of European (EUR) ancestry, and individuals having withdrawn their consent at the time of analysis. We also identified closely related participants (kinship coefficient > 0.088 i.e., first-or seconddegree relative pairs), preferentially retaining CAD cases or relatives with the highest call rate.

Genotype Data Quality Control
In the UK Biobank [42], genotyping was performed using Affymetrix UK biobank Axiom (450,000 samples) and Affymetrix UK BiLEVE Axiom (50,000 samples) arrays ( Figure 1) and the autosomal genetic data were subsequently imputed to the Haplotype Reference Consortium panel and UK10K4 + 1000 Genomes panel. We downloaded the genotype data for the 265 MT DNA variants for all 500,000 individuals and pre-processed MT DNA data as previously described in ref. [43]. In brief, we first made sure that the reference alleles match the latest MT Cambridge Revised Sequence (rCRS) of the Human MT DNA positions. After setting all potential heterozygotes to missing, further quality control of genotyped individuals included filtering for missingness by individual < 0.1 and missingness by SNV < 0.1 with PLINK [44]. For common/low-frequency variant association analyses, we also required that the minor allele frequency (MAF) > 0.01. An overview of the filtering of MT-SNVs is provided in Figure 1.

MT-SNV Association Analyses
For common and low-frequency (MAF > 0.01; n = 111) variants, we performed single marker tests to explore their associations with HARD and SOFT CAD ( Figure 1) using SNPTEST v2.5.4 with the frequentist test and expected method, as previously described by ref. [45]. We used as covariates the array (UK Biobank vs. UK BiLEVE), sex, birth year, and the first five principal components of the autosomal genotype data, provided by the UK Biobank, similar to ref. [15] and Benjamini-Hochberg (BH) [46] adjustment for multiple testing was applied to calculate the false discovery rate (FDR). MT-SNV annotations were performed using a manually-curated database, HmtVar (https://www.hmtvar.uniba.it/, accessed on 22 February 2022).

MT-Gene-Based Association Analyses
To also consider the potential effects of rare (MAF ≤ 0.01) variants on CAD risk, we assigned all SNVs to MT genes based on MITOMAP (https://www.mitomap.org/MITOMAP, accessed on 22 February 2022) and used the R software package SKAT (v2.0.1) [47] to perform MT-gene-based (additionally including the whole mitochondrion as our region of interest, MT) association analyses with HARD and SOFT CAD phenotypes (Figure 1), again using as covariates the array (UK Biobank vs. UK BiLEVE), sex, birth year and the first five principal components of the autosomal genotype data, provided by the UK Biobank, similar to ref. [15] and obtain resampled residuals (n.Resampling = 1000, type.Resampling = "bootstrap") to compute resampling p-value.

Characteristics of Study Subjects
The current study included ca. 500,000 genotyped individuals from the UK Biobank [42], 48,700 with an inclusive CAD phenotype (SOFT) that incorporated self-reported angina or other evidence of chronic coronary heart disease, of which 28,503 had a more stringently defined CAD phenotype (HARD) of myocardial infarction (Figure 1), similar to ref. [15]. All participants (n = 453,805) not defined as CAD cases using the SOFT definition are considered as controls in the analysis. After this step of quality control, 45,285 individuals were removed: 24,770 cases with the HARD, 42,079 cases with the SOFT CAD phenotype, and 415,271 controls remained. Finally, further quality control of genotyped individuals included filtering for missingness by individual < 0.1 and missingness by SNV < 0.1 was performed, considering the 265 mitochondrial variants present on the UK Biobank or UK BiLEVE arrays (as described in Section 2.2. in Methods). As a result, a further 54,474 individuals were removed, leaving us with 20,405 cases with the HARD and 34,782 cases with the SOFT phenotype vs. 356,563 controls. Individual characteristics of these individuals, in terms of common CAD risk factors, are summarized in Table 1. Table 1. Individual characteristics of HARD and SOFT CAD cases vs. controls. *** Represents statistically significant (of p < 0.001) difference between HARD and SOFT CAD cases vs. controls, whereas +++, ++ and + represent statistically significant (of p < 0.001, p < 0.01 and p < 0.05, respectively) difference between HARD vs. SOFT CAD cases.

MT-SNV Associations with HARD and SOFT CAD Phenotypes
After quality control of genotyped individuals (including filtering for missingness by individual < 0.1 and missingness by SNV < 0.1, as described in Section 2.2. in Methods), from the 265 MT-SNVs present in the UK Biobank or UK BiLEVE arrays, 243 remained for further analyses. For the genotyped common and low-frequency MT-SNVs (MAF > 0.01; n = 111, of those n = 39 with MAF > 0.05; genotyping rate > 0.99) in the UK Biobank, we performed single marker association analyses with HARD (n = 20,405) and SOFT (n = 34,782) CAD phenotypes, adjusting for the array, sex, birth year and first five principal components.
In HARD cases, no MT-SNVs survived multiple testing correction, the most significant (nominal p < 0.05) findings (Table 2 Table S2.
In SOFT cases, four MT-SNVs survived multiple testing correction (at FDR < 5%; Table 3 and

MT-Gene-Based Associations with HARD and SOFT CAD Phenotypes
We next sought to also consider the potential effects of rare (MAF < 0.01) MT-SNVs on CAD risk, hence we performed MT-gene-based association analyses with HARD and SOFT CAD phenotypes, additionally including the whole mitochondrion as our region of interest (MT). As a result, we observed that in both HARD and SOFT cases, CO2 displayed gene-based association at nominal significance (p < 0.05), while CYB and ND3 were also associated (nominal p < 0.05) with SOFT CAD phenotype (Table 4). When considering the whole mitochondrion (MT), no significant associations with CAD were observed (n = 243; p = 0.07, Table 4).

Discussion
Over the last 14 years, several large-scale genome-wide association studies have found hundreds of single-nucleotide variants (SNVs) significantly associated with CAD; however, these explain <20% of the heritability. In this study, we hypothesize that mitochondrial (MT)-SNVs might present one potential source of the "missing heritability".
We analyzed 265 common/low-frequency (MAF ≥ 1%) and rare (MAF < 1%) MT-SNVs in~500,000 UK Biobank individuals, exploring two different CAD definitions, HARD (n = 20,405) and SOFT (n = 34,782) (Figure 1), as previously proposed by [15], and using the array, sex, birth year and first five principal components as covariates. Overall, the differences in the prevalence of common risk factors among CAD cases (both HARD and SOFT phenotypes) and controls were statistically significant (p < 0.001; Table 1), male gender, older age, hypertension, hypercholesterolemia, obesity, T2D, physical inactivity, shorter body statue, smoking and positive family history demonstrating predominance in CAD patients.
In SOFT cases, four MT-SNVs survived multiple testing correction (at FDR < 5%; Table 3  both were found more frequently in SOFT CAD cases vs. controls (OR = 1.03; 95% CI 1.01-1.05). m.11251A>G (rs869096886) and m.15452C>A (rs193302994) are tagging macrohaplogroup J and thus potentially related to a decreased cardiorespiratory fitness/exercise capacity [52,53] and increased CAD risk. Moreover, both MT-SNVs displayed significant associations with body height in the UK Biobank ( Figure 5) [58]. In line with this, we observed that the average body height of both male and female CAD cases was~2.00 cm lower compared to controls (Table 1), and shorter body height is related to an increased CAD risk [66]. Gene-based tests revealed that in both HARD and SOFT cases, CO2 displayed genebased association at nominal significance (p < 0.05, Table 4). The CO2 gene encodes for the second subunit of cytochrome c oxidase (COX, complex IV). Dysfunction of COX has been previously associated with mitochondrial oxidative stress, obesity, and T2D [67]. CYB and ND3 were also associated (nominal p < 0.05) with the SOFT CAD phenotype (Table 4). Somatic variations in CYB have been previously related to hypertrophic cardiomyopathy (one of its clinical manifestations being angina) and exercise intolerance [68]. Recently, a large gene-based meta-analysis of mitochondrial genes with several metabolic traits identified ND3 associated with BMI (p < 1 × 10 −3 ) [43].
All haplogroups demonstrating significant (at FDR < 5%) associations in our study (M45a, G2b1a2, U2b2 with both HARD/SOFT, M57b1 and L2c (under-represented) with HARD and M3a with SOFT CAD phenotypes; Table 5 and Supplementary Table S4) were with a frequency <1%, whereas other studies have considered only haplogroups with a frequency ≥5% [69]. Low counts in the less common haplogroups may lead to a falsepositive result [70]. Although this should be addressed by performing multiple testing corrections, grouping the less frequent haplogroups may be another approach to tackle this [70]. Hence, we also assigned individuals to one of the major European haplogroups (Figure 4) for comparison. As a result, we observed that 43.28% of the individuals belonged to the macro-haplogroup H, 13.70% to the macro-haplogroup U, 10.70% to the macrohaplogroup J, and 9.52% to the macro-haplogroup T (Figure 4), in line with previous reports in other European populations [71]. Overall, the frequencies of the major European mitochondrial haplogroups did not differ significantly (at FDR < 5%) between CAD patients and control subjects (Figure 4). Only the frequency of haplogroup I was significantly (nominal p < 0.05) higher in patients with HARD CAD phenotype vs. controls (3.35% vs. 3.08%, OR = 1.09) and the macro-haplogroup R was significantly (nominal p < 0.001 and p < 0.01) higher in patients with HARD and SOFT CAD phenotype vs. controls (0.26% and 0.23% vs. 0.16%, OR = 1.70 and OR = 1.49, respectively) ( Figure 4). Of note, however, we were able to assign most samples reliably into haplogroups, as the MT DNA haplogroups were deduced from genotyping arrays with limited numbers of (high-quality) SNVs being profiled, hence the quality score for haplogroup assignment ranged from 0.50 to 0.86, with a median of 0.68. Therefore, we were not able to exclude samples with quality scores for haplogroup assignment <0.8 (as in ref. [72]). Moreover, we performed Fisher's exact, which did not allow us to adjust for covariates, hence it is possible that known and unknown potential confounding factors might have influenced these results. Though, in which cases to adjust for which covariates and whether it will increase or decrease the study power and/or bias, is still a matter of intense debate [73,74].
Several other limitations should also be acknowledged. It is well known that very large cohorts are required to reliably associate genetic variations with complex traits [70]. The power for detecting causal MT-SNVs and haplogroups has been compared with that in the nuclear genome given equal effect sizes, estimating that the sample size required for the mitochondrial studies would be roughly the same as that needed for the nuclear genome studies [75]. Previous power calculations for ischemic stroke (assuming an additive model) [76] revealed a maximum power of 73% to detect SNVs with OR = 1.4 and MAF = 0.30, whereas for SNVs conferring OR = 1.20 and MAF = 0.20, the study power dropped to 4.6% and further to 0.001% for OR = 1.10 and MAF = 0.10. This study concluded that "prohibitively large sample sizes" would be required to achieve sufficient power to detect individual MT DNA variants [76]. In line with this, we observe that in HARD CAD cases, where n = 20,405, no MT-SNVs survived multiple testing correction, whereas when increasing n to 34,782 in SOFT CAD cases, four MT-SNVs survived multiple testing correction (FDR < 5%). Hence, even larger sample sizes (≥50,000) may be required to reliably associate MT-SNVs and haplogroups with CAD.
In addition to the number of individuals, the number of MT-SNVs studied was also limited. In the UK Biobank [42], genotyping was performed using Affymetrix UK biobank Axiom (450,000 samples) and Affymetrix UK BiLEVE Axiom (50,000 samples) arrays, which included 265 genotyped MT DNA variants. After quality control procedures (Figure 1), 243 MT-SNVs remained for further analyses, and 111 of those were common or low-frequency (MAF > 0.01) and could be used for single-marker association analyses. However, this is clearly not a representative set of MT-SNVs and, as previously recognized, some regions may be not well covered, such as the hypervariable regions [77,78]. Clearly, whole-genome sequencing or targeted sequencing of MT-DNA, considering their ability to achieve a deep genome coverage, would allow the identification of many more MT-SNVs (especially the low-frequency/rare variants; MAF ≤ 0.01), improving also the detection of haplogroups and allowing the investigation of heteroplasmy, a phenomenon characteristic to MT DNA [33,79].
Heteroplasmy denotes the coexistence of MT DNA genomes with wild-type inherited SNVs and somatic variants in varying ratios, which are dynamically determined and display patterns of cell and tissue specificity, and may differ even within the same mitochondrion [33], determining the clinical presentation of disease phenotypes [77,80,81]. In this study, we were limited to genotype calls from arrays, which are restricted in terms of minor alleles and do not allow the capture of heteroplasmy [77,80]. Moreover, MT DNA content was assessed only in blood cells, whereas previous studies have identified an additional six vascular and metabolic tissues relevant to CAD [82,83]. Therefore, whole-genome sequencing/targeted sequencing of MT-DNA across several vascular and metabolic tissues relevant to CAD may be necessary [7,82,83] in order to characterize the full landscape of mitochondrial genetic variations and their potential contribution to these complex disease phenotypes. This may be necessary, especially considering that the energy requirements and thus sensitivity to the changes in mitochondrial function differ for different cells and tissues and hence may be important in determining the phenotypic effect of MT-SNVs [31].
We also did not consider mitochondrial DNA copy number (MT DNA-CN), representing the number of mitochondria per cell and the number of MT DNA per mitochondrion [84,85]. Each mitochondrion contains multiple copies of MT DNA, and different cells and tissues contain different numbers (up to 7000) of mitochondria, again displaying patterns of cell and tissue variability [84,85]. MT DNA-CN is believed to serve as an indirect biomarker that would capture the underlying mitochondrial activity and function, such as energy production capacity and metabolic characteristics, thus possibly playing a causative role in health and disease [85]. Decreased MT DNA-CN has been previously associated with an increased risk of developing cardiovascular disease (CVD) outcomes [84]. More recently, similar analyses in the UK Biobank demonstrated a possible causal role of lower MT DNA-CN on higher CAD risk [86]. In an even larger cohort (of 408,361 individuals from TOPMed and UK Biobank), a decline in MT DNA-CN was observed in elderly individuals (>65 years) and lower MT DNA-CN levels also demonstrated an age-independent association with hypertension, hyperlipidemia, T2D, and obesity, i.e., the well-known CAD risk factors [87]. However, none of these studies compared the MT DNA-CN levels between HARD vs. SOFT CAD phenotypes, which could be a subject of future studies. Furthermore, considering that MT DNA-CN varies greatly across cell and tissue types, again profiling of several vascular and metabolic tissues relevant to CAD may be necessary for such investigations [7,82,83].
Yet another important aspect not considered here is the nuclear genome, considering the co-evolution of mitochondria and eukaryotic cells [37]. The mitochondrial genome encodes only 37 genes, mainly components of the OXPHOS machinery, whereas the re-maining~1000-1500 mitochondrial proteins are all encoded by the nuclear genome [88]. The importance of common genetic variation in the nuclear genome regulating MT heteroplasmy and DNA-CN is an active area of research [85,[89][90][91]. Moreover, genetic variants in nuclear genes could lead to oxidative disorders or modulate the mitochondrial variants [81], and mild nuclear gene variants could potentially become clinically relevant when combined with an incompatible MT DNA [92]. Additive interactions (epistasis) between mitochondrial variants in the MT-ND2 gene and nuclear variants in genes responsible for mitochondrial replication and transcription have been demonstrated to influence the BMI and obesity phenotype [93]. Similarly, our previous investigations have demonstrated the role of nuclear-encoded mitochondria imported genes in coordinating the response to hypercholesterolemia and atherosclerotic lesion expansion, as well as foam cell formation [24]. Hence, further analyses also considering these additional variations will be required.
Finally, similar to SNVs in the nuclear genome, even if (mitochondria) genome-wide significant associations with HARD/SOFT CAD phenotypes would be identified, their functional consequences would need to be determined in the CAD-relevant tissues [82,83]. Currently, functional studies for MT-SNVs are not readily available; however, several novel experimental animal models (e.g., mice strains displaying DNA haplogroups similar to those observed in humans) may be available in the near future, allowing the investigation of the potential causality of the relationship between inherited "natural" non-pathogenic MT-SNVs and potential alterations in mitochondrial function (e.g., oxygen consumption and oxidant production, cellular ATP levels) and their relation to alterations in cardiovascular function and CAD risk [31,37,81].

Conclusions
We found only spurious MT-SNV, gene, and haplogroup associations with HARD and SOFT CAD phenotypes and conclude that whole-genome sequencing/targeted-sequencing of MT-DNA, across several vascular and metabolic tissues relevant to CAD in even larger study cohorts (n > 50,000), followed by functional studies in animal models, may be necessary to conclusively determine the role of MT-SNVs, genes, and haplogroups in modulating the risk of CAD. Therefore, whole-genome sequencing/targeted sequencing of MT-DNA across several vascular and metabolic tissues relevant to CAD may be necessary [7,82,83] in order to characterize the full landscape of mitochondrial genetic variations and their potential contribution to these complex disease phenotypes. This may be especially necessary considering that the energy requirements and thus sensitivity to the changes in mitochondrial function differ for different cells and tissues and hence may be important in determining the phenotypic effect of MT-SNVs [31].
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13030516/s1, Table S1. A complete list of HARD and SOFT CAD case definition codes. Table S2. A complete list of 111 common and low-frequency MT-SNV associations with HARD CAD phenotype. Table S3. A complete list of 111 common and low-frequency MT-SNV associations with SOFT CAD phenotype. Table S4. Haplogroup assignment in HARD CAD cases vs. controls prior to excluding samples with quality scores for haplogroup assignment <0.8 and further assigned individuals to one of the major European haplogroups. Table S5. Haplogroup assignment in SOFT CAD cases vs. controls prior to excluding samples with quality scores for haplogroup assignment <0.8 and further assigned individuals to one of the major European haplogroups.  Institutional Review Board Statement: The present study was conducted using the UK Biobank Resource (application ID 61684). The UK Biobank has obtained approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB; 11/NW/0382) [42].
Informed Consent Statement: Electronic signed consent was obtained from all the UK Biobank patient(s) at recruitment [42].

Data Availability Statement:
The data supporting results reported in this study will be returned to the UK Biobank and available for download to registered researchers on approved applications (https: //biobank.ndph.ox.ac.uk/ukb/ukb/docs/ukblink_instruct.html, accessed on 22 February 2022).