Genome-Wide Association Studies of Conotruncal Heart Defects with Normally Related Great Vessels in the United States

Conotruncal defects with normally related great vessels (CTD-NRGVs) occur in both patients with and without 22q11.2 deletion syndrome (22q11.2DS), but it is unclear to what extent the genetically complex etiologies of these heart defects may overlap across these two groups, potentially involving variation within and/or outside of the 22q11.2 region. To explore this potential overlap, we conducted genome-wide SNP-level, gene-level, and gene set analyses using common variants, separately in each of five cohorts, including two with 22q11.2DS (N = 1472 total cases) and three without 22q11.2DS (N = 935 total cases). Results from the SNP-level analyses were combined in meta-analyses, and summary statistics from these analyses were also used in gene and gene set analyses. Across all these analyses, no association was significant after correction for multiple comparisons. However, several SNPs, genes, and gene sets with suggestive evidence of association were identified. For common inherited variants, we did not identify strong evidence for shared genomic mechanisms for CTD-NRGVs across individuals with and without 22q11.2 deletions. Nevertheless, several of our top gene-level and gene set results have been linked to cardiogenesis and may represent candidates for future work.


Introduction
Congenital heart defects are among the most common, serious, and clinically important birth defects [1][2][3]. They include a heterogeneous group of structural malformations of the cardiac outflow tract (i.e., conotruncal heart defects) that are thought to have at least some shared genetic basis [4][5][6][7]. Some conotruncal heart defects involve a deviation from the normal position of the origin of the aorta and pulmonary trunk, in which case the great vessels are said to be transposed. Normally related great vessels indicate that the aorta emerges from the left ventricle while the pulmonary artery arises from the right ventricle. Conotruncal heart defects with normally related great vessels (CTD-NRGVs) frequently occur in individuals with a hemizygous 22q11.2 deletion, whereas those with transposed great vessels very rarely occur in the context of a 22q11.2 deletion. This suggests that the genetic basis of CTD-NRGVs may differ from that of CTDs with transposed vessels. Further, since CTD-NRGVs occur both in individuals with and without a 22q11.2 deletion, there may be overlap in the genetic contribution to CTD-NRGVs in these groups. This potential overlap in genetic etiology could include genetic variation within as well as outside of the 22q11.2 region, though this hypothesis has not been extensively studied.
Both common copy number variants and common single nucleotide polymorphisms (SNPs) have been found to be associated with increased risk for heart defects in individuals with 22q11.2DS [8]. Further, common variants in the distal region of the remaining 22q11.2 allele have been associated with increased risk for these defects among individuals with 22q11.2DS [6]. These data suggest that genetic variation within and outside of the deleted region contributes to the risk of heart defects in individuals with the deletion. In addition, association studies of rare copy number variants (rCNVs) suggest that there is at least some overlap in the genes and pathways that are involved in CTD-NRGVs in patients with and without 22q11.2DS [9][10][11]. For example, among separate cohorts of individuals with and without 22q11.2DS, Xie et al. (2019) identified 14 gene sets from Reactome pathways of interest [12] (e.g., gene silencing by RNA pathway, TGF-beta signaling pathway), with rCNVs over-represented among patients with CTD-NRGVs compared to controls without heart defects [10].
In general, both common and rare inherited variants are thought to play a role in conotruncal defects [7,13,14]. However, prior genome-wide association studies (GWAS) of conotruncal heart defects, both among cases with and without 22q11.2DS, have had somewhat limited success in identifying significant associations. Most of these initial studies have been limited by a fairly small number of cases, and the subset of conotruncal defects with NRGVs has not been evaluated in cases which do not have 22q11.2DS.
To assess the possibility of shared genetic susceptibility to CTD-NRGV between those with and those without a 22q11.2 deletion, we conducted GWAS and meta-analyses at the SNP-level and conducted gene-level GWAS, as well as gene set analyses using the rCNV Reactome pathways identified by Xie et al. [10].

CTD-NRGVs were defined based on the presence of normally related great vessels in the context of at least one of the following diagnoses: tetralogy of Fallot, ventricular septal defects (conoventricular, posterior malalignment, and conoseptal hypoplasia), isolated aortic arch anomalies, truncus arteriosus, and interrupted aortic arch. Normally related great vessels are defined by the association of the pulmonary artery with the right ventricle and the aorta with the left ventricle (i.e., the presence of fibrous continuity between the aortic and mitral valves), where the aortic valve is situated posteriorly and just rightward of the pulmonary valve.

Participants with CTD-NRGVs, but without documented 22q11.2 deletions, and their parents were recruited at the Children's Hospital of Philadelphia (CHOP) during 1999-2010 and through the Pediatric Cardiac Genomics Consortium (PCGC) during 2010-2012, as previously described [13] (Figure 1). In both CHOP and PCGC groups, cases with suspected syndromes, including 22q11.2DS, were excluded. Further, all CHOP cases screened negative for a 22q11.2 deletion, using fluorescence in situ hybridization and/or multiplex ligation-dependent probe amplification [15,16]. Potential cases with other documented genetic syndromes were also excluded, based on a review of cardiac medical records [13]. To allow for case-control analyses among cases without trio data (e.g., missing parent samples), data for pediatric controls undergoing well-child visits at CHOP were also obtained [13]. Because trio-based analyses are robust to potential population stratification [17], trios of any race/ethnicity were included.
However, all cases and controls were self-reported Caucasians, as case-control analyses are more sensitive to this potential bias. Each participant or parent provided informed consent under protocols approved by the institutional review boards at CHOP or the PCGC clinical study sites.

Subjects with 22q11.2DS
Data for subjects with 22q11.2DS were obtained from affected subjects and their parents recruited by the International Chromosome 22q11.2 Deletion Syndrome Consortium, the International 22q11.2 Brain Behavior Consortium, and clinical groups that specialize in the treatment of individuals with 22q11.2DS, as previously described [18] (Figure 1). For all cases, the 22q11.2 deletion was confirmed using fluorescence in situ hybridization and/or multiplex ligation-dependent probe amplification [18]. Subjects with CTD-NRGVs were considered to be "cases" and those without a clinically significant heart defect were considered to be "controls." Of note, a substantial proportion of subjects were recruited in Santiago, Chile, and this cohort was genotyped and analyzed separately [18]. Each participant or parent provided informed consent under protocols approved by the institutional review board at Albert Einstein College of Medicine.

Subjects without 22q11.2DS
Genomic DNA was genotyped using Illumina arrays and additional genotypes were imputed using reference data from the 1000 Genomes Project, as previously described [13]. Pre-imputation quality control measures included exclusion of case-parent trios (Mendelian error rate > 1%) and variants with minor allele frequency < 1% or genotyping rate < 90%. Post-imputation quality control measures included exclusion of variants with minor allele frequency < 5%, genotyping rate < 90%, or r² < 0.8, which suggests poor imputation. At that stage, we also excluded variants and individuals with genotyping rates < 90%.
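The post-imputation variant filter described above can be sketched in a few lines. This is only an illustration of the stated thresholds, not the pipeline that was actually used; the `Variant` record, its field names, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary record (field names are assumptions)."""
    rsid: str
    maf: float        # minor allele frequency
    call_rate: float  # genotyping rate (fraction of non-missing genotypes)
    r2: float         # imputation quality metric


def passes_post_imputation_qc(v, min_maf=0.05, min_call_rate=0.90, min_r2=0.8):
    """Keep a variant unless MAF < 5%, genotyping rate < 90%, or r2 < 0.8."""
    return v.maf >= min_maf and v.call_rate >= min_call_rate and v.r2 >= min_r2


variants = [
    Variant("rsA", 0.10, 0.99, 0.95),  # passes all three filters
    Variant("rsB", 0.03, 0.99, 0.95),  # fails the MAF filter
    Variant("rsC", 0.10, 0.85, 0.95),  # fails the genotyping-rate filter
    Variant("rsD", 0.10, 0.99, 0.70),  # fails the imputation-quality filter
]
kept = [v.rsid for v in variants if passes_post_imputation_qc(v)]
```

The thresholds are keyword arguments so that other cut-offs can be substituted without changing the function.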

We have previously described SNP- [13] and gene-level [19] GWAS of a broader group with any conotruncal defects. The present analysis involved only the subset of those cases with CTD-NRGVs, a group we have not previously reported on.

Subjects with 22q11.2DS
Genomic DNA was genotyped using an Affymetrix array and additional genotypes were imputed using reference data from the 1000 Genomes Project, as previously described [18]. Pre-imputation quality control measures included exclusion of variants with minor allele frequency < 1%, genotyping rate < 95%, or deviation from Hardy-Weinberg equilibrium in controls based on p ≤ 1 × 10⁻⁵. Post-imputation quality control measures included the exclusion of variants with minor allele frequency ≤ 1%, or r² < 0.8.
We have previously conducted a GWAS of cases with 22q11.2 deletions and one specific conotruncal defect, tetralogy of Fallot [18]. The present analysis involved these subjects as well as the broader group of subjects with any CTD-NRGVs, for which we have not previously reported results.

SNP-Level Analyses
Separate SNP-level analyses were conducted for five individual cohorts, including three without a 22q11.2 deletion (461 CHOP trios, 180 PCGC trios, 294 CHOP cases/2976 CHOP controls) and two with 22q11.2DS (191 Chilean subjects with arrays processed in Santiago, Chile and 1281 subjects in the main cohort, 1244 with arrays processed at Albert Einstein College of Medicine, and 37 with arrays processed at the Children's Research Institute in Milwaukee, WI, USA), as previously described [13,18]. For the cases without a 22q11.2 deletion, 29% had tetralogy of Fallot and 71% had other defects; for the subjects with 22q11.2DS, 22% were cases with tetralogy of Fallot, 39% were cases with other defects, and 38% were controls without a clinically significant congenital heart defect. Briefly, trios were analyzed using a multinomial likelihood approach [20] implemented in the EMIM software package [21], and the case-control analyses were conducted using logistic regression based on an additive genetic risk model and adjusted for principal components of race/ethnicity (the first four components for the cohorts with 22q11.2DS and the first two components for the cohorts without 22q11.2DS). Because subjects with 22q11.2DS are hemizygous for all loci within the 1.5-3 million base-pair deleted region, we excluded genes in this region in the analyses of the cohorts with a 22q11.2 deletion and in the meta-analysis of all five cohorts. Following these five cohort-specific analyses, we conducted three meta-analyses using GWAMA v2.1 [22], restricted to variants that were present across all five cohorts (with the exception of the variants in the 22q11.2 hemizygous deletion region). These included a combined meta-analysis of individuals with and without 22q11.2DS (all five cohorts), as well as separate meta-analyses for individuals without 22q11.2DS (three cohorts) and individuals with 22q11.2DS (two cohorts). We used a fixed-effects model for these analyses when Cochran's heterogeneity p > 0.1, and a random-effects model when Cochran's heterogeneity p ≤ 0.1 [13].
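The model-selection rule above (fixed effects when Cochran's heterogeneity p > 0.1, random effects otherwise) can be illustrated for a single variant as follows. This is a minimal sketch and not GWAMA's implementation: it assumes inverse-variance weighting and a DerSimonian-Laird estimate of the between-cohort variance, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (k + 0.5)
    return total


def meta_analyze(betas, ses, het_threshold=0.1):
    """Combine per-cohort effect estimates (betas) and standard errors (ses)."""
    w = [1.0 / se ** 2 for se in ses]
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Cochran's Q and its heterogeneity p-value (df = number of cohorts - 1).
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    p_het = chi2_sf(q, df)
    if p_het > het_threshold:
        model, beta, se = "fixed", beta_fixed, math.sqrt(1.0 / sum(w))
    else:
        # DerSimonian-Laird between-cohort variance (an assumed estimator).
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w_star = [1.0 / (s ** 2 + tau2) for s in ses]
        beta = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
        se = math.sqrt(1.0 / sum(w_star))
        model = "random"
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {"model": model, "beta": beta, "se": se, "p": p, "p_het": p_het}


homogeneous = meta_analyze([0.1, 0.1, 0.1], [0.05, 0.05, 0.05])
heterogeneous = meta_analyze([0.5, -0.5], [0.1, 0.1])
```

In the second call the two cohorts disagree in direction, so the heterogeneity test triggers the random-effects branch and the pooled standard error is much wider than either cohort's own.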

Gene-Level Analyses
Using MAGMA version 1.08 [23], gene-level analyses were conducted using SNP-level summary statistics from each meta-analysis as input. SNPs were annotated to protein-coding genes, defined by their transcription start-stop coordinates, using NCBI 37.3 (downloaded from https://ctg.cncr.nl/software/magma, accessed on 6 June 2019). SNPs within 1 kb upstream or downstream of the start or stop coordinates were included in the annotation window and also mapped to the gene.
Gene-level p-values were calculated from the SNP-level summary statistics for each meta-analysis. MAGMA can estimate the gene-level p-value from either the mean test statistic across the SNPs in a gene or the top test statistic among those SNPs. It can also estimate an aggregate p-value obtained by combining both test statistics. For our analyses, we used the aggregate statistic to ensure even distribution of power and to account for a wider range of genetic models. The computed gene-level p-values were transformed to Z-scores using the probit transformation, with lower p-values (i.e., more significant associations) corresponding to higher Z-scores. These Z-scores served as input for the gene set analyses.
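As an illustration of the probit transformation, a gene-level p-value can be mapped to a Z-score with the inverse standard normal distribution function, Z = Φ⁻¹(1 − p). The sketch below uses only the Python standard library; the function name is hypothetical, and the symmetric form −Φ⁻¹(p) is used because it is numerically more stable for very small p-values.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_to_z(p):
    """Probit transform: smaller p-values map to larger Z-scores.

    Equivalent to inv_cdf(1 - p); requires 0 < p < 1.
    """
    return -_STD_NORMAL.inv_cdf(p)


z_null = p_to_z(0.5)          # a p-value of 0.5 corresponds to Z = 0
z_suggestive = p_to_z(1e-3)   # the suggestive gene-level cut-off
```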

Candidate Gene Set Analyses
We used MAGMA to conduct candidate gene set association analyses. First, we evaluated the 42 genes in the 22q11.2DS 3 Mb region as a single gene set among the non-deleted cohorts. Second, we separately evaluated 14 Reactome pathways identified in the rCNV study reported by Xie et al. [10]. This group of genes represents statistically significant shared pathways, expression patterns, and biological functions between patients with versus without conotruncal heart defects among patients with and without 22q11.2DS [10]. We also evaluated an additional aggregate gene set consisting of genes present in any of these 14 gene sets.
The gene-level association results were used as the input for these gene set association analyses. Specifically, each gene p-value computed from the gene-level association analysis was converted to a Z-score, which served as the dependent variable [23]. For these comparisons, we used competitive (as opposed to self-contained) association tests under a linear regression framework, which evaluate whether the genes in the set of interest are more strongly associated with a phenotype than all other genes in the genome (i.e., testing the null hypothesis β_s = 0 against the alternative hypothesis β_s > 0) [23]. To correct for potential confounders, gene size, gene density, and differential sample size, as well as the log of each of these values, were included as additional covariates in the gene-level regression model [23]. To adjust for linkage disequilibrium between genes, a gene-gene correlation matrix was approximated and included in the model (for gene pairs over 5 Mb apart, the correlation was set to zero) [23].
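For orientation, the competitive test described above can be written as a gene-level regression. The notation below is a sketch based on our reading of the MAGMA framework [23]; apart from the set coefficient, the symbols are introduced here for illustration: Z is the vector of gene Z-scores, S_s is an indicator that equals 1 for genes in set s, C holds the covariates (gene size, gene density, sample size, and their logs), and R is the approximated gene-gene correlation matrix.

```latex
\begin{aligned}
Z &= \beta_0 \mathbf{1} + S_s \beta_s + C \beta_c + \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}\!\left(0, \sigma^2 R\right), \\
H_0 &: \beta_s = 0 \quad \text{versus} \quad H_A : \beta_s > 0 .
\end{aligned}
```

Under this formulation, a positive β_s indicates that genes in the set have, on average, higher Z-scores (stronger associations) than genes outside the set after accounting for the covariates.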

Interpretation
For the SNP-level analyses, we used the standard GWAS threshold (p < 5.0 × 10⁻⁸) to identify statistically significant associations. SNP associations with 5.0 × 10⁻⁸ ≤ p < 1.0 × 10⁻⁵ were considered suggestive of association. For the gene and gene set analyses, we used a Bonferroni correction for the total number of genes and gene sets, respectively. Genes associated with p < 1.0 × 10⁻³ but greater than the Bonferroni-corrected cut-off were considered to be suggestive of association.
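These interpretation rules amount to simple threshold comparisons, sketched below. The function names are hypothetical, and the family-wise α = 0.05 used for the Bonferroni cut-off is an assumption (it is consistent with the cut-off of about 3.6 × 10⁻³ quoted for the 14 gene set comparisons).

```python
GENOME_WIDE = 5.0e-8     # standard GWAS significance threshold
SUGGESTIVE_SNP = 1.0e-5  # upper bound for a "suggestive" SNP association


def classify_snp(p):
    """Label a SNP-level p-value using the thresholds stated in the text."""
    if p < GENOME_WIDE:
        return "significant"
    if p < SUGGESTIVE_SNP:
        return "suggestive"
    return "not significant"


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test cut-off for n_tests comparisons (alpha = 0.05 is an assumption)."""
    return alpha / n_tests


gene_set_cutoff = bonferroni_threshold(14)  # about 3.6e-3 for 14 gene sets
top_snp_label = classify_snp(1.6e-7)        # smallest SNP p-value reported
```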

SNP-Level
SNP-level analyses were conducted separately for the five individual cohorts (N = 3,311,160 SNPs). No SNP association achieved genome-wide significance (p < 5.0 × 10⁻⁸) in any of the three meta-analyses (Tables S1 and S2). The smallest p-value was 1.6 × 10⁻⁷ (rs6886261 in the non-deleted cohort), and a number of SNPs had p-values suggestive of association (p < 1.0 × 10⁻⁵): 12 SNPs among individuals with 22q11.2DS, 147 among individuals without 22q11.2DS, and 129 in the combined analysis of individuals with and without 22q11.2DS. However, no SNP was suggestive of association in both the meta-analysis of individuals with 22q11.2DS and the meta-analysis of individuals without 22q11.2DS.

Gene Sets
Among individuals without 22q11.2DS only, the 42 genes in the 3 Mb 22q11.2 deleted interval were evaluated as a single gene set. However, this set was not significantly associated with CTD-NRGV in these data (p = 0.49). The 14 individual gene sets and aggregate gene set (all gene sets combined) from the rCNV Reactome pathways identified by Xie et al. [10] were assessed in all three groups (Table 2). No gene set was significantly associated with conotruncal defects with NRGVs after accounting for multiple comparisons using a Bonferroni correction for 14 comparisons (based on p < 3.6 × 10⁻³). The lowest gene set p-values included 6.6 × 10⁻³ (Gene Silencing by RNA gene set among individuals without 22q11.2DS) and 5.6 × 10⁻³ (ECM-receptor interaction gene set among individuals with 22q11.2DS).

Discussion
Our findings from genome-wide SNP- and gene-level analyses and candidate gene set analyses among these cohorts did not provide strong evidence for associations due to common variants in either cohort or in the combined cohorts. Thus, while we did not observe results that strongly supported the hypothesis that there are shared genomic mechanisms involving common inherited variants for CTD-NRGVs across subjects with and without 22q11.2DS, our results also did not refute this hypothesis. Gene and gene set analyses of the 3 Mb 22q11.2 region and analyses of gene sets from the rCNV gene interaction network [10] did not strongly support or refute the notion that the respective regions may contribute to CTD-NRGVs among both deleted and non-deleted cases. Nevertheless, several results were suggestive of association, even though they did not achieve statistical significance, and may represent helpful candidates to consider further in future work.
We found some suggestive evidence for association between SNPs and CTD-NRGVs, particularly among the cohorts without 22q11.2DS. Of the 129 SNPs with p < 1.0 × 10⁻⁵ among the comparison of individuals with 22q11.2DS + without 22q11.2DS, 33 corresponded to INPP4B, and the majority of these SNPs also had p < 0.05 in both the separate comparison of individuals with 22q11.2DS and comparison of individuals without 22q11.2DS. In fact, similar trends were also observed for INPP4B among the gene-level comparisons, and it was the gene with the second-lowest p-value in Table 1. INPP4B is a Mg(2+)-independent phosphatase that is highly expressed in the heart [24], and it is a tumor suppressor involved in the inhibition of PI3K signaling [25]. However, relatively little is known about the function of this gene, and it is unclear what, if any, role this gene may play in cardiogenesis. Additionally, 48 of the 129 SNPs with suggestive associations in the comparison of individuals with 22q11.2DS + without 22q11.2DS corresponded to TULP4, a candidate gene for craniofacial cleft and short stature that has also been implicated as a contributing gene in a patient with features of 22q11.2DS but without a 22q11.2 deletion [26]. However, suggestive associations with these SNPs in TULP4 were actually observed in our comparison of individuals without 22q11.2DS, but not in our comparison of individuals with 22q11.2DS. Similar trends were observed for TULP4 among our gene-level comparisons.
Several of the other top genes from our gene set analyses that are thought to be related to cardiogenesis but did not achieve genome-wide significance may still represent good candidates for CTD-NRGVs, either among individuals with or without 22q11.2DS. For example, including INPP4B, 8 of the 26 suggestive genes had a p-value < 0.05 in both individual cohorts as well as a lower p-value in the combined cohort than in either individual cohort. Of these eight genes, a potential candidate for future work is EVX1, which is an agonist of cardiogenic mesoderm formation [27]. Another, PTCH1, is involved in TGF-beta, Wnt, and SHH signaling, which are all thought to be involved in secondary heart field development [28]. Specifically, PTCH1 encodes the main receptor for sonic hedgehog, which is required for normal development of the cardiac outflow tract, and SHH signaling is a major candidate pathway for CTD-NRGVs, both in patients with and without 22q11.2DS [29]. Heart abnormalities and open neural tube defects are also present among mice with homozygous Ptch1 mutations, which are embryonically lethal [30].
Among the 14 gene sets from the rCNV Reactome pathways identified by Xie et al. [10], there were associations between the Gene Silencing by RNA gene set and CTD-NRGVs without 22q11.2DS, as well as between the ECM-receptor interaction gene set and CTD-NRGVs with 22q11.2DS, though these associations would not be significant after considering a Bonferroni correction for the number of gene set comparisons. Disruption of genes involved in the composition and remodeling of the extracellular matrix (ECM) of the developing heart can result in cardiac malformations [31]. Although no gene sets had p < 0.05 for both the comparison of individuals with 22q11.2DS and the comparison of individuals without 22q11.2DS, the p-value for all pathways combined in the comparison among individuals with 22q11.2DS + without 22q11.2DS was low (p = 0.016), as well as smaller than that from the other two separate comparisons. This may provide some suggestive further evidence of genetic overlap between the etiologies of these defects that involves not only rCNVs [10] but also more common genetic variation.
Though our analyses did not detect strong evidence for genetic similarities between deleted and non-deleted CTD-NRGVs, it may be that our sample was not sufficiently powered to detect modest associations. Our restriction to variants present in all five cohorts (i.e., for variants outside of the 3 Mb 22q11.2 region) also may have resulted in the elimination of SNPs or genes that could have been associated within subsets of the five cohorts. Further, if there are heterogeneous genetic effects between subtypes of CTD-NRGVs, more homogeneous subgroups may be more helpful to focus on in future work (e.g., tetralogy of Fallot), though sub-group analyses were beyond the scope of these analyses and require larger samples. Similar to other genome-wide studies, we conducted a number of comparisons and used a Bonferroni correction within but not across each analytic group.
Strengths of this study include access to data from individuals with and without 22q11.2 deletions, and the use of case-parent trio samples, which allow for analyses that do not require external controls and are robust to potential bias related to population stratification.

Conclusions
In summary, we report on a number of potential candidate regions for CTD-NRGV, both among individuals with and without 22q11.2DS. We did not observe strong evidence of overlap in associations involving common variants between these two groups, and more work is needed to evaluate other forms of genomic variation, as well as phenotypic subgroups.