The Relationship of Maternal Gestational Mass Spectrometry-Derived Metabolites with Offspring Congenital Heart Disease: Results from Multivariable and Mendelian Randomization Analyses

Background: It is plausible that maternal pregnancy metabolism influences the risk of offspring congenital heart disease (CHD). We sought to explore this through a systematic approach using different methods and data. Methods: We undertook multivariable logistic regression of the odds of CHD for 923 mass spectrometry (MS)-derived metabolites in a sub-sample of a UK birth cohort (Born in Bradford (BiB); N = 2605, 46 CHD cases). We considered metabolites reaching a p-value threshold <0.05 to be suggestively associated with CHD. We sought validation of our findings, by repeating the multivariable regression analysis within the BiB cohort for any suggestively associated metabolite that was measured by nuclear magnetic resonance (NMR) or clinical chemistry (N = 7296, 87 CHD cases), and by using genetic risk scores (GRS: weighted genetic risk scores of single nucleotide polymorphisms (SNPs) that were associated with any suggestive metabolite) in Mendelian randomization (MR) analyses. The MR analyses were performed in BiB and two additional European birth cohorts (N = 38,662, 319 CHD cases). Results: In the main multivariable analyses, we identified 44 metabolites suggestively associated with CHD, including those from the following super pathways: amino acids, lipids, co-factors and vitamins, xenobiotics, nucleotides, energy, and several unknown molecules. Of these 44, isoleucine and leucine were available in the larger BiB cohort (NMR), and for these the results were validated. The MR analyses were possible for 27/44 metabolites and for 11 there was consistency with the multivariable regression results. Conclusions: In summary, we have used complimentary data sources and statistical techniques to construct layers of evidence. We found that pregnancy amino acid metabolism, androgenic steroid lipids, and levels of succinylcarnitine could be important contributing factors for CHD.


Introduction
Congenital heart diseases (CHDs) are the most common congenital anomaly affecting approximately 6-8 per 1000 live births and 10% of stillbirths. They are the leading cause of death from congenital anomalies [1]. Approximately 20% of CHD cases can be attributed to known chromosomal anomalies, gene disorders or teratogens [2]. The causes of the remaining cases are unknown. Identifying causes of CHDs is important for improving aetiological understanding and developing potential targets for intervention.
Metabolomics technologies have enabled the quantification of a large number of metabolites in a biological sample. Metabolites are small-molecule intermediates and products of metabolism. The metabolome, the complete set of metabolites in biological tissues/fluids, is influenced by both genotype and the environment, and dynamically responds to environmental influences. Analyses of maternal metabolomic profiles could identify causal mechanisms leading to CHDs [3]. Because the metabolome reflects interactions of the genomic, environmental (e.g., air pollution), behavioural (e.g., smoking) and pathophysiological states (e.g., body composition), examining associations of it with CHDs could help elucidate modifiable upstream risk factors and/or potential molecular targets for intervention to prevent CHDs.
Studies have explored maternal biomarkers and found that the offspring of women with low 25-hydroxyvitamin D (<50 nmol/L compared to >75 nmol/L) [4] and dyslipidaemic profiles [5,6] have an increased risk of CHDs. Other work has shown that higher fasting glucose levels and diabetes during pregnancy can increase CHD risk [7][8][9]; however, these studies focus on single or only a few biomarkers. Exploring the wider metabolome could provide opportunities to improve our understanding of the molecular mechanisms that underpin CHDs [3].
One previous study has explored metabolomics in maternal serum including 27 CHD cases compared with 59 controls [10]. A total of 123 (out of 174) metabolites were associated with CHD in univariable analyses, with 3 of the MS metabolites (e hydroxypropionylcarnitine, glutaconylcarnitine, and hydroxytetradecadienylcarntine) in multivariable analyses showing a high discrimination for a first trimester CHD diagnosis (area under the curve 0.98 (0.94, 0.99)). A further study used an untargeted approach to explore whether midpregnancy urinary metabolites could accurately discriminate CHD cases from the controls (N = 70 CHD cases and 70 controls) and found that two principal components separated the cases and the controls, with 4-hydroxybenzeneacetic acid, 5-trimethylsilyloxy-n-valeric acid, hydracrylic acid and propanedioic acid driving their differences [11]. Finally, a study in a Chinese population of metabolomic analyses using maternal amniotic fluid (N = 71 CHD cases) found that two metabolites (uric acid and proline) had higher concentrations in CHD affected pregnancies [12]. In summary, to date there have been few pregnancy metabolomic studies exploring the potential molecular causal mechanisms of CHD. The small number that we have identified focused on discrimination between cases and the controls and had not obtained external validation.
The aim of this study was to gain insights into the potential molecular mechanisms for CHD. To address this aim, we explored associations of maternal metabolites quantified by an untargeted mass spectrometry (MS) platform and the odds of CHD in offspring. We searched for relevant studies within The LifeCycle Project-EU Child Cohort Network [13] to identify any study with detailed, untargeted, maternal gestational metabolomic data and offspring CHD information. We identified only one cohort with relevant data in a subgroup: the Born in Bradford (BiB) cohort (N = 2605 participants; 46 CHD cases) [14,15]. Recognising that these novel data were potentially underpowered, we sought internal validation of the metabolites suggestively associated with CHD, by repeating the multivariable regression analysis within the BiB cohort for any metabolite that was measured by nuclear magnetic resonance (NMR) or clinical chemistry in larger numbers (N = 7296, 87 CHD cases (including those in the original analysis)). We subsequently searched the MR-PREG consortia studies [16,17] for cohorts with maternal genome-wide data and offspring CHD information that could be used for Mendelian randomization (MR) analyses of the associations of genetic instruments for maternal metabolites. We performed pooled MR analyses across three MR-PREG cohorts meeting our criteria (N = 38,663, 319 CHD cases) for any metabolites that were: (i) suggestively associated with CHD in the BiB cohort (p < 0.05) and (ii) had summary data in the most recent metabolomic genome-wide association study (GWAS).

Study Design and Participants
A schematic overview of the study design is illustrated in Figure 1. We excluded children of multiple births because they differ from single births for congenital anomaly outcomes [18,19]. For the multivariable metabolomic analyses, we used data from the BiB cohort, as this was the only cohort that had measures of a substantial number of metabolites reflecting a wide range of metabolic paths assessed during pregnancy and CHD outcomes [15]. We also explored internal validation of any findings with a p-value < 0.05 within the BiB study where equivalent (or near equivalent) measures to any on the MS platform were available from other sources. BiB is a population-based prospective birth cohort, including 12,453 women across 13,776 pregnancies who were recruited at their oral glucose tolerance test (OGTT) at approximately 26-28 weeks' gestation [14]. The eligible women had an expected delivery date between March 2007 and December 2010. The use of a multivariable p-value threshold of <0.05 to take associations forward into further validation analyses was appropriate as an initial screen, for a relatively rare outcome, to avoid missing potential causal effects. All the BiB participants included in the Metabolon multivariable regression analyses (N = 2605) (Step 1 (Figure 1)) were also included in the NMR validation analyses (Step 2a (Figure 1)), and 2147 had the relevant genetic data and were, therefore, included in the MR analyses.
To be included in the MR analyses, the studies and participants had to have genomewide data in the mothers and CHD data in their offspring. Three cohorts contributed to the MR analyses: BiB, the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Norwegian Mother, Father and Child Cohort Study (MoBa). ALSPAC is a UK prospective birth cohort study which was devised to investigate the environmental and genetic factors of health and development [20][21][22]. Pregnant women resident in Avon, UK with expected dates of delivery 1 April 1991 to 31 December 1992 were invited to take part in the study. The initial number of pregnancies enrolled was 14,541 (and for these at least one questionnaire was returned, or a "Children in Focus" clinic had been attended by 19/07/99). Of those initial pregnancies, there was a total of 14,676 foetuses, resulting in 14,062 live births and 13,988 children who were alive at 1 year of age. MoBa is a population-based pregnancy cohort study conducted by the Norwegian Institute of Public Health [23,24]. The participants were recruited from all over Norway from 1999-2008. The women consented to participation in 41% of the pregnancies. The cohort included approximately 114,500 children, 95,200 mothers and 75,200 fathers. The current study is based on 12 of the quality-assured data files released for research in 2019. The establishment of the MoBa and initial data collection was based on a license from the Norwegian Data Protection Agency and approval from The Regional Committees for Medical and Health Research Ethics. The MoBa cohort is currently regulated by the Norwegian Health Registry Act. The current study was approved by The Regional Committees for Medical and Health Research Ethics (2018/1256).  Figure S1. Abbreviations: CHD, congenital heart disease; BiB, Born in Bradford; NMR, nuclear magnetic resonance; MR, Mendelian randomization; GWAS, genome-wise association study; ALSPAC, Avon Longitudinal Study of Parents and Children; MoBa, Norwegian Mother, Father and Child Cohort.

Sample Collection and Metabolomic Profiling in BiB
Of the 13,776 pregnancies in the BiB cohort, 11,480 had a fasting blood sample taken during the OGTT (N= 10,574 [92%] between 26-28 weeks' gestation, with the remaining women being within 11-39 weeks' gestation). The samples were taken by trained phlebotomists working in the antenatal clinic of the Bradford Royal Infirmary and sent immediately to the hospital laboratory. The metabolomics data in the BiB cohort has previously been described in detail [15]. In brief, the metabolomics analysis was performed on ethylenediamine tetraacetic acid (EDTA) plasma samples at around 26-28 weeks' gestation. The untargeted MS metabolomics analysis of over 1000 metabolites was performed at Metabolon, Inc. (Durham, NC, USA). Quality control of the metabolite data was conducted by Metabolon. The classes of metabolites include amino acids, carbohydrates, cofactors and vitamins, energy, lipids, nucleotides, partially characterised molecules, peptides, and xenobiotics. These super-pathways, as defined by Metabolon, were also further subdivided into ~80 sub-pathways. The metabolite concentrations were quantified using the area under the curve of primary MS ions and were expressed as the multiple of the median (MoM) value for all batches processed on the given day. The MoM more closely reflects the biological variation rather than technical variation between samples or an analysis platform [25]. Due to the timing of funding acquisition, the samples were sent to Metabolon in two separate batches. Dataset 1 was completed in December 2017 and included 1000 maternal pregnancy samples. Dataset 2 was completed in December 2018 and consisted of 2000 maternal pregnancy samples within a case cohort design. The selection of the participants into the two MS metabolomic datasets are shown in flowcharts in Figure S1 and have been described in detail previously [15]. Neither dataset was random given how they were selected ( Figure S1). The sampled cases of other adverse pregnancy/perinatal outcomes (that were not CHD) were removed.  Figure S1. Abbreviations: CHD, congenital heart disease; BiB, Born in Bradford; NMR, nuclear magnetic resonance; MR, Mendelian randomization; GWAS, genome-wise association study; ALSPAC, Avon Longitudinal Study of Parents and Children; MoBa, Norwegian Mother, Father and Child Cohort.

Sample Collection and Metabolomic Profiling in BiB
Of the 13,776 pregnancies in the BiB cohort, 11,480 had a fasting blood sample taken during the OGTT (N = 10,574 [92%] between 26-28 weeks' gestation, with the remaining women being within 11-39 weeks' gestation). The samples were taken by trained phlebotomists working in the antenatal clinic of the Bradford Royal Infirmary and sent immediately to the hospital laboratory. The metabolomics data in the BiB cohort has previously been described in detail [15]. In brief, the metabolomics analysis was performed on ethylenediamine tetraacetic acid (EDTA) plasma samples at around 26-28 weeks' gestation. The untargeted MS metabolomics analysis of over 1000 metabolites was performed at Metabolon, Inc. (Durham, NC, USA). Quality control of the metabolite data was conducted by Metabolon. The classes of metabolites include amino acids, carbohydrates, cofactors and vitamins, energy, lipids, nucleotides, partially characterised molecules, peptides, and xenobiotics. These super-pathways, as defined by Metabolon, were also further subdivided intõ 80 sub-pathways. The metabolite concentrations were quantified using the area under the curve of primary MS ions and were expressed as the multiple of the median (MoM) value for all batches processed on the given day. The MoM more closely reflects the biological variation rather than technical variation between samples or an analysis platform [25]. Due to the timing of funding acquisition, the samples were sent to Metabolon in two separate batches. Dataset 1 was completed in December 2017 and included 1000 maternal pregnancy samples. Dataset 2 was completed in December 2018 and consisted of 2000 maternal pregnancy samples within a case cohort design. The selection of the participants into the two MS metabolomic datasets are shown in flowcharts in Figure S1 and have been described in detail previously [15]. Neither dataset was random given how they were selected ( Figure S1). The sampled cases of other adverse pregnancy/perinatal outcomes (that were not CHD) were removed.

Confounders
In multivariable regression analyses in BiB, we adjusted for the following maternal characteristics based on their known or plausible influence on maternal metabolites and on CHD: age, ethnicity, parity, residential neighbourhood index of multiple deprivation (IMD), body mass index (BMI), smoking, and alcohol consumption. Details of the methods for how the confounders were assessed are provided in the Supplementary File S1 (Text S1).

Congenital Heart Disease Outcomes
In the BiB cohort, the cases were identified from either the Yorkshire and Humber congenital anomaly register database, which tends to pick up most cases that were diagnosed antenatally and in the early postnatal period of life, or through linkage to primary care (up until age 5), which will have picked up any additional cases, in particular those that might have been less severe and not identified antenatally/in early life [26]. All the BiB cases were confirmed postnatally and were assigned ICD-10 codes. We used the ICD-10 codes to assign CHD cases according to the European surveillance of congenital anomalies (EUROCAT) guidelines. In the ALSPAC cohort, the cases were obtained from a range of data sources, including health record linkage and questionnaire data up until age 25 following European EUROCAT guidelines [27]. In the MoBa cohort, information on whether a child had a CHD or not (yes/no) was obtained through linkage to the Medical Birth Registry of Norway (MBRN). All maternity units in Norway must notify births to the MBRN, and information on malformations is reported to the registry up to 12 months postpartum [28]. Further details on defining CHDs including the ICD codes are shown in Text S2 (Supplementary File S1).

Genetic Data
The rationale for performing the MR analyses was to further explore potential causal mechanisms by using a method, namely, MR, which has different key sources of bias to multivariable regression. Metabolites are affected by multiple disease processes as well as numerous environmental exposures; therefore, understanding the metabolic pathways implicated in CHD is nontrivial. Genetic variants are less likely to be confounded by the socioeconomic and environmental factors that might bias causal estimates in conventional multivariable regression [29], but may be biased by weak instruments, which would bias towards the confounded association, or a path from the metabolomic genetic score to CHD, for example, via horizontal pleiotropy or foetal genotype [30]. Consistent results from both increase the confidence that the result is causal. In addition, whilst we included the BiB cohort in these genetic analyses, we also had two independent studies and repeated MR analyses with only the two (non-BiB) independent cohorts.

Genotyping in Each Cohort
The ALSPAC mothers were genotyped using an Illumina Human 660K-Quad single nucleotide polymorphism (SNP) chip, and the ALSPAC children were genotyped using the Illumina Human Hap 550 Quad genome-wide SNP genotyping platform. Genotype data for both the ALSPAC mothers and children were imputed against the Haplotype Reference Consortium v1.1 reference panel, after performing the QC procedure (with a minor allele frequency (MAF) ≥1%, a call rate ≥95%, in Hardy-Weinberg equilibrium (HWE), correct sex assignment, no evidence of cryptic relatedness, and of European descent). The samples of the BiB cohort (mothers and offspring) were processed on three different types of Illumina chips: HumanCoreExome12v1.0, HumanCoreExome12v1.1 and HumanCoreExome24v1.0. The genotype data were imputed against the UK10K + 1000 Genomes reference panel, after a similar QC procedure (with a call rate ≥99.5%, correct sex assignment, no evidence of cryptic relatedness, and correct ethnicity assignment). In MoBa, blood samples were obtained from both parents during pregnancy and from the mothers and children (umbilical cord) at birth [31]. Genotyping has had to rely on several projects-each contributing with resources to genotype subsets of MoBa over the last decade. The data used in the present study was derived from a cohort of genotypes samples from four MoBa batches. The MoBa genetics QC procedure involved a MAF ≥1%, a call rate ≥95%, was in HWE, included correct sex assignment, and had no evidence of cryptic relatedness. Further details of the genotyping methods for each cohort are provided in the Supplementary File S1 (Text S3) including flow charts showing the selection of the participants ( Figure S2).

GWAS Data and SNP Selection
We aimed to construct weighted GRSs for all metabolites that were associated with CHD in the multivariable regression with a p-value < 0.05 (referred to throughout as "suggestively associated") in the multivariable regression analyses using BiB data. To do this, we cross-referenced our suggestive associations with the relevant GWAS and we used summary data from two GWAS. In the first, the authors explored the genetic associations with 174 metabolites (compared with the 923 included in our study) [32]. To ensure robust associations, the SNPs were selected from the first GWAS if they were genome-wide significant (p < 5 × 10 −8 ). Independent SNPs were defined as linkage disequilibrium (LD) r 2 ≥ 0.001 assessed at a distance of 10,000 kb, with only one SNP from a correlated (clumped) group retained (performed using the TwoSampleMR package [33]). In the second (unpublished) GWAS, the authors performed a GWAS of 913 metabolon metabolite levels using samples from the EPIC-Norfolk [34] and INTERVAL studies [35]. A total of 14,296 participants were included in a discovery set (5841 from EPIC-Norfolk; 8455 from INTERVAL) and 5698 from EPIC-Norfolk in a validation set. The authors performed exact conditional analyses to identify independent associations.

Genetic Risk Score Generation
GRSs were calculated using SNPs previously associated in largescale GWAS with the metabolites (described above) by adding up the number of metabolite-increasing alleles among the selected SNPs after weighting each SNP by its effect on the corresponding metabolite: where w is the weight (i.e., the beta-coefficient of the association of the SNP with the exposure from the published GWAS) and SNP is the genotype dosage of the exposureincreasing alleles at that locus (i.e., 0, 1, or 2 exposure-raising alleles). After matching the metabolites suggestively associated with CHDs at p < 0.05 from the multivariable regression analyses and removing indels, the selected SNPs were extracted from the imputed genotype data in dosage format using QCTOOL (v2.0) and VCF tools (v 0.1.12b) in ALSPAC and BiB, respectively. PLINK (v1.9) was then used to construct the GRS for each exposure coded so that an increased score was associated with increased levels in metabolites. In MoBa, we constructed the GRSs from the QC'd data in PLINK format. If a SNP was missing, a proxy SNP was used where available based on a r 2 > 0.8 using the European reference panel in the LDLink R package [36].

Statistical Analysis
Analyses were performed in R version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria). An analysis plan was written and uploaded to the Open Science Framework before the analyses commenced, where any subsequent changes to the analyses were documented along with the rationale [37]. We used scaled imputed data (in which missing data have been imputed and the multiple of median values transformed to standard deviation (SD) scores) which was log transformed. Any metabolite (in either dataset) where there was too little variation for meaningful analyses (defined as <440 unique values) was excluded [38]. The transformed metabolite values were converted to standard deviation (SD) units. There were 1100 and 1150 quantified metabolites included in dataset 1 and 2, respectively, with 923 of these present in both datasets.

Multivariable Regression (Metabolomic) Analyses
We used logistic regression to estimate the odds ratios (ORs) and 95% confidence intervals (CIs) of any CHD per SD higher metabolite, with and without an adjustment for confounders. As we are interested in the potential causal effects, we present confounder adjusted results throughout. The analyses were completed separately in the two BiB datasets and the results pooled using fixed-effects meta-analyses. Given that CHD is rare and binary, we accepted an uncorrected p < 0.05 (from meta-analyses) for the metabolite being suggestively associated with CHD in the offspring (but requiring further validation). We took these metabolites forward to the MR analyses.
The MS-platform used in BiB includes the measures of xenobiotics which are synthetic chemicals that are not synthesised by humans. Their presence in the circulation usually reflects endogenous exposures, such as medications and supplements. Given that these metabolites would not be present in all participants (and therefore have high missingness), many were removed (86/154 (56%)) from the dataset given the metabolite inclusion criteria of 440 unique observations mentioned above. Consequently, we performed an exploratory additional analysis using xenobiotics (N = 154) as binary variables (1 = yes, metabolite is detected in the sample; 0 = no, metabolite is not detected in the sample). We present adjusted ORs of these binary variables (any presence vs. none) with CHDs.
We sought to internally validate any of the metabolites suggestively associated with CHDs that were also measured in BiB in larger numbers using different methods. After matching the suggestive associations, we used data from the NMR platform (N = 2 metabolites) and did not use any data from the clinical chemistry measurements. More information on the BiB NMR data including the methods, QC and participant information has been described in detail previously [15].

Mendelian Randomization Analyses
We undertook MR in each of the 3 cohorts, including all BiB, ALSPAC, and MoBa participants with maternal genetic data and offspring CHD data. Logistic regression was used to estimate the OR of CHD per SD change in GRS, with an adjustment for the first 10 genetic principal components (PCs) with additional adjustments for the genetic chip, genetic batch, and imputation batch in MoBa.
The key assumptions for MR are: (i) Relevance assumption-the genetic instruments are robustly and strongly statistically associated with the exposure and relevant to the population being studied (i.e., here the pregnant women). We tested the association of the GRS of each metabolite with the metabolite levels during pregnancy in BiB dataset 2 using linear regression. (ii) Independence assumption-the IV outcome association is not confounded. Such confounding could occur as a result of population stratification. To minimise this, we adjusted the GRS-CHDs associations for the first 10 genetic PCs. We also repeated the MR analyses without the inclusion of BiB, given that BiB has a unique ethnic structure of South Asians and White Europeans. (iii) Exclusion restriction criteria-the genetic variant is not related to the outcome other than via its association with the exposure. We assessed pleiotropy by estimating the variance explained in all metabolites by each of the GRSs by undertaking the linear regression of every metabolite measured in BiB on each GRS. If the variance explained in other metabolites was similar or greater than to that explained in the candidate risk metabolite, this would suggest that there was low metabolite-specificity for the GRS and potential horizontal pleiotropic bias via the other metabolite(s). Importantly, however, this approach of testing the GRS specificity does not distinguish between vertical pleiotropy (e.g., the GRS influences the candidate metabolite which is the precursor of another metabolite that affects CHD) and horizontal pleiotropy (e.g., the GRS influences two metabolites that affect CHD independently). We also checked the consistency of the MR results when additionally adjusting for foetal genotype [30]. We performed the MR analyses separately in BiB, ALSPAC and MoBa and report the pooled results from random-effect meta-analyses for all three cohorts and fixed-effect meta-analyses for the MR analyses excluding BiB (i.e., ALSPAC and MoBa). Table 1 shows the distributions of characteristics for the women in both BiB datasets. In total, there were 2605 mother-offspring pairs with 46 CHD cases included in the BiB multivariable regression metabolomic analyses. Table S1 shows the maternal characteristics stratified by offspring CHD status. There were some differences in the characteristics between the CHD and non-CHD cases in Dataset 1 such as a higher proportion of offspring of female sex and Pakistani ethnicity. The Dataset 2 characteristics were comparable between the CHD cases and non-CHD cases. N.B., for consistency and clarity, we refer to metabolites here by their super-pathways (as defined by Metabolon). A metabolite might have a different super-pathway and chemical group. For example, N-Acetylcarnosine is a metabolite that is part of the amino acid super-pathway, but it is not an amino acid itself. Where available, we include the Human Metabolome Database (HMDB) IDs with all numerical results, which can assist the reader in finding further information on the structure and function of a metabolite. The super-pathways that included the largest proportions of the 923 metabolites were the lipids (38%), unknown (22%), amino acids (18%) and Xenobiotics (8%), with other super-paths having ≤3% of the total ( Table 2).  Of the 923 metabolites quantified in both BiB datasets, 44 (4.8%) were associated with any CHD, at p < 0.05, in the confounder adjusted pooled analyses (Figure 2). We observed suggestive effects (i.e., confounder adjusted associations reaching the p-value threshold <0.05) with several amino acids, lipids and co-factors and vitamins. There were also suggestive effects for two xenobiotics, one nucleotide, one energy metabolite and some partially characterised and unknown metabolites (Figure 2). None of the 22 peptide or 19 carbohydrate-related metabolites associated with CHD at this p-value threshold. Of the 18 lipid-related metabolites associated with CHD, 13 were positively associated (i.e., increased odds) (e.g., Glycolithocholate Sulfate: adjusted odds ratio (aOR) per SD increase in metabolite: 1.73 95% CI (1.21, 2.48)) and 5 were negatively associated (decreased odds) (e.g., Phosphocholine: aOR 0.65 (0.47, 0.90)). All but one (N-Acetylcarnosine) of the 10 amino acid-related metabolites were negatively associated with CHDs (e.g., isoleucine: aOR: 0.67 (0.49, 0.92)). Three of the four co-factors and vitamins were negatively associated, whereas one (biliverdin) was positively associated (aOR 1.41 (1.07, 1.86)). One nucleotide was negatively associated (inosine 5 -Monophosphate (Imp): aOR 0.59 (0.36, 0.99)) and one energy related metabolite was positively associated (succinylcarnitine (C4): aOR 1. 42 (1.02, 1.97)). Benzoate and Saccharin were the two xenobiotics associated with CHDs in the main analyses with both showing positive associations. The results for the associations of all metabolites (irrespective of p-value) in the unadjusted and confounder adjusted analyses from the pooled datasets, and each dataset separately are provided in Supplementary File S2 (Tables S5-S7), including HMDB IDs where applicable.

Main BiB Multivariable Regression Analyses
In the analysis treating the xenobiotics as binary variables, after removal of the metabolites with no exposed cases, there were six xenobiotic metabolites suggestively associated with offspring CHDs (Supplementary File S1: Table S2). Two out of the six showed positive associations: saccharin, which was also associated in the main analyses (adjusted odds ratio (aOR) for the presence of metabolite vs. not: 2.16 95% CI (1.02, 5.13)-an artificial sweetener) and salicyluric glucuronide (aOR: 2.27 (1.16, 4.29)-a metabolite involved in aspirin metabolism). The remaining four showing negative associations are all part of the food component/plant metabolite sub-pathway (Table S2).

Internal Validation Using NMR or Clinical Chemistry Measures of Suggestive Associations from Main Multivariable Regression Analyses
It was possible to explore 2 of the 44 metabolites suggestively associated with CHDs in the larger BiB sample. In the comparable confounder adjusted analyses, the NMR measured amino acids isoleucine and leucine were available on 7296 mothers, with 87 having an offspring with CHD. The results for these two amino acids were highly consistent between the two samples/assay methods (aOR per SD increase in the MS isoleucine 0.67 (0.49, 0.92) vs. 0.65 (0.50, 0.84) for the NMR isoleucine, and aOR per SD increase in the MS leucine 0.69 (0.51, 0.94) vs. 0.67 (0.53, 0.85) for NMR leucine).

Validating Findings with Mendelian Randomization
The distributions of offspring and maternal characteristics for the MR analyses in BiB, ALSPAC and MoBa are displayed in Table S3 (Supplementary File S1). It was possible to explore MR replication for 27 of the 44 metabolites that associated with CHD in the multivariable analyses (the other 17 were either not available in the GWAS or had genetic variants not available in the cohorts, Figure S3). All but three of the GRSs (24/27 (89%)) were associated with the corresponding metabolite during pregnancy in BiB (with R 2 values ranging from 0.3% to 34%), but for the remaining three, the associations were imprecise with wide confidence intervals that included the null (Table S4; Supplementary File S1). Of the 27 GRSs, 3 were specific for the metabolite they were instrumenting (i.e., they had the strongest association with it and little evidence of associations with other metabolites; N-acetylcarnosine, phosphocholine and succinylcarnitine). Eighteen GRSs were associated with the metabolite they were instrumenting and several others that were correlated with that metabolite (e.g., the biliverdin GRS was associated with it and also similarly with other hepatic-related metabolites). Six GRSs were more strongly associated with other (uncorrelated) metabolites than the one they were instrumenting (scatter plots for all twenty-seven GRSs are shown in Figure S4; Supplementary File S1). The six non-specific GRS were for indolelactate, glycolithocholate sulfate, isoleucine, leucine, myo-inositol and taurolithocholate 3-sulfate (the MR results for these should be treated with caution and are denoted in Figure 3B by white-filled points).
Results from the MR analyses demonstrated the same direction of (protective) effect of higher levels of the amino acids leucine, indolelactate and isoleucine on CHD (although the instruments for these metabolites were non-specific as noted above), but for the other amino acids, the MR results were either very close to the null or in the opposite direction ( Figure 3). Seven of the lipid-related metabolites that were positively associated in multivariable regression were also replicated in MR analyses (six of which were highly correlated androgenic steroid metabolites), as was the energy related metabolite, succinylcarnitine ( Figure 3). For the 11 metabolites where we considered the MR GRS analyses providing some evidence of replication and a potential causal effect, 7 of the GRSs were specific for the metabolite alone and/or also for its correlates. Individual study results and the p-values for heterogeneity are included in Supplementary File S2 (Table S8). The MR results were largely unchanged when excluding BiB from the analyses (Table S8) and when adjusting for the offspring genotype (Table S9; Supplementary File S2). carnitine (Figure 3). For the 11 metabolites where we considered the MR GRS analyses providing some evidence of replication and a potential causal effect, 7 of the GRSs were specific for the metabolite alone and/or also for its correlates. Individual study results and the p-values for heterogeneity are included in Supplementary File 2 (Table S8). The MR results were largely unchanged when excluding BiB from the analyses (Table S8) and when adjusting for the offspring genotype (Table S9; Supplementary File 2). , results from each analysis are presented on different scales; we are not attempting to quantify estimates in the MR analyses, the aim is to compare the direction of effect. The confounder adjusted associations are as above in Figure 2. The MR analyses are adjusted for the top 10 genetic principal components and genetic batches in MoBa. In Panel B, the metabolite genetic risk scores filled with white appeared to be non-specific for the metabolite we were trying to instrument (i.e., the risk score relates to several other metabolites more strongly than the specific named metabolite). The metabolites filled in black were either metabolite-specific or specific to the metabolite and other correlated metabolites (see scatter plots in Figure S4). The results were pooled using random effects meta-analyses; individual study results and p-values for heterogeneity are shown in Supplementary Table S8. * at the end of metabolite names as defined by Metabolon.     , results from each analysis are presented on different scales; we are not attempting to quantify estimates in the MR analyses, the aim is to compare the direction of effect. The confounder adjusted associations are as above in Figure 2. The MR analyses are adjusted for the top 10 genetic principal components and genetic batches in MoBa. In Panel B, the metabolite genetic risk scores filled with white appeared to be non-specific for the metabolite we were trying to instrument (i.e., the risk score relates to several other metabolites more strongly than the specific named metabolite). The metabolites filled in black were either metabolite-specific or specific to the metabolite and other correlated metabolites (see scatter plots in Figure S4). The results were pooled using random effects meta-analyses; individual study results and p-values for heterogeneity are shown in Supplementary Table S8. * at the end of metabolite names as defined by Metabolon. Abbreviations: BiB, Born in Bradford; CHD, congenital heart disease; GRS, genetic risk score; MR, Mendelian randomization; OR, odds ratio; CI, confidence interval.

Discussion
Maternal metabolism is important for healthy foetal growth and development. To our knowledge, no previous study has examined the association of detailed maternal metabolites with the risk of CHD within a causal framework. In this study, we found 44 metabolites (of 923) suggestively associated with CHD. These included metabolites related to amino acids, lipids, co-factors and vitamins, unknown molecules, xenobiotics, nucleotides and energy. In separate xenobiotics analyses, there was some evidence that metabolites related to aspirin and saccharine may increase the odds of CHD, whereas metabolites related to plant food components may reduce the odds. Two of the amino acids, where it was possible to explore replication, were replicated in BiB in larger numbers using an alternative metabolomics platform. In the MR analyses, we analysed the relationship of maternal genetic variants that are associated with metabolite levels with offspring CHD. We completed this for 27 metabolites that showed evidence of association with offspring CHD in multivariable regression analyses and that we had GWAS summary data on. There was directional consistency for 11 out of the 27 metabolites that could be explored in the MR analyses. Overall, we found that maternal amino acid metabolism during pregnancy, several lipids (more specifically androgenic steroids), and levels of succinylcarnitine could be important contributing factors to offspring CHD risk.
Nine out of the ten amino acids suggestively associated with CHDs were negatively associated, suggesting that deficiencies in certain amino acids during pregnancy could contribute to offspring CHDs. Previous research found that amino acid concentrations measured in amniotic fluid were lower in patients with CHDs [39], which is a similar pattern to what we found here. We were able to replicate our findings for isoleucine and leucine in larger numbers in BiB, which improves the confidence in these findings. The MR analyses also provided evidence to support the direction of association for these metabolites; however, the GRSs for isoleucine and leucine were non-specific, and, therefore, these results should be treated with caution until these genetic instruments have been shown to be robustly associated with the amino acid metabolites.
Eighteen of the forty-four maternal metabolites suggestively associated with CHD were part of the lipid super pathway. Previous work reported that an abnormal lipid profile (defined as elevated cholesterol and apolipoprotein B) [5], abnormal lipid metabolism (defined as a disturbance in phosphatidyl-choline and various sphingolipids and choline metabolism) [13] and high maternal blood lipids [6] are a feature of CHD pregnancies. We were able to take forward 15 (out of 18) of the lipid metabolites and replicated the direction of effect for 7. All except one of these seven replicated metabolites were androgenic steroids and, therefore, were highly correlated. Steroids are important for numerous functions during gestation, particularly for normal placental function [40]. Here we present evidence of a potential causal effect (associated in metabolomic analyses with consistent direction of effect in the MR analyses) of positive associations between maternal gestational androgenic steroid metabolites and offspring CHDs.
Levels of bilirubin and biliverdin were positively associated with CHDs. MR analyses were only possible for biliverdin but were inconclusive with wide confidence intervals; thus, these two compounds, which are involved in heme catabolism, should be further investigated for a possible role in CHD development. Levels of succinylcarnitine were also positively associated with CHDs and we found good replications in the MR analyses with consistent directions of effect and a GRS that appeared highly specific for succinylcarnitine. Succinylcarnitine is an acylcarnitine, which is a group of metabolites responsible for the beta oxidation of fatty acids and mitochondrial function [41]. It is well documented that fatty acids play an important role in embryonic and foetal development [42,43]. We included the analyses of partially characterised and unknown metabolites in our results as with the increasing evidence from genomic studies, previously unknown metabolites are having their function identified. With future studies identifying the function of some of these unknown/partially characterized metabolites, our results could shed light on the aetiology of CHDs.
A key strength of this study is the unique data that was available in BiB to support novel analyses of the associations of a wide range of maternal metabolic paths with offspring CHD risk. We were not able to identify any other study with such data; however, we realised, even before the analyses, that we would have limited statistical power with just 46 CHD cases. This motivated us to think about ways of trying to replicate any findings in larger samples, either through finding measures of the same metabolites available from other assays in larger samples or using GRSs as instruments for the metabolites. In the initial multivariable regression analyses we adjusted for potential confounders. We defined the suggestive associations based on a p-value threshold < 0.05, i.e., not taking account of multiple testing. When we applied a Bonferroni corrected threshold (p < 0.0001), none of the associations passed this (Supplementary Table S7; Supplementary File S2). Given the novel nature of this study and the use of the initial multivariable regression in BiB to select associations for further follow-up (replication and MR), we felt this was appropriate. As with any 'screening' for further analyses, we wanted to ensure that we would not miss potential causal effects. We recognise that selecting the results based on a p-value threshold is problematic, as some associations with higher p-values might have associations of a magnitude that could be clinically important, but there would also be the potential for several false positives. Additionally, we limited the MR analyses only to those metabolites that associated with a p < 0.05 rather than undertaking these analyses on all of the 923 metabolites. Our reason for this was that having searched for all studies with maternal genome wide data and offspring CHD outcomes, we identified only three cohorts and recognised that for MR analyses, the pooled results from these might also have limited power. The limited power in both the multivariable and MR analyses also meant that we could only examine associations with any CHD and not subtypes.
The MR analyses were sensitive to their assumptions that the GRS is statistically strongly associated with the metabolite in pregnancy. We examined the associations of these with pregnancy metabolite levels and were careful in our interpretation of the results in relation to this. Methods that are available for exploring potential bias due to horizontal pleiotropy in two-sample MR were not possible here. We know that many of the 923 metabolites will be biologically related to each other and with our sample size it would be difficult to robustly distinguish the effects of correlated metabolites. We explored this by examining the strength of association (proportion of the variation explained) of each of the 27 GRSs with all other metabolites available in the BiB dataset 2. Stronger or similar associations with other metabolites would suggest that the GRS was not a specific instrument for the metabolite that we were using it for. In this case this could be because of known biological relations. For example, we know biologically that many of the lipid metabolites are related to each other, and we saw this with similar proportions of variation explained by the GRS of the androgenic steroid lipid metabolites with other androgenic steroid lipid metabolites. As such, we would interpret the results for these metabolites as supporting an effect of maternal androgenic steroid metabolites on CHD, but we could not be specific about which ones were driving this. A similar or stronger variation of a GRS for other metabolites could be related to vertical pleiotropy, i.e., the metabolite for which the GRS is instrumenting strongly influences other metabolites that are related to CHD with the other metabolites partly mediating the effect of the focused metabolite. This would not bias the result; however, this could also occur with horizontal pleiotropy where the GRS, independently of the metabolite of interest, influences other metabolites that are risk factors for CHD. With our current data, we are not able to distinguish between these two.
A further limitation of this study is that the maternal plasma/serum metabolomics data were derived at a single timepoint around 26-28 weeks' gestation. Foetal cardiac development starts early in pregnancy and much of the development occurs in the first trimester [44]. Here, we are assuming that metabolite levels around 26-28 weeks' gestation are good proxies for levels in early pregnancy-when the offspring's heart is forming. Previous work has shown that between person differences throughout pregnancy remain largely consistent (i.e., those with a high level of a metabolite in early pregnancy tend to have a similarly high level of a metabolite in later pregnancy) [45]. Similarly, and worth mentioning, the effects obtained from MR studies are often interpreted as the lifetime effect of the exposure (metabolites) in question [46]. We were also not able to adjust for diet in the multivariable regression analyses. Metabolite concentrations are likely to be influenced by differences in dietary intake and there is some evidence that differences in maternal pregnancy diets may influence the risk of CHD [47]; thus, our multivariable regression analyses may be influenced by residual confounding due to diet or other factors. The selection process to include women in the BiB metabolomic multivariable regression analyses may have also influenced (in either direction) the results through selection bias. Again, where it was possible to explore replication in the larger BiB sample with the metabolites measured using NMR or clinical chemistry and/or undertaking MR, the inclusion of more of the BiB cohort and the use of two independent cohorts were useful, as these results are less likely to be influenced by selection.
In summary, we have used metabolomics and genetic data obtained during pregnancy to explore how the maternal metabolome may contribute to offspring CHDs. We found evidence that markers of maternal pregnancy amino acids and androgenic steroids during pregnancy, as well as higher levels of succinylcarnitine could be important contributing factors to offspring CHD risk. It was not possible to undertake genetic MR analyses on 17 of the potential associations from multivariable regression and further multivariable regression and/or MR in larger independent samples, but these would be valuable. Our analysis pipeline, which involved seeking replication of metabolite associations by harnessing large-scale GWAS data, provides scope to address this and to obtain more precise estimates for the 27 associations we could undertake MR on, as more cohorts with genomic and metabolomic data become available. Metabolomics could prove to be an important tool for identifying the biological pathways that may lead to the identification of prevention targets to decrease the disease burden of CHDs. To do this, future research will require the international collaboration of more and larger studies with detailed metabolomics data in pregnancy, ideally with some of these having repeat measures across pregnancy and offspring CHD data.

Ethical Approval and Consent to Participate
Ethical approval for ALSPAC was obtained from the ALSPAC Law and Ethics committee and local research ethics committees (NHS Haydock REC: 10/H1010/70). Informed consent for the use of data collected via questionnaires and clinics was obtained from the participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. At age 18, the study's children were sent 'fair processing' materials describing ALSPAC's intended use of their health and administrative records and were given clear means to consent or object via a written form. Data were not extracted for the participants who objected, or who were not sent fair processing materials. For BiB, Ethics approval has been obtained for the main platform study and all of the individual sub studies from the Bradford Research Ethics Committee. Written consent was obtained from all participants. The establishment of MoBa and the initial data collection was based on a license from the Norwegian Data Protection Agency and approval from The Regional Committees for Medical and Health Research Ethics. The MoBa cohort is now based on regulations related to the Norwegian Health Registry Act.

Availability of Data and Materials
The ALSPAC data management plan (http://www.bristol.ac.uk/alspac/researchers/ data-access/documents/alspac-data-management-plan.pdf), describes in detail the policy on data sharing, which is through a system of managed open access. Scientists are encouraged to make use of the BiB study data, which are available through a system of managed open access. Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool and reference the following webpage: http://www.bristol.ac.uk/alspac/researchers/our-data/.
Before you contact the BiB study, please make sure you have read the Guidance for Collaborators: https://borninbradford.nhs.uk/research/guidance-for-collaborators/. MoBa data are used by researchers and research groups at both the Norwegian Institute of Public Health and other research institutions nationally and internationally. The research must adhere to the aims of MoBa and the participants' given consent. All use of data and biological material from MoBa is subject to Norwegian legislation. More information can be found on the study website (https://www.fhi.no/en/studies/moba/for-forskere-artikler/ research-and-data-access/).

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/jcdd9080237/s1, Supplementary File S1: Figure S1. Illustrating the flow of participants into the Metabolon datasets in the Born in Bradford cohort, Text S1. Confounder data, Text S2. Defining congenital heart disease, Text S3. Genetic data methods, Figure S2. An overview of included cohorts and selection of study participants for Mendelian randomization analyses, Table S1: Participant characteristics for the Born in Bradford metabolomic analyses stratified by CHD status, Table S2: Logistic regression results from the presence/absence xenobiotic analysis, Table S3. Participant characteristics for the 3 studies included in Mendelian randomization analyses, Figure S3. Flow chart to illustrate the analysis pipeline and selection of metabolites for Mendelian randomization, Table S4. Characteristics of maternal GRS and associations with the corresponding metabolite in BiB Dataset 2 (N = 1326), Figure S4. Scatter plots of the variance explained by the weighted genetic risk score for each of the 27 metabolites included in MR analyses. Supplementary File S2: Table S5. Associations of maternal pregnancy metabolites measured at around 26-28 weeks' gestation with offspring congenital heart disease (Dataset 1), Table S6. Associations of maternal pregnancy metabolites measured at around 26-28 weeks' gestation with offspring congenital heart disease (Dataset 2), Table S7. Associations of maternal pregnancy metabolites measured at around 26-28 weeks' gestation with offspring congenital heart disease, Table S8. Mendelian randomization results for each study and pooled across studies, Table S9. Mendelian randomization results pooled across studies to check consistency of results when adjusting for offspring genotype.