Network Approaches to Integrate Analyses of Genetics and Metabolomics Data with Applications to Fetal Programming Studies

The integration of genetics and metabolomics data demands careful accounting of complex dependencies, particularly when modelling familial omics data, e.g., to study fetal programming of related maternal–offspring phenotypes. Efforts to identify genetically determined metabotypes using classic genome wide association approaches have proven useful for characterizing complex disease, but conclusions are often limited to a series of variant–metabolite associations. We adapt Bayesian network models to integrate metabotypes with maternal–offspring genetic dependencies and metabolic profile correlations in order to investigate mechanisms underlying maternal–offspring phenotypic associations. Using data from the multiethnic Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, we demonstrate that the strategic specification of ordered dependencies, pre-filtering of candidate metabotypes, incorporation of metabolite dependencies, and penalized network estimation methods clarify potential mechanisms for fetal programming of newborn adiposity and metabolic outcomes. The exploration of Bayesian network growth over a range of penalty parameters, coupled with interactive plotting, facilitate the interpretation of network edges. These methods are broadly applicable to integration of diverse omics data for related individuals.


Introduction
Fetal programming describes the series of fetal responses to in utero stimuli, including the maternal genetic and metabolic milieu, during critical phases of cellular differentiation that may substantially impact fetal development [1]. According to the Barker hypothesis, these programmed adaptations may be the origin of a number of diseases in later life [2]. Associations between maternal metabolic phenotypes, including glycemia and body mass index, with newborn adiposity are well documented, e.g., in the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study findings [3,4]. Related investigations using high-dimensional omics data for HAPO mothers and their offspring have provided some clarity around the mechanism of these associations, and include investigations relating maternal genetics and maternal metabolic phenotypes [5], maternal metabolomics and maternal metabolic phenotypes [6], maternal metabolomics and newborn outcomes [7], fetal genetics and newborn outcomes [8,9], and cord blood metabolomics and newborn outcomes [10]. These analyses have tended to focus on pairs of maternal or newborn predictors and outcomes in HAPO and other cohorts, with only limited attempts at integrating multi-omics data [11].
To provide a unified framework for more fully interrogating multi-omics fetal programming contributions that underlie maternal-offspring phenotype associations, we adapt a Bayesian network modeling approach. A Bayesian network is a graphical probabilistic model defined from a set of variables and their conditional dependencies via a directed acyclic graph (DAG). This analytic strategy has been successfully deployed for other analyses of omics data [12][13][14][15][16]. The unique idea behind Bayesian networks is to estimate a series of directed relationships among nodes in which the distribution of a variable represented by a 'child' node is described conditional on its 'parent' nodes, with a penalty term 'λ' typically applied to induce model sparsity [17]. Ordering of relationships is particularly relevant in multi-omics data analyses. For example, genotypes can affect metabolite levels, but metabolite levels do not affect genotype. In maternal-offspring studies, maternal genotype affects offspring genotype, but not vice versa. Implicit ordering of maternal/newborn metabolite associations is also often identifiable. Directionality of possible associations may be explicitly defined for Bayesian network models, thus accommodating the natural ordering of maternal-offspring and genetic-metabolomics data. Using this cohesive Bayesian network approach, we identify overall trends in the relative contribution of genetics and metabolomics to maternal-offspring phenotypic associations. Additionally, we uncover potential ordered multi-step mediation pathways that are independently validated. This comprehensive multi-omics analytic pipeline for correlated maternal-offspring data provides an alternative to common analytic strategies that emphasize direct associations between omics data sources and phenotypes.
Interpretation and utilization of this scale of integrated omics analysis demands sophisticated graphical display that is accessible to multidisciplinary teams of investigators. To complement Bayesian network estimation, we apply visualization techniques that demonstrate network growth over a range of penalty parameters for the model. In so doing, we observe the varied strengths of multi-omic associations with the HAPO maternal-offspring phenotypes of interest. The approach to comprehensive network visualization is complemented by the ability to highlight specific ordered paths of interest when undertaking strategic analysis of potential mediation mechanisms.

Significant Variant-to-Metabolite Associations
Across 5.2 million genotyped and imputed SNPs and 331 metabolites (43 unique fasting and 1-hr metabolites from a 2-hr 75g OGTT conducted during HAPO at~28 weeks' gestation and/or metabolites from cord blood collected at delivery), a total of 510 unique SNPs had p < 10 −7.5 in per-metabolite GWAS analyses. Forty-three pairs of maternal and newborn SNPs remained after LD trimming (Supplemental Table S1). Notable identified associations included APOA5 (rs2072560 and rs2266788) with fasting maternal triglycerides, GCKR with fasting and 1-hr maternal 2-hydroxybutyric acid and maternal 1-hr lactate, and SLC16A9 (rs1171616, rs1171617, and rs1171619) with fasting and 1-hr maternal AC C2 ( Figure 1). The strongest association found was CPS1 (rs1047891 and another SNP 2500 base-pairs away) with fasting and 1-hr maternal glycine. Three newborn SNPs (rs7601356, rs2286963, and rs3764913) in the ACADL region and one newborn SNP (rs3738934) in the RPE region were highly associated with cord blood AC C8:1-OH/C6:1-DC. Forty-nine newborn SNPs in the REV3L region were highly associated with cord blood NEFA. The 43 pairs of maternal and newborn SNPs listed in Supplemental Table S1 were included as variables for estimating Bayesian network models (BNMs). All significant GWAS results are listed in Supplemental Table S2.

Bayesian Network Model (BNM)
Under the highest penalty term value of λ = 15 (i.e., the most stringent penalty parameter and thus the sparsest network), the BNM involving fasting maternal glucose and metabolites estimated edges among a network involving 187 nodes with at least one incident edge and 137 edges. As λ values decreased, the number of nodes in each estimated BNM with at least one incident edge and the total number of edges increased, culminating in a BNM involving 309 nodes and 3070 edges for the lowest value of λ = 1 (i.e., the least stringent penalty parameter and thus the fullest network). Similarly, for the BNM involving 1-hr metabolites and 1-hr maternal glucose, the most stringent penalty of λ = 15 resulted in a BNM with 180 nodes and 132 edges and grew to 306 nodes and 3033 edges at λ = 1. Supplemental Files S1 and S2 demonstrate network growth for networks that were estimating using fasting and 1-hr maternal metabolites, and Table 1 catalogs the numbers of edges connecting nodes of different types at various values of λ. NP-NP 1 (0%) 1 (0%) 1 (0%) 4 (0%) 5 (0%) λ = 15 most stringent penalty, λ = 1 least stringent penalty λ = 10 first occurrence of genotype to non-genotype edge λ = 7 first occurrence of maternal exposure phenotype to offspring outcome phenotype λ = 2 first occurrence of SMM pathway evaluated from maternal fasting AC C2 to cord glucose outcome  Dynamic animation of both the fasting and 1hr BNMs were generated to better visualize areas of increasing edge density over the range of penalty parameters (Supplemental Files S1 and S2). Nodes were colored to indicate whether they represent phenotypes, genotypes, and metabolites, with different colors for each metabolite compound class. Node shapes were used to indicate whether they represent maternal or offspring variables. As evidenced in Table 1 and in the Supplemental File S1 video animation, most BNM edges exist among maternal metabolites and cord blood metabolites under the most stringent penalty, with edges primarily existing within metabolites of the same class. Connections increase between the maternal and cord blood metabolites before maternal exposure phenotypes and newborn outcome phenotypes connect to the BNM at λ = 7 for both the maternal fasting and 1-hr BNMs. While maternal-offspring genotypes are pairwise connected by edges under the most stringent penalty as would be expected, the first pairwise edge between a genotype node to a non-genotype node appears at λ = 10 for the fasting BNM and λ = 9 for the 1-hr BNM, where in both instances rs715 from CPS1 connects to glycine. The CPS1-glycine association is well-documented in multiple GWAS studies [18][19][20][21]. Substantial connections from genotype nodes to the larger network do not occur until far less stringent λ values are used BNM. This network growth phenomenon suggests that fetal programming mechanisms are likely to be more substantially informed by environmentally driven metabolic contributions than strictly inherited germline genetic variation that may predispose an individual to certain phenotypes.

Serial Mediation Model (SMM)
For the fasting and 1-hr BNMs with the least stringent penalty values at λ = 1, all shortest paths from a maternal feature to a newborn outcome (sum of skinfolds (SSF), birthweight, cord C-peptide or cord glucose) were identified. Any path that was a subpath of a larger path was not considered. In total, 3000 unique directed paths from a maternal feature to a newborn outcome were identified in the fasting BNM and 1438 unique paths were identified in the 1-hr BNM. The number of nodes ranged from 3 to 8 in the paths extracted from both the fasting and 1-hr BNMs.
For all identified 4438 different pathways, SMM analyses were performed as described. We specifically examined pathways for which the first node was treated as the independent 'exposure' variable and the last node, a newborn outcome (i.e., SSF, birthweight, cord glucose or cord C-peptide), was the dependent 'outcome' variable, and all nodes in between were treated as potential mediators, i.e., individual mediators for paths with three nodes or serial mediators for paths with four or more nodes. For the 3000 candidate pathways from the fasting BNM, nine demonstrated a statistically significant IDE after false discovery rate (FDR) adjustment and maintained a positive proportion mediated (PM) in the training data set. For the 1438 candidate pathways from the 1-hr BNM, 22 demonstrated a statistically significant indirect effects (IDE) after FDR adjustment and maintained a positive PM in the training data set. When SMM analyses were repeated in the testing dataset, statistically significant serial mediation was validated for five of these pathways (Figures 2 and 3, Table 2). Notably, only one of the pathways has a genetic component, consistent with previous observations of the stronger metabolomics dependencies in this HAPO data set.  Table 2). Notably, only one of the pathways has a genetic component, consistent with previous observations of the stronger metabolomics dependencies in this HAPO data set.

Discussion
Integrated modeling of multi-omics data demands careful specification of complex dependencies. For HAPO genetics and metabolomics data, the expected dependencies between observations for mother-offspring pairs merits additional consideration, particularly given the potential for investigating mechanisms underlying fetal programming of related clinical phenotypes in mothers and their newborns using these unique data resources. Recent efforts to identify 'genetically determined metabotypes' using classic GWAS approaches in independent individuals identified SNP-metabolite associations with larger effect sizes than typically observed for clinical phenotypes [19,20,[22][23][24]. Extensions to GWAS, such as phenotype set enrichment analyses to identify SNPs associated with multiple metabolites [25] and informatics resources to investigate purported metabotypes [26][27][28], have also proven useful for probing the genetics and metabolomics of complex disease. However, to accurately characterize genetic and metabolic contributions to fetal development, more sophisticated models are required to integrate metabotypes with maternal-fetal genetic dependencies and metabolic profile correlations. BNMs offer a natural paradigm for this type of data integration due to the ability to naturally incorporate structured, ordered dependencies among different types of data while estimating edges.
A key finding upon global review of our integrated analyses is the relative strength of coordinated behavior among classes of maternal and newborn metabolites compared to all other possible dependencies among the variables included in the estimated BNMs. While we may expect high dependency within maternal and offspring metabolites, the relative density of edges connecting maternal and offspring metabolites under the strictest penalty parameters highlights the tight coordination between the metabolisms of a mother and the developing fetus. Of note, network edges between maternal and offspring metabolites and the maternal and offspring clinical phenotypes emerge prior to any edges between the paired clinical phenotypes (e.g., maternal glucose to cord blood glucose) and shared maternal/offspring genetics. This observation suggests that environmentally derived or influenced metabolomic contributions may outweigh the genetic underpinnings of fetal programming for size and adiposity at birth.
Complementing the big picture linking of metabolomics and genetics contributions to maternal/offspring phenotypes is the specific pathway-level interpretation of possible mediating mechanisms. Serial mediation analyses informed by comprehensive BNM estimation indicated biologically plausible mediation mechanisms. In the pathway from 1-hr tyrosine to cord glucose via cord blood tyrosine, phenylalanine and propionyl carnitine (C3), the BNM suggests that higher levels of maternal 1-hr tyrosine indirectly result in lower cord glucose and revealed the plausible intermediate pathways linking these two. It is known that phenylalanine is an essential amino acid and is metabolized to tyrosine when the tyrosine level is low. However, if maternal tyrosine is high and is transported across the placenta leading to sufficient fetal tyrosine, then the conversion of phenylalanine to tyrosine is reduced. C3 is formed from amino acid metabolism but primarily branch chain amino acids (BCAA). Previous studies have shown that the aromatic acids, phenylalanine, tyrosine, and the BCAA, are associated with insulin resistance in adults, including in pregnancy [11]. Thus, it is plausible that they could be involved in fetal glycemic regulation. For the pathway from SLC16A9 to cord C-peptide via cord blood acetylcarnitine (C2) and hexadecenoylcarnitine (C16:1), SLC16A9 is a transporter of monocarboxylates that has been associated with C2 levels [29]. Elevated C2 levels in the mother may affect C2 in the fetus through placental transfer. While the direct relationship between C2 and C16:1 is unclear, both C2 and C16:1 levels are higher in newborns with extreme macrosomia [24] and higher levels of C16:1 have been noted in newborns of mothers with GDM [30]. Notably, the biological plausibility of these purported SMM pathways is based on observations in adult metabolism. Our findings when jointly examining maternal and newborn data with the described dependencies suggest similar pathway involvement very early in life. Furthermore, examination of serial pathways in the BNM highlights that while maternal glucose is typically emphasized when studying newborn adiposity and glucose/insulin regulation, in fact multiple maternal metabolic factors, including metabolites that may cross the placenta, are part of the fuller picture.
Penalty parameter selection for sparse estimation of edges in large scale networks is a topic of focused discussion and ongoing bioinformatics research. In contrast to selecting a single penalty parameter that optimizes specific criteria for estimation of one BNM of our integrated data, we opted to view and animate BNMs over a range of penalty parameters. This approach to visualization, along with color distinctions for node types, naturally facilitated the observation of the relative centrality of maternal/offspring metabolite dependencies compared to shared genetics for explaining dependencies between clinical phenotypes that were specifically investigated in HAPO. For other studies involving multi-omics data, exploring network growth over decreasing stringency requirements may illuminate the relative strengths and cooperativity of different types of omics features.
There are limitations to our analytic approach. In our BNM analyses, we blocklisted potential edges from offspring SNPs to maternal phenotypes and fasting/1-hr maternal metabolites such that no such edge could be included in the estimated models. However, there is evidence that offspring genotype can have limited influence maternal phenotypes [31,32]. Implementing the specified blocklist prevented the discovery of such rare occurrences with our data. We chose this condition to reduce noise from these edges for BNM estimation. A second limitation pertains to the examination of serial mediation in the BNM as independent paths. As shown, there are two purported mediation pathways from maternal 1-hr tyrosine to cord glucose. Another approach on this specific pathway would be to use a combination of both parallel and serial mediation modeling. In addition, we focused only on serial mediation paths that started at a maternal variable and ended at a newborn outcome. Other routes of investigation for different combinations of pathway starting and end points may yield further insight about mediating mechanisms.
In summary, Bayesian network modeling and animated visualization over a range of penalty parameters provided a natural analytic framework for the cogent synthesis of genetics and metabolomics data related to clinical phenotypes for HAPO mothers and their newborns. This approach facilitated both large-scale summaries of the relative contribu-tions of data types, as well as local interrogation of potential serial mediating pathways. These pathways may suggest unique lines for further laboratory and/or population-level investigation, potentially incorporating other edges from the BNM that influence the linear serial mediating pathway of primary interest. Beyond the HAPO study, the approaches described here can be adapted to integrate omics data for other studies that are designed to evaluate complex contributions that may underlie clinical phenotypic associations.

Hyperglycemia and Adverse Pregnancy Outcome Study
The HAPO Study was an observational, population-based study conducted from 2000-2006 at 15 international field centers that recruited over 25,000 participants [3]. Enrolled women underwent a 75-g oral glucose tolerance test (OGTT) at approximately 28 weeks' gestation after an overnight fast. Fasting, 1-hr, and 2-hr glucose levels were measured as described [33,34]. Maternal blood samples during the OGTT were placed on ice immediately, processed within 60 min of collection, stored at −20 • C or −80 • C for 1-6 weeks, shipped on dry ice to the HAPO Central Laboratory, and stored at −80 • C. Each participant's age, gestational age at time of OGTT, family history of diabetes and hypertension, parity, and history of cigarette smoking and alcohol use were obtained by questionnaire. Cord blood samples were obtained within 5 min of delivery and before delivery of the placenta, processed within 30 min, and stored at −70 • C until laboratory assays were performed. Cord blood C-peptide, birthweight, and sum of skinfolds (SSF) were measured as described using calibrated equipment and standardized methods [33][34][35]. The HAPO protocol was approved by each field center's institutional review board, and participants gave informed consent.

Metabolomics Data
Conventional clinical and targeted metabolites were measured in 3463 maternal fasting and 1-hr serum samples [36] and in 1600 offspring cord blood serum samples [10,37]. Conventional clinical metabolites (triglycerides, nonesterified fatty acid (NEFA), lactate, glycerol, and 3-hydroxybutyrate) were measured using standard enzymatic chemistries on a Beckman-Coulter Unicell DxC 600 clinical analyzer. Targeted panels of amino acids and acylcarnitines were analyzed by flow-injection, electrospray-ionization tandem mass spectrometry (MS) and quantified by isotope or pseudo-isotope dilution using a Waters TQ triple quadrupole MS, equipped with an Acquity liquid chromatography system, and with data handling in the MassLynx 4.1 environment (Waters Corporation, Milford, MA, USA). Five conventional clinical metabolites, 15 amino acids, and 45 acylcarnitines were analyzed across maternal fasting, 1-hr, and offspring cord blood. Targeted metabolites with >10% missing data across all samples were excluded from analysis, yielding a total of 57, 56, and 48 targeted metabolites for analysis in maternal fasting, 1-hr, and offspring cord blood, respectively.
Non-targeted assays were performed using gas chromatography-mass spectrometry (GC-MS) as described [38]. Briefly, maternal fasting, 1-hr, and offspring cord blood serum samples were extracted with methanol that was spiked with a retention time-locking internal standard of perdeuterated myristic acid. After methanol extraction, drying and derivatization by methoximation and trimethylsilylation, samples were run in daily batches of matched sets of fasting and 1-hr maternal OGTT sera and cord blood sera on a 7890B GC/5977B MS (Agilent Technologies, Santa Clara, CA, USA). Each batch also included multiple injections of a quality control (QC) sample and a process blank. QC samples consisted of uniform pools generated from aliquots of all sera in the study and were injected at the beginning, middle and end of each batch. GC-MS peaks were deconvoluted with AMDIS freeware [39] (http://www.amdis.net/ accessed on 22 June 2021 [40]) and annotated using the Agilent Fiehn GC-MS Metabolomics RTL Library [41] with additions from the laboratory at Duke University School of Medicine. Only annotated peaks were included for analysis. Integrated peak areas were log 2 transformed. QC data were then used to account for technical variability on a feature-specific basis, with batch correction and normalization performed as described using the metabomxtr R package (version 1.26.0) [42,43]. Non-targeted metabolites with >20% missing data across all samples were excluded from analysis. Remaining non-targeted metabolites with <20% missing data were imputed using metabolite-specific minimum values since, in general, non-targeted metabolite missing values are attributable to low abundance. Non-targeted metabolite measures that were also represented in the targeted assays were excluded from analysis. In total, 55, 53, and 47 non-targeted metabolites were included in the final analyses from the maternal fasting, 1-hr, and offspring cord blood samples, respectively.

Genotype Data
Genotyping and QC for HAPO data have been described for an initial set of 2581 maternal and newborn Afro-Caribbean and 1615 maternal and newborn Mexican American DNA samples (Illumina HumanOmni1M-Duo v3B SNP array, Illumina, San Diego, CA, USA), 3152 maternal and newborn Northern European samples (Illumina Human610-Quad v1B SNP array, run at the Broad Institute, Cambridge, MA, USA), and 2466 maternal and newborn Thai samples (Illumina HumanOmni-Quad v1-0B SNP array, run at the Center for Inherited Disease Research, Baltimore, MD, USA) [5,9].
In addition, a subset of 2680 Northern European maternal and newborn samples were run using the Illumina Human 610 Quad v1 B SNP array at the Broad Institute Center for Genotyping and Analysis (CGA), USA [44], and a total of 5564 maternal and newborn transethnic samples were processed on the Illumina Global Screening Array-24 v2.0 A1 following agreed-upon protocols of the Gene, Environment Associations Studies consortium [45]. Exclusion criteria for samples and SNPs included inconsistent self-report of sex at birth with available chromosomal data, chromosomal anomalies, unintended sample duplicates, sample relatedness based on genetic relatedness, low call rate (samples with missing call rate >1% and SNPs with missing call rate >2%), SNPs with >3 Mendelian errors, departures from Hardy-Weinberg equilibrium (< 1 × 10 −4 for the original Afro-Caribbean, Mexican American, Northern European, and Thai groups, <1 × 10 −6 for the additional 2680 Northern European group, and <1 × 10 −3 for the transethnic group), duplicate discordance, sex differences in heterozygosity, low minor allele frequencies (<0.01) and/or low imputation quality score (<0.75).
Genotype imputation was performed separately on the twelve datasets (six maternal and six fetal) on the TOPMed Imputation Server using Minimac4 (version 1.5.7) [46] and the TOPMed reference panel for the Afro-Caribbean, Mexican American, Northern European, Thai, Northern European and transethnic samples [47]. Consistent strand assignments between the reference dataset and the QC-cleaned and filtered datasets were ensured using the strand-checking utility of McCarthy Group Tools. Strand was corrected and/or SNPs were removed where strandedness could not be resolved. The HAPO genotype was then imputed to the TOPMed reference panel, using an r 2 filter of 0.3 to remove unreliably imputed SNPs. A total of 5,237,318 overlapping SNPs were analyzed in a mega-analysis across the twelve imputed datasets. Principle components (PCs) were estimated separately for maternal and offspring data using the package SNPRelate version 1.26.0 in R [48].

Combined Genotype, Metabolomics and Phenotype Training and Validation Data Sets
For analyses reported here, 1385 HAPO mother/offspring pairs with all of the following data types were retained for analysis: (1) maternal genotype, (2) offspring genotype, (3) maternal fasting and 1-hr targeted and non-targeted metabolomics, (4) offspring cord blood targeted and non-targeted metabolomics, (5) maternal fasting and 1-hr glucose measurements and maternal BMI from the HAPO OGTT at~28 weeks' gestation along with additional maternal data used for model adjustments, and (6) offspring sum of skinfolds, birthweight, cord glucose, and cord C-peptide measured at delivery. Two-thirds of these samples (n = 924) were used as a training data set for Bayesian network model development and selection of potential mediation pathways. The remaining one-third (n = 461) were used as a validation data set for confirming statistical significance of purported mediation findings. Sample selection for training and validation data sets was stratified by ancestry group based on genotype data.

Variant-to-Metabolite Associations
In order to reduce computational burden when estimating Bayesian networks, an initial round of 'metabotyping' was performed, i.e., genome wide association (GWAS) analyses were performed separately for all metabolites as outcomes [22,26]. Metabotyping was performed on the combined training and validation data sets using a mega-analysis linear regression approach with SNPTest v2.5 against an additive genotype variable, separately for maternal fasting and 1-hr metabolites with maternal genotype, and offspring cord blood metabolites with offspring genotype. Maternal fasting metabolite models were adjusted for the first 3 principal components of genetic ancestry, maternal age, gestational age, BMI, height, parity (0/1+), mean arterial pressure and fasting plasma glucose (1-hr glucose for 1-hr metabolites) at HAPO OGTT, and sample storage time. Newborn cord blood metabolites were adjusted for the first 3 principal components of genetic ancestry, maternal age, BMI, height, parity (0/1+), mean arterial pressure at OGTT, gestational age at delivery, newborn sex, and sample storage time. SNPs with nominal p < 10 −7.5 were selected for further analysis. This selection threshold was set to be slightly more inclusive than typical genome-wide significance of nominal p < 10 −8 , anticipating that the penalized network modeling approach would aid in the selection the most informative SNPs. Each metabolite with significant SNP associations was LD trimmed independently using the package SNPRelate version 1.26.0 in R [48].

Bayesian Network Modeling (BNM)
We estimated BNMs including the following variables as nodes in the network: maternal exposure variables, namely BMI at OGTT, log 10 fasting plasma glucose at OGTT, square root of 1-hr plasma glucose at OGTT (fasting and 1-hr glucose transformations were consistent with previous GWAS analyses of maternal phenotypes [5]), the 43 SNPs identified in preliminary metabotyping, 117 fasting maternal metabolites, 114 1-hr maternal metabolites, 100 cord blood metabolites, and a set of newborn anthropometric outcomes (birthweight, square root sum of skinfold, log 10 cord glucose, and log 10 cord C-peptide; data transformations were consistent with previous GWAS analyses of these newborn phenotypes [9]). A simple schematic illustrating the types of ordered relationships that might be estimated among maternal and fetal genotypes and metabolites and a corresponding set of conditional dependencies that are central to BNM estimation are illustrated in Figure 4. BNMs were estimated using the training data set, as specified above. We used a fast, score-based method that uses sparse regularization and block-cyclic coordinate descent approach to build these BNMs [17]. Conditions on edge directionality were specified using a 'blocklist', with the following edge types omitted from the range of potential estimated edges: offspring SNPs to maternal SNPs; newborn outcomes to maternal SNPs; cord blood metabolites to maternal SNPs; offspring SNPs to maternal phenotypes; offspring SNPs to fasting/1-hr maternal metabolites; newborn outcomes to offspring SNP; cord blood metabolites to offspring SNPs; maternal phenotypes to offspring SNPs; fasting/1-hr maternal metabolites to maternal SNPs; fasting/1-hr maternal metabolites to offspring SNPs; and SNPs from one chromosome to SNPs in different chromosomes. A vector of λ values was used in the solution path, starting at λ = 15 and decreasing to λ = 1, and networks were estimated across a range of these regularization penalty parameters. Separate BNMs were estimated for fasting maternal metabolites/fasting maternal glucose and for their 1-hr counterparts. Since metabolism at the fasting and 1-hr post-glucose timepoints may reflect different components of glucose metabolism, we chose to model data from the fasting and 1-hr timepoints separately.

Serial Mediation Model (SMM)
Shortest directed paths in the estimated Bayesian network under the lowest (i.e., least stringent penalty parameter) were identified as possible candidates for tion analyses using shortest path identification functionality in the igraph R packa In particular, we focused on shortest directed paths connecting a maternal SNP, fa hr metabolite, or maternal phenotype as a starting point along a directed path to born outcome as the end point of the path. These paths were interpreted as pu mediation pathways connecting a maternal exposure to a newborn phenotype a potentially informative for investigating mechanisms of fetal programming. A containing 3 or more nodes were examined using serial mediation model (SMM) with structural equation modeling using the lavaan R package version 0.6-11 [50]. rial mediator Model 6 from Model Templates for PROCESS for SPSS and SAS [ adapted to test statistical significance of serial mediation through different pathw volving maternal genetics, maternal fasting/1-hr metabolomics, maternal phenoty spring genetics, and cord blood metabolomics to inform a newborn outcome. Ind fects (IDE), direct effects (DE), and proportion mediated (PM) were estimated and ard errors were estimated through bootstrapping. False discovery rate adjustm applied to IDE p-values separately for the fasting BNM and 1-hr BNM paths [52] were adjusted for the following covariates at the HAPO OGTT: field center, mater gestational age, height, parity (0/1+), mean arterial pressure, sample storage ti spring sex, and gestational age at delivery. SMM analysis was on performed on th ing dataset. Those pathways with significant indirect effects Benjamini-Hochb justed p-values < 0.05) and a positive proportion mediated were evaluated for rep in the testing dataset according to nominal p-value < 0.05.

Network Visualization
We generated interactive visualizations to assist in the interpretation of BNM

Serial Mediation Model (SMM)
Shortest directed paths in the estimated Bayesian network under the lowest λ value (i.e., least stringent penalty parameter) were identified as possible candidates for mediation analyses using shortest path identification functionality in the igraph R package [49]. In particular, we focused on shortest directed paths connecting a maternal SNP, fasting/1-hr metabolite, or maternal phenotype as a starting point along a directed path to a newborn outcome as the end point of the path. These paths were interpreted as purported mediation pathways connecting a maternal exposure to a newborn phenotype and thus potentially informative for investigating mechanisms of fetal programming. All paths containing 3 or more nodes were examined using serial mediation model (SMM) analysis with structural equation modeling using the lavaan R package version 0.6-11 [50]. The serial mediator Model 6 from Model Templates for PROCESS for SPSS and SAS [51] was adapted to test statistical significance of serial mediation through different pathways involving maternal genetics, maternal fasting/1-hr metabolomics, maternal phenotypes, offspring genetics, and cord blood metabolomics to inform a newborn outcome. Indirect effects (IDE), direct effects (DE), and proportion mediated (PM) were estimated and standard errors were estimated through bootstrapping. False discovery rate adjustment was applied to IDE p-values separately for the fasting BNM and 1-hr BNM paths [52]. SMMs were adjusted for the following covariates at the HAPO OGTT: field center, maternal age, gestational age, height, parity (0/1+), mean arterial pressure, sample storage time, offspring sex, and gestational age at delivery. SMM analysis was on performed on the training dataset. Those pathways with significant indirect effects Benjamini-Hochberg adjusted p-values < 0.05) and a positive proportion mediated were evaluated for replication in the testing dataset according to nominal p-value < 0.05.

Network Visualization
We generated interactive visualizations to assist in the interpretation of BNMs over a range of penalty parameters using functions from the igraph version 1.2.4.1 [49], intergraph version 2.0-2 [53], networkDynamic version 0.10.0 [54], network version 1.15 [55,56], and ndtv version 0.13.0 [57] R packages. Graphs were animated over 15 penalty terms starting at λ = 15 and decreasing to λ = 1 to demonstrate network growth as the penalty decreased. The x-axis on the.html files that contain the network animations index the increasing density of the BNM estimates, i.e., an index of 1 on the x-axis in the animation corresponds to λ = 15, an index of 2 to λ = 14, and so on.  Table S1: Forty-three maternal and newborn SNPs (GRCh38); Table S2 Institutional Review Board Statement: The HAPO study protocol was approved by the institutional review board at each field center (protocol code 353-003 and date of approval 28 June 2000).

Informed Consent Statement: All participants gave written informed consent.
Data Availability Statement: The maternal and newborn genotype data on European ancestry, Mexican American, Thai and Afro-Caribbean proposed for use in these studies and the accompanying phenotype data are currently available through dbGaP (www.ncbi.nlm.nih.gov/gap, accessed on 1 June 2021). The remaining additional subset of 2680 Northern European and transethnic samples. metabolomic data and codes used for analyses will be made available by the authors upon request.