Prenatal Phthalate Exposures and Adiposity Outcomes Trajectories: A Multivariate Bayesian Factor Regression Approach

Nguyen, Phuc H.; Engel, Stephanie M.; Herring, Amy H.

doi:10.3390/ijerph22101466

by Phuc H. Nguyen ¹,*, Stephanie M. Engel ² and Amy H. Herring ³

¹ LinkedIn Corporation, Sunnyvale, CA 94085, USA
² Department of Epidemiology, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
³ Department of Statistical Science, Duke University, Durham, NC 27708, USA
* Author to whom correspondence should be addressed.

Int. J. Environ. Res. Public Health 2025, 22(10), 1466; https://doi.org/10.3390/ijerph22101466

Submission received: 1 June 2025 / Revised: 22 August 2025 / Accepted: 12 September 2025 / Published: 23 September 2025

Abstract

Experimental animal evidence and a growing body of observational studies suggest that prenatal exposure to phthalates may be a risk factor for childhood obesity. Using data from the Mount Sinai Children’s Environmental Health Study (MSCEHS), which measured urinary phthalate metabolites (including MEP, MnBP, MiBP, MCPP, MBzP, MEHP, MEHHP, MEOHP, and MECPP) during the third trimester of pregnancy (between 25 and 40 weeks) of 382 mothers, we examined adiposity outcomes—body mass index (BMI), fat mass percentage, waist-to-hip ratio, and waist circumference—of 180 children between ages 4 and 9. Our aim was to assess the effects of prenatal exposure to phthalates on these adiposity outcomes, with potential time-varying and sex-specific effects. We applied a novel Bayesian multivariate factor regression (BMFR) that (1) represents phthalate mixtures as latent factors—a DEHP and a non-DEHP factor, (2) borrows information across highly correlated adiposity outcomes to improve estimation precision, (3) models potentially non-linear time-varying effects of the latent factors on adiposity outcomes, and (4) fully quantifies uncertainty using state-of-the-art prior specifications. The results show that in boys, at younger ages (4–6), all phthalate components are associated with lower adiposity outcomes; however, after age 7, they are associated with higher outcomes. In girls, there is no evidence of associations between phthalate factors and adiposity outcomes.

Keywords: prenatal exposure; phthalates; body mass index; Bayesian statistics; factor regression; outcome trajectories; childhood

1. Introduction

The increasing prevalence of childhood obesity is a major public health concern. The 2017–2018 National Health and Nutrition Examination Surveys (NHANES) estimated that 19.3% of children aged 2–19 years have obesity and another 16.1% are overweight [1]. The prevalence of obesity among children and adolescents has more than tripled in the last 30 years [2]. Obese children are at increased risk for several health conditions, including obesity in adulthood, type 2 diabetes, heart disease, arthritis, and various cancers, as well as shorter life expectancy. Studies suggest that approximately 70% of obese children face a significant risk of heart disease in adulthood [3]. Some estimates suggest that one-third of children born today (and half of Latino and black children) are expected to develop type 2 diabetes at some point in their lives [4]. Childhood obesity also has negative implications for mental health, as it is associated with an increased risk of being bullied [3].

Environmental chemical exposures, particularly exposure to phthalates, have been identified as risk factors for childhood obesity [5,6], potentially by interfering with the body’s endocrine system [7,8]. Phthalates are a group of chemicals widely used in consumer products, such as toys, food packaging, or cosmetics, that are known to have endocrine disrupting properties [9,10]. In vivo and in vitro animal studies suggest that fetal development is a potential critical window of vulnerability to phthalate exposures, which may promote obesity, and the effect may depend on gender [8,11,12]. Observational studies in humans, including birth cohort studies, further support these findings by showing associations between prenatal phthalate exposure and childhood adiposity outcomes [13,14,15,16], some reporting sex-specific effects [14,17,18].

In this paper, we investigate the time-varying health effects of prenatal phthalate exposures on childhood obesity using data from the Mount Sinai Children’s Environmental Health Study (MSCEHS) [13]. The study measured the urinary phthalate metabolites of mothers during the third trimester of pregnancy and followed the health indices of children, including adiposity outcomes, between ages 4 and 9. We would like to identify groups of phthalate metabolites critical to these adiposity outcomes and estimate their time-varying effects and any sex-specific effects using a new modeling framework: the Bayesian multivariate factor regression model (BMFR). Our approach addresses the challenge of fully quantifying the uncertainty in estimating the time-varying effects of the mixture and provides several advantages tailored to our cohort data.

Many studies have examined the time-varying effects of exposure to phthalates on adiposity outcomes or growth trajectories, but the results remain inconsistent [14,18,19,20,21,22,23,24,25]. A key limitation is that traditional statistical models, or even mixture methods not designed to estimate time-varying effects, lack full uncertainty quantification. In traditional statistical models, key modeling choices, such as the functional form of time-varying effects, are chosen or estimated first, then treated as fixed in the subsequent outcome model. For example, linear mixed-effects models or generalized estimating equations typically rely on interaction terms between a single exposure and age to model time-varying effects [21,22,23,25]. These approaches cannot consider all phthalate metabolites simultaneously due to multicollinearity, and assumptions about the functional form of time-varying relationships (e.g., linear or polynomial) are treated as fixed. Fixing these choices underestimates uncertainty, leading to confidence intervals that are too narrow and p-values that are artificially small. More advanced methods, such as growth mixture models, latent class growth models, or functional principal component analysis, can capture nonlinear growth trajectories, but they require a second-stage model to estimate associations between mixture exposures and outcome trajectories [18,24]. In such multistage analyses, outputs from earlier stages (e.g., a number of selected trajectories or factors) are treated as fixed in later stages, again leading to underestimated uncertainty. Other advanced mixture methods, such as Bayesian kernel machine regression [26], quantile g-computation [27], and weighted quantile sum regression [28], do not model nonlinear time-varying effects. Bayesian varying-coefficient kernel machine regression (BVCKMR) is another advanced mixture method that can model nonlinear effects across exposure levels on outcome trajectories [29]. 
However, at fixed exposure levels, BVCKMR’s estimated effects over time are restricted to a prespecified functional form.

To address these limitations, BMFR applies state-of-the-art prior specifications to infer modeling decisions and fully quantify their uncertainty. The model includes variable selection priors for covariates, avoiding fixed subsets [30]; Gaussian process priors for flexible modeling of time-varying health effects without assuming a specific functional form [31]; and half-t priors for robust between-subject variances [32]. To model the exposure mixture, BMFR assumes that structured variations in correlated exposures can be attributed to a small number of latent factors that also explain part of the variations in the outcomes, while fully quantifying the uncertainty in the number of factors through the multiplicative gamma process (MGP) prior [33]. This contrasts with principal component analysis (PCA) [20], which maximizes variation in exposures without regard to outcome relevance, and structural equation models [34], which require a fixed number of factors.

Our implementation of BMFR also has custom features specific to our cohort. BMFR jointly models multiple adiposity outcomes, including BMI z-scores, waist circumference, waist-to-hip ratio, and fat mass percentage, to improve estimation precision. Previous studies have generally examined these outcomes individually [14,16,17,20,21,23,35], hypothesized that mixtures affect all outcomes similarly [20], or searched for consistent results across outcomes [16,21,23]. Because these outcomes are highly positively correlated and likely reflect shared mechanisms through which phthalates influence obesity, modeling them jointly is expected to improve estimation precision. Finally, BMFR quantifies uncertainty in imputing missing data and measurements below the limit of detection (LOD). Instead of relying on standard fixed-value imputations (e.g., the mean or one-half of the LOD), BMFR handles imputation within the Markov chain Monte Carlo (MCMC) process, propagating uncertainty in imputed values into the credible intervals for the effects of interest. An R package for BMFR, optimized in C++, is freely available at https://github.com/phuchonguyen/famr (accessed on 20 August 2023). Section 2 describes in detail the data from the MSCEHS. Section 3 describes our proposed model BMFR and its prior specifications. Section 4 validates BMFR’s utility through simulation studies. Section 5 describes the analysis of time-varying health effects of prenatal phthalate exposures on childhood obesity using the MSCEHS data.

2. Data

2.1. Study Population

Between 1998 and 2002, MSCEHS recruited 479 first-time mothers with singleton pregnancies from the Mount Sinai Diagnostic and Treatment Center and two adjacent private practices in New York City. Among these women, 75 were excluded due to medical complications (n = 3), infant or fetal death (n = 2), very premature birth (before 32 weeks of gestation or <1500 g; n = 5), miscarriage (n = 1), delivery of an infant with genetic abnormalities or malformations (n = 5), inability to obtain biologic samples before delivery (n = 12), relocation or transfer to a hospital outside of New York City (n = 28), or loss to follow-up (n = 19) [13]. The final cohort consisted of 404 babies with birth data recorded. The children were invited back for three follow-up visits at ages 4–5.5, 6, and 7–9. Of the 404 babies, 382 had their mothers’ prenatal concentrations of phthalate metabolites measured in urine. Two additional observations were excluded because they had very dilute urine (<10 mg/dL creatinine) that can produce inaccurate biomarker measurements [14]. Of the 382 babies, only 180 came back for at least one follow-up visit. This results in 362 total visits for fat mass percentage, 363 total visits for body mass index (BMI), 364 total visits for waist-to-hip ratio, and 364 total visits for waist circumference. Figure 1 shows the pattern of loss to follow-up of these 382 babies. All observed outcome measurements were included in our analysis. Our analysis was a secondary analysis of the de-identified data from the MSCEHS study.

2.2. Phthalate Exposures

Mothers who were between 25 and 40 weeks pregnant (mean of 31.5 weeks) provided a urine sample that was analyzed by the CDC laboratory for various phthalate metabolites, including MEP, MnBP, MiBP, MCPP, MBzP, MEHP, MEHHP, MEOHP, and MECPP. The DEHP group consists of MEHP, MEHHP, MEOHP, and MECPP. To account for inaccuracies in analytical standards, correction factors of 0.72 and 0.66 were applied to MBzP and MEP concentrations and limits of detection (LOD), respectively [36]. Urinary dilution was measured using creatinine. To adjust for the dilution of urine samples, we standardized the metabolite concentrations by a Cratio and also included creatinine concentration as a covariate in the analysis, as suggested by [37]. The Cratio is calculated as the ratio between predicted creatinine, conditional on observed covariates of the mother (including the mother’s age, BMI, gestational weight gain adequacy category, smoking status, education, and race), and observed creatinine [37]. As seen in Figure 2 (Left), phthalate metabolites are highly and positively correlated with each other, especially those within the DEHP group and those within the non-DEHP group. In total, four MiBP, one MEP, one MBzP, four MCPP, one MECPP, one MEHHP, one MEOHP, and 15 MEHP measurements were below their respective LODs. We impute these within the MCMC sampler, as discussed later.

2.3. Adiposity Outcomes

The MSCEHS assessed the weight and body composition of the children through bioelectrical impedance analysis using a pediatric Tanita scale at three follow-up visits scheduled at approximately 4 years, 6 years, and 7 years, though the age of the children at the actual visits ranged from 4 to 10 years (or 48 to 122 months). We consider the following four outcomes in our multivariate analysis: fat mass percentage (FMP), BMI z-score (BMIz), waist-to-hip ratio (WHR), and waist circumference (WC). FMP is based on fat mass estimates reported by the Tanita scale (model TBF-300; Tanita Corporation of America) and calculated as (fat mass/weight) $\times$ 100. BMI is calculated as weight (in kilograms)/height (in meters)². It is then standardized by age and sex using a CDC SAS (version 9.3) macro [38] to produce BMIz. There are 362 total visits for FMP, 363 for BMIz, 364 for WHR, and 364 for WC from 180 subjects across all their follow-up visits. Figure 2 (Right) shows that these outcomes (except WHR) are highly and positively correlated with each other.
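The two outcome formulas above can be illustrated with a minimal Python sketch. The function names and the example child are hypothetical and are not part of the study's code; the age- and sex-standardization that produces BMIz relies on the CDC macro and is not reproduced here.

```python
def fat_mass_percentage(fat_mass_kg: float, weight_kg: float) -> float:
    """FMP = (fat mass / weight) x 100, as defined in the text."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return fat_mass_kg / weight_kg * 100.0


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) / height (m)^2, before any z-score standardization."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical child (illustrative values only): 20 kg, 1.10 m, 4 kg fat mass.
fmp = fat_mass_percentage(4.0, 20.0)
bmi = body_mass_index(20.0, 1.10)
print(f"FMP = {fmp:.1f}%, BMI = {bmi:.2f} kg/m^2")
```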

2.4. Covariates

Mothers were interviewed for 2 h during enrollment to gather covariate data. The computerized perinatal database at Mount Sinai Hospital was used to obtain pregnancy and delivery information. Gestational weight gain adequacy is calculated by dividing the observed gestational weight gain (last pregnancy weight minus self-reported pre-pregnancy weight) by the expected gestational weight gain based on the 2009 Institute of Medicine guidelines times 100 [39]. We categorize gestational weight gain as inadequate if the ratio is <86%, adequate if 86–120%, and excessive if >120%. The final baseline covariates include the mother’s age, mother’s BMI, mother’s gestational weight gain adequacy category, mother’s smoking status, mother’s education, mother’s race, whether mother breastfed, child’s sex, child’s birth weight, and creatinine concentration. The children’s age in months was also recorded at each follow-up visit and included as a covariate. Table 1 summarizes the baseline covariates of children with at least one follow-up visit included in our analysis. Most covariates have no missing values or only negligible amounts, except for maternal gestational weight gain. The baseline characteristics of the male and female subsamples are similar, with only minor differences in race/ethnicity distribution.
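The adequacy categorization described above can be sketched as follows. This is a minimal illustration: the function name is hypothetical, and the expected gain is taken as an input because deriving it from the 2009 Institute of Medicine guidelines is outside the scope of this sketch.

```python
def gwg_adequacy(observed_gain_kg: float, expected_gain_kg: float) -> str:
    """Classify gestational weight gain using the cut-offs stated in the text.

    The ratio is observed gain / expected gain x 100:
    < 86% inadequate, 86-120% adequate, > 120% excessive.
    """
    if expected_gain_kg <= 0:
        raise ValueError("expected gain must be positive")
    ratio = observed_gain_kg / expected_gain_kg * 100.0
    if ratio < 86.0:
        return "inadequate"
    if ratio <= 120.0:
        return "adequate"
    return "excessive"


# Hypothetical mothers, all with an expected gain of 12.5 kg.
for observed in (10.0, 12.5, 16.0):
    print(observed, gwg_adequacy(observed, 12.5))
```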

3. Bayesian Multivariate Factor Regression for Time-Varying Effects

3.1. Model Correlated Chemical Mixtures with a Latent Factor Model

Let $X_i$ be a $p$-vector of correlated metabolite concentrations measured during the third trimester of pregnancy for subject $i$. We assume that variation in $X_i$ can be attributed to $K < p$ latent variables:

$$X_i \sim N_p(\Theta \eta_i, \Sigma_X) \tag{1}$$

$$\eta_i \sim N_K(0, I) \tag{2}$$

$$\Sigma_X = \mathrm{diag}(\sigma_{X,1}^2, \dots, \sigma_{X,p}^2) \tag{3}$$

where $\eta_i$ is a $K$-vector of unobserved latent factors of subject $i$, $\Theta$ is the factor loadings matrix, and $\sigma_{X,1}^2, \dots, \sigma_{X,p}^2$ are idiosyncratic noise variances. We assume the exposures have been mean-centered and therefore omit the intercepts. Independent priors on each of $\sigma_{X,1}^2, \dots, \sigma_{X,p}^2$ are chosen to be those often used in factor analysis. We use the multiplicative gamma process (MGP) prior on the factor loadings to learn a sparse loadings structure and infer the number of factors $K$ [33]:

$$\theta_{jk} \sim N(0, \phi_{jk}^{-1} \tau_k^{-1}) \tag{4}$$

$$\phi_{jk} \sim G(v/2, v/2), \quad j = 1, \dots, p; \; k = 1, \dots, K \tag{5}$$

$$\tau_k = \prod_{l=1}^{k} \delta_l, \quad \delta_1 \sim G(a_1, 1), \quad \delta_l \sim G(a_2, 1) \text{ for } l \geq 2 \tag{6}$$
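To make the MGP prior in Equations (4)–(6) concrete, the sketch below draws one loadings matrix from it using only the Python standard library. It assumes that $G(a, b)$ denotes a gamma distribution with shape $a$ and rate $b$; the value $v = 3$ and the function name are illustrative assumptions, while $a_1 = 2.1$ and $a_2 = 3.1$ are the values used later in the analysis. It is not the sampler implemented in the authors' R package.

```python
import random


def sample_mgp_loadings(p, K, a1=2.1, a2=3.1, v=3.0, seed=0):
    """Draw a p x K loadings matrix from the MGP prior (shape/rate gammas assumed)."""
    rng = random.Random(seed)
    # delta_1 ~ Gamma(a1, 1); delta_l ~ Gamma(a2, 1) for l >= 2.
    deltas = [rng.gammavariate(a1, 1.0)]
    deltas += [rng.gammavariate(a2, 1.0) for _ in range(K - 1)]
    # tau_k is the cumulative product of the deltas (column-wise global precision).
    taus, running = [], 1.0
    for d in deltas:
        running *= d
        taus.append(running)
    theta = []
    for _ in range(p):
        row = []
        for k in range(K):
            # phi_jk ~ Gamma(shape v/2, rate v/2); random.gammavariate takes a scale.
            phi = rng.gammavariate(v / 2.0, 2.0 / v)
            sd = (phi * taus[k]) ** -0.5  # prior sd of theta_jk
            row.append(rng.gauss(0.0, sd))
        theta.append(row)
    return theta, taus


theta, taus = sample_mgp_loadings(p=9, K=4)
for k in range(4):
    mean_abs = sum(abs(row[k]) for row in theta) / len(theta)
    print(f"column {k + 1}: tau = {taus[k]:.2f}, mean |loading| = {mean_abs:.3f}")
```

Because the expected value of each $\delta_l$ with $l \geq 2$ exceeds 1 when $a_2 > 1$, the precisions $\tau_k$ tend to grow with the column index, so later columns are shrunk more strongly toward zero on average.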

3.2. Model Correlated Outcomes as a Function of Latent Factors

Let $Y_{it} = (y_{it1}, \dots, y_{itq})^{T}$ be a vector of $q$ outcomes at follow-up time $t = 1, \dots, T_i$, where $T_i$ is the number of follow-ups with at least one measured outcome for subject $i$. We assume each outcome at each follow-up has been mean-centered and therefore omit the intercepts. We also assume the variation in the outcomes $Y_{it}$ can be decomposed into the variation explained by the latent factors of $X_i$, the variation due to unobserved factors, and idiosyncratic noise:

$$Y_{it} \sim N_q(B(t) \eta_i + \xi_i, \Sigma_Y) \tag{7}$$

$$\xi_i \sim N_q(0, \nu^2 \Sigma_Y) \tag{8}$$

$$\Sigma_Y \sim IW(s_0, S_0) \tag{9}$$

where $\nu^2$ has a half-t prior with a small number of degrees of freedom, giving an uninformative prior that still behaves well when the between-subject variance $\nu^2$ is close to zero, as suggested by [32]. The random variables $\xi_i$ are subject-level random intercepts. $\Sigma_Y$ is the residual covariance, which describes variances and covariances in the outcomes due to unmeasured factors as well as random noise. We can interpret $\frac{\nu^2}{\nu^2 + 1}$ as the proportion of total residual variation explained by between-subject variation, and $\frac{1}{\nu^2 + 1}$ as the proportion of residual variation explained by within-subject variation.
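The variance-partition interpretation follows from marginalizing over the random intercept. A short derivation is sketched below, assuming that $\xi_i \sim N_q(0, \nu^2 \Sigma_Y)$ is independent of the visit-level residual; the symbol $\epsilon_{it}$ for that residual is introduced here only for illustration.

```latex
% Write Y_{it} = B(t)\eta_i + \xi_i + \epsilon_{it}, with
% \epsilon_{it} \sim N_q(0, \Sigma_Y) independent of \xi_i \sim N_q(0, \nu^2 \Sigma_Y).
\begin{align*}
\mathrm{Cov}(\xi_i + \epsilon_{it})
  &= \nu^2 \Sigma_Y + \Sigma_Y = (\nu^2 + 1)\,\Sigma_Y, \\
\mathrm{Cov}(\xi_i + \epsilon_{it},\ \xi_i + \epsilon_{it'})
  &= \nu^2 \Sigma_Y \qquad (t \neq t'), \\
\frac{\text{between-subject}}{\text{total}}
  &= \frac{\nu^2}{\nu^2 + 1}, \qquad
\frac{\text{within-subject}}{\text{total}}
  = \frac{1}{\nu^2 + 1}.
\end{align*}
```

For example, $\nu^2 = 1$ splits the residual variation equally between subjects and within subjects, while $\nu^2 \to 0$ corresponds to no between-subject variation.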

3.3. Model Health Effects as Flexible Functions of Time

$B$ is a $(q \times K)$ matrix of regression functions, which is of primary interest in our analysis. Element $jk$ of $B$ models the effect of the $k$th latent factor on the $j$th outcome, which can vary smoothly and flexibly over follow-up times. We use Gaussian processes as priors to learn these smooth regression functions on a discrete-time grid $t = 1, \dots, T$, where $T$ is the total number of unique ages at which children had follow-up visits. At the same time, we want to incorporate our belief that effects across adiposity outcomes, which are driven by similar mechanisms through which phthalates interfere with the body’s hormones, should be correlated. As a result, instead of placing independent Gaussian process priors on elements of $B$, we adopt the following factorization:

$$B(t) = \Lambda U(t) \tag{10}$$

$$u_{hk} \sim GP(0, c_\kappa(t, t')), \quad c_\kappa(t, t') = \exp\left\{-\frac{1}{2}\left[\frac{t - t'}{\kappa}\right]^2\right\} \tag{11}$$

$$\lambda_{jh} \sim MGP, \quad j = 1, \dots, q; \; h = 1, \dots, H; \; k = 1, \dots, K \tag{12}$$

where $\Lambda$ is a $q \times H$ matrix, with $H \leq K$, that linearly combines $H$ independent basis functions into elements of $B$. A similar factorization was used by [40], but their work focused on covariance regression. We also use the MGP prior on the basis function loadings to learn a sparse loadings structure and help infer the number of basis functions $H$ [33]. $U$ is a matrix of independent basis functions. We choose the Gaussian kernel $c_\kappa(t, t')$ for all elements of $U$ to ensure the time-varying effects are smooth functions of time with the same wiggliness, encoded in a shared length scale $\kappa$. Since the input is on a grid, we place a uniform prior on a grid of plausible values for $\kappa$. Conditional on $\Lambda$, the $k$th column of $B$ has a separable Gaussian process prior:

$$B_k \sim N_{q \times T}(0, \Lambda \Lambda^{T}, C) \tag{13}$$

where $\Lambda \Lambda^{T}$ describes the covariance in the regression functions of factor $k$ across outcomes, and $C_{rs} = c_\kappa(t_r, t_s)$. Thus, $\mathrm{Cov}(B_{jk}) = [\Lambda \Lambda^{T}]_{jj} C$, so we set the amplitude of the kernel to 1 for identifiability.
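The factorization $B(t) = \Lambda U(t)$ can be sketched in a few lines of standard-library Python: build the squared-exponential kernel on a time grid, draw independent Gaussian process paths for $U$, and combine them with $\Lambda$. The grid of ages, the values in `Lambda`, the jitter term, and all function names are illustrative assumptions; this shows a prior draw only, not the posterior sampler.

```python
import math
import random


def sq_exp_kernel(t, s, kappa):
    """Gaussian kernel of Equation (11) with amplitude fixed at 1."""
    return math.exp(-0.5 * ((t - s) / kappa) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gp_path(grid, kappa, rng, jitter=1e-6):
    """One zero-mean GP draw on the grid; jitter is added for numerical stability."""
    n = len(grid)
    C = [[sq_exp_kernel(grid[r], grid[s], kappa) + (jitter if r == s else 0.0)
          for s in range(n)] for r in range(n)]
    L = cholesky(C)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[r][m] * z[m] for m in range(r + 1)) for r in range(n)]


def time_varying_effects(Lambda, U):
    """B[j][k][t] = sum_h Lambda[j][h] * U[h][k][t], i.e. B(t) = Lambda U(t)."""
    q, H = len(Lambda), len(Lambda[0])
    K, T = len(U[0]), len(U[0][0])
    return [[[sum(Lambda[j][h] * U[h][k][t] for h in range(H)) for t in range(T)]
             for k in range(K)] for j in range(q)]


rng = random.Random(0)
grid = [4, 5, 6, 7, 8, 9, 10]               # illustrative ages in years
Lambda = [[0.8], [1.0], [0.3], [0.9]]       # hypothetical q = 4, H = 1 loadings
U = [[sample_gp_path(grid, 6.0, rng) for _ in range(2)] for _ in range(1)]  # H x K x T
B = time_varying_effects(Lambda, U)
print([round(b, 3) for b in B[0][0]])       # effect of factor 1 on outcome 1 over time
```

With $H = 1$, every outcome's trajectory is a rescaled copy of the same basis function, which is how the factorization encodes the belief that effects are similar across adiposity outcomes.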

3.4. Model Linear Effects and Interactions of Covariates

Let $Z_{it}$ be an $L$-vector of covariates, including both baseline covariates and those collected at follow-up time $t$. For covariates where linear relationships are reasonable, we can add a linear effect term to Equation (7) as follows:

$$Y_{it} \sim N_q(B(t) \eta_i + B^{(c)} Z_{it} + \xi_i, \Sigma_Y) \tag{14}$$

We endow the regression coefficient matrix $B^{(c)}$ with the global-local shrinkage prior on matrix normal parameters of [30]:

$$B^{(c)} \sim N_{q \times L}(0, \Sigma_Y, \Psi^{(c)}) \tag{15}$$

$$\Psi^{(c)} = \mathrm{diag}(\psi_1^{(c)}, \dots, \psi_L^{(c)}) \tag{16}$$

$$\psi_l^{(c)} \mid \zeta_l^{(c)} \sim G(u, \zeta_l^{(c)}) \tag{17}$$

$$\zeta_l^{(c)} \sim G(v, r) \tag{18}$$

The shrinkage parameter $\psi_l^{(c)}$ provides variable selection to determine whether predictor $l$ is important to all outcomes, which fits our application. Outcome-specific effects within column $l$ can additionally shrink toward zero. The authors of [30] suggested setting the global shrinkage parameter $r = 1 / (K \sqrt{n \ln n})$ to satisfy sufficient conditions for posterior consistency. When $u = v = 1/2$, this is the horseshoe prior [30,41]. A similar setup can be used for any linear interactions between the latent factors and covariates.

3.5. Imputation of Censored and Missing Data

We have very few missing outcomes at recorded follow-up visits (two missing FMP and one missing BMIz measurement out of 364 observations). When it is reasonable to assume outcomes are missing at random conditional on observed covariates and birth weight data [42], we impute them during MCMC. We sample $Y_{it,mis}$ given $Y_{it,obs}$ and $\omega$ from a conditional multivariate normal, where $\omega$ denotes all unknown parameters:

$$Y_{it,mis} \mid Y_{it,obs}, \omega \sim N(m, V) \tag{19}$$

$$m = B(t)_{mis}\, \eta_{i,mis} + \Sigma_{Y,mis,obs} \Sigma_{Y,obs,obs}^{-1} \left[ Y_{it,obs} - B(t)_{obs}\, \eta_{i,obs} \right] \tag{20}$$

$$V = \Sigma_{Y,mis,mis} - \Sigma_{Y,mis,obs} \Sigma_{Y,obs,obs}^{-1} \Sigma_{Y,obs,mis} \tag{21}$$

where $B(t)_{mis}, \eta_{i,mis}$ are parameters corresponding to the indices of the missing values, $B(t)_{obs}, \eta_{i,obs}$ are parameters at observed indices, $\Sigma_{Y,mis,obs}$ is the matrix of covariances between missing and observed indices, $\Sigma_{Y,mis,mis}$ is the covariance matrix of missing indices, and $\Sigma_{Y,obs,obs}$ is the covariance matrix of observed indices.
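The conditional mean and covariance in Equations (20) and (21) are the standard formulas for a partitioned multivariate normal. The sketch below computes them in plain Python; the vector `mu` stands in for the model mean of all $q$ outcomes at one visit, and the function names and the small linear solver are illustrative assumptions rather than the package's implementation.

```python
def solve(A, B):
    """Solve A X = B (B is n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def conditional_normal(mu, Sigma, y_obs, obs, mis):
    """Mean m and covariance V of the missing entries given the observed ones."""
    S_oo = [[Sigma[a][b] for b in obs] for a in obs]
    S_om = [[Sigma[a][b] for b in mis] for a in obs]
    S_mm = [[Sigma[a][b] for b in mis] for a in mis]
    resid = [[y_obs[i] - mu[o]] for i, o in enumerate(obs)]
    w = solve(S_oo, resid)   # S_oo^{-1} (y_obs - mu_obs)
    W = solve(S_oo, S_om)    # S_oo^{-1} S_om
    n_o, n_m = len(obs), len(mis)
    m = [mu[mi] + sum(S_om[i][a] * w[i][0] for i in range(n_o))
         for a, mi in enumerate(mis)]
    V = [[S_mm[a][b] - sum(S_om[i][a] * W[i][b] for i in range(n_o))
          for b in range(n_m)] for a in range(n_m)]
    return m, V


# Toy example: two outcomes with correlation 0.5; the second one is missing.
m, V = conditional_normal(mu=[0.0, 0.0], Sigma=[[1.0, 0.5], [0.5, 1.0]],
                          y_obs=[2.0], obs=[0], mis=[1])
print(m, V)
```

In an MCMC sweep, a draw from $N(m, V)$ would replace the missing entries at each iteration, so imputation uncertainty carries through to the posterior of the effects.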

Moreover, we often observe censored metabolite concentrations that are below the limit of detection (LOD). The LOD is defined as the lowest concentration of an analyte in a sample that can be reliably distinguished from the highest apparent concentration measured in a sample containing no analyte [43]. We can impute metabolite concentrations under the LOD by sampling from a conditional truncated normal at each MCMC iteration:

$$X_{ij} \mid X_{ij} \in (-\infty, \log(LOD_j)], \omega \sim TN(\theta_{j\cdot}^{T} \eta_i, \sigma_{X,j}^2, -\infty, \log(LOD_j)) \tag{22}$$

where $LOD_j$ is the LOD of the $j$th chemical, $\theta_{j\cdot}$ is the $j$th row of the loading matrix $\Theta$, and $TN(m, v, a, b)$ is a truncated normal distribution with mean $m$, variance $v$, and support $[a, b]$.
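A draw from the truncated normal in Equation (22) can be obtained by inverse-CDF sampling. The sketch below uses only the Python standard library; the function name and the numeric inputs are hypothetical, and a production sampler would likely use a dedicated routine that is more accurate far in the tail.

```python
import random
from statistics import NormalDist


def sample_below_lod(mean, sd, log_lod, rng):
    """Draw from N(mean, sd^2) truncated to (-inf, log_lod] by inverse-CDF sampling."""
    std = NormalDist()
    # Probability mass of the untruncated normal that lies below the LOD.
    p_upper = std.cdf((log_lod - mean) / sd)
    u = rng.uniform(0.0, p_upper)
    # inv_cdf requires 0 < u < 1, so clamp away from the endpoints.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    x = mean + sd * std.inv_cdf(u)
    # Guard against rounding pushing the draw marginally above the bound.
    return min(x, log_lod)


rng = random.Random(1)
# Hypothetical values: model mean 0.5 and sd 1 on the log scale, log(LOD) = -1.
draws = [sample_below_lod(0.5, 1.0, -1.0, rng) for _ in range(5)]
print([round(d, 3) for d in draws])
```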

3.6. Posterior Computation

See Appendices for full conditional and adaptive Metropolis-within-Gibbs updates.

4. Simulations

We compare our proposed method, BMFR, with the following approaches: (1) two-stage univariate regressions, (2) BVCKMR [29], and (3) a baseline mean model. In the two-stage approach (PCA-LMM), we reduce the dimension of the exposures using PCA, keeping the first few principal components (PCs), and then fit a linear mixed model (LMM) for each outcome separately. The baseline mean model returns the mean of each outcome. We simulate data from the following three scenarios to demonstrate our method’s utility compared to existing approaches. For all scenarios, we generate data for $n = 200$ subjects. We generate the exposures according to a factor model, creating correlated exposures with group structures, similar to the observed phthalate metabolites:

$$X_i = \Theta \eta_i + e_i, \quad i = 1, \dots, n \tag{23}$$

$$\eta_i \sim N_{K^*}(0, I), \quad e_i \sim N_p(0, I) \tag{24}$$

We generate a sparse $\Theta$ so that every five metabolites load onto one factor, with $p = 10$ and $K^* = 2$ latent factors. The non-zero entries of $\Theta$ are sampled from $N(0, 1)$. We generate $q = 5$ outcomes measured at $T_i = 10$ time points for all subjects:

$$Y_{it} = g(X_i, t) + \xi_i + \epsilon_{it} \tag{25}$$

$$\xi_i \sim N_q(0, C_\xi), \quad \epsilon_{it} \sim N_q(0, C_\epsilon) \tag{26}$$

For each scenario, we use a different exposure-response function $g$ (including time-varying effects) and different distributions for the random intercept $\xi_i$ and the random error $\epsilon_{it}$. Below is the description of each simulation scenario:

Scenario 1: Linear exposure-response function where the first two factors are important, independent responses:

$$g(X, t) = h(\eta, t) = \beta_1^{T} \eta + \beta_2^{T} \eta\, t, \quad j = 1, \dots, 10; \; k = 1, \dots, K^* \tag{27}$$

$$\beta_{1jk} \sim U(-3, 3), \quad \beta_{2jk} \sim U(-0.5, 0.5) \tag{28}$$

$$C_\xi = I, \quad C_\epsilon = 0.5 I \tag{29}$$

where $h$ is the latent factor-response function. The induced effects of $X$ range from 0 to 1. Under this setting, all assumptions for PCA-LMM are satisfied, though it does not propagate the uncertainty from the first stage.

Scenario 2: Non-linear time-varying exposure-response function where the first two factors are important, positively correlated responses:

$$g(X, t) = h(\eta, t) = [\beta\, u(t)^{T}]\, \eta \tag{30}$$

$$u_1(t) = \frac{3.5}{1 + \exp(-3t + 25)}, \quad u_2(t) = 9\, \mathrm{dnorm}\!\left(\frac{t - 5.5}{1.5}\right) \tag{31}$$

$$\beta \sim N_q(0, I) \tag{32}$$

where $\mathrm{dnorm}(t)$ is the standard normal pdf at $t$. Covariances $C_\xi$ and $C_\epsilon$ have a compound symmetry structure with a high correlation of 0.7. Under this setting, all assumptions for our BMFR model are satisfied.

Scenario 3: Quadratic exposure-response function where three metabolites are important and responses are independent, the most favorable scenario for BVCKMR:

$$g(X, t) = \beta_1 X_1^2 - \beta_2 X_6^2 + 0.5 \beta_3 X_1 X_2 + \beta_4 X_7 + \beta_5 X_8 + 0.3 (\beta_6 X_1^2 + \beta_7 X_7 + \beta_8 X_8)\, t \tag{33}$$

$$\beta_{lj} \sim \mathrm{Unif}(0.25, 0.5) \cup \mathrm{Unif}(-0.5, 0.25), \quad l = 1, \dots, 8 \tag{34}$$

$$C_\xi = I, \quad C_\epsilon = 0.5 I \tag{35}$$
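As an illustration of the data-generating process, the sketch below simulates exposures from the sparse factor model of Equations (23) and (24) and evaluates the Scenario 2 mean function of Equations (30) and (31). It is a minimal standard-library sketch: the function names are hypothetical, the time points are assumed to be $t = 1, \dots, 10$, and the random intercepts and errors with compound symmetry covariance are omitted for brevity.

```python
import math
import random
from statistics import NormalDist


def u1(t):
    """First basis function of Scenario 2: a logistic curve in time."""
    return 3.5 / (1.0 + math.exp(-3.0 * t + 25.0))


def u2(t):
    """Second basis function of Scenario 2: a scaled standard normal density."""
    return 9.0 * NormalDist().pdf((t - 5.5) / 1.5)


def simulate_exposures(n, rng, p=10, k_star=2):
    """X_i = Theta eta_i + e_i with a block-sparse Theta (five metabolites per factor)."""
    block = p // k_star
    theta = [[rng.gauss(0.0, 1.0) if j // block == k else 0.0 for k in range(k_star)]
             for j in range(p)]
    etas, X = [], []
    for _ in range(n):
        eta = [rng.gauss(0.0, 1.0) for _ in range(k_star)]
        X.append([sum(theta[j][k] * eta[k] for k in range(k_star)) + rng.gauss(0.0, 1.0)
                  for j in range(p)])
        etas.append(eta)
    return theta, etas, X


def scenario2_mean(beta, eta, t):
    """[beta u(t)^T] eta: each outcome j gets beta_j * (u1(t) eta_1 + u2(t) eta_2)."""
    shared = u1(t) * eta[0] + u2(t) * eta[1]
    return [b * shared for b in beta]


rng = random.Random(0)
theta, etas, X = simulate_exposures(n=200, rng=rng)
beta = [rng.gauss(0.0, 1.0) for _ in range(5)]   # q = 5 outcomes
means = [scenario2_mean(beta, etas[0], t) for t in range(1, 11)]
print(len(X), len(X[0]), len(means), len(means[0]))
```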

We choose $K = K^* + 2$ for PCA and our method, and $H = 2$ for our method, to resemble analyses where $K$ and $H$ are close to but not exactly the true latent dimension. For predictive performance evaluation, we calculate the mean predictive squared error (MPSE) on a test set of 200 subjects at 10 time points. Additionally, to evaluate how well the methods measure the relative importance of each chemical, we calculate the Spearman correlation between the true relative importance rank and the inferred rank of chemicals in $X$. The rank is based on the absolute value of the effect at each time point, summed over all time points. We calculate the rank for our model by first calculating the induced effects in the original predictors $X$ at each time point, as shown in [44]. Similarly, we can calculate the effects from the PCA-LMM analysis in $X$ as $\hat{\beta}_X = V^{(K)} \hat{\beta}_{PCs}$, where $V^{(K)}$ is the matrix of the first $K$ left singular vectors, and $\hat{\beta}_{PCs}$ is a vector of regression coefficients for the PCs. Table 2 shows the MPSE results. Note that PCA-LMM performed worst in all scenarios, even when all its assumptions were met. Our method performs best when the data are generated according to its model. When the data are more favorable to the other models, our method still performs better than PCA-LMM. Table 3 shows the Spearman correlation results. Here, our method performs best in the first two scenarios and is very close to the best in the third scenario.
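The two evaluation metrics can be sketched in plain Python as follows. The function names and the toy effect values are hypothetical and serve only to show the computation; ties in the ranking are handled with average ranks, which is an assumption since the text does not say how ties were treated.

```python
import math


def mpse(y_true, y_pred):
    """Mean predictive squared error over paired observations."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def total_importance(effects_over_time):
    """Importance of one chemical: absolute effect summed over all time points."""
    return sum(abs(e) for e in effects_over_time)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("ranks are constant; correlation is undefined")
    return cov / math.sqrt(va * vb)


# Toy effects for three chemicals at three time points (illustrative values only).
true_effects = [[0.9, 1.0, 1.1], [0.1, -0.2, 0.1], [-0.5, -0.4, -0.6]]
est_effects = [[0.7, 1.2, 0.9], [0.3, 0.1, -0.1], [-0.2, -0.6, -0.5]]
true_imp = [total_importance(e) for e in true_effects]
est_imp = [total_importance(e) for e in est_effects]
print(spearman(true_imp, est_imp))
```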

5. Analysis of MSCEHS Cohort Data

5.1. Data Preprocessing

We log-transformed WC so that its marginal distribution is closer to normal. We mean-centered all outcomes. We also used the logarithm of phthalate metabolite concentrations as exposures. We corrected for urinary dilution by dividing the metabolite concentrations (not on the log scale) by the Cratio, as discussed in Section 2. We standardized the other continuous covariates and created dummy variables for categorical covariates. We used the R package mice (version 3.18.0) for multiple imputation of missing values in the covariates (using predictive mean matching for continuous, logistic regression for binary, and a proportional odds model for ordered categorical covariates). We created a time variable, age in years, based on the children’s age in months at follow-up visits. Age ranges from 4 to 10 years (Figure 3).

5.2. Preliminary Analysis

As a preliminary analysis, we performed two-stage PCA-LMM analyses of each of the four outcomes. We applied PCA to the standardized log chemical exposures. We used the first three principal components (PCs) for the LMM stage because they explained over 90% of the variation in the exposures. The first three PCs can be interpreted as the non-DEHP (excluding MEP) factor, the DEHP factor, and the MEP factor, respectively. The PCA factor loading matrix is available in Appendix B.

We fitted four independent LMMs for four outcomes with random intercepts and interactions between the PCs and the child’s gender, controlling for all baseline covariates. We fitted a second set of LMMs with interactions between the PCs, child’s gender, and child’s age, and a third set with interactions between the PCs, child’s gender, and polynomials of degree 2 of child’s age. We used AIC, BIC, likelihood ratio tests, and 6-fold cross-validated MPSE for model comparison within each outcome. After Bonferroni correction for multiple testing, we still saw evidence from the likelihood ratio tests that models with linear interactions with age were the best fits for FMP, WHR, and WC. Models with linear interactions with age had the best cross-validated MPSE for BMIz, FMP, and WC. We saw insufficient evidence of the quadratic interactions in age being useful. Full summary tables of AIC, BIC, p-values, and MPSEs can be found in Appendix B.

5.3. Main Analysis via BMFR

In the main analysis, we fitted our proposed Bayesian multivariate factor regression with time-varying effects to all four outcomes simultaneously. We fitted two models, one for male and one for female children, to assess any sex-specific effects. We controlled for linear main effects of covariates and included random intercepts.

We used the following prior specifications. Since we standardized the log chemical exposures to have unit variances, we set the hyperparameters of the inverse gamma priors on the idiosyncratic noise variances $σ_{X, 1}^{2}, \dots, σ_{X, p}^{2}$ so that they are less than 1 with 99% probability. For the hyperparameters of the MGP prior on the loadings $Θ$ and $Λ$, we used $a_{1} = 2.1$ and $a_{2} = 3.1$, as suggested in the note by [45]. For the IW prior on $Σ_{Y}$, we set $S_{0}$ to the sample covariance of the four outcomes and used a small $s_{0} = 6$ so that the prior is loosely centered around the sample covariance. For a weakly informative prior encoding that $ν^{2}$ should be below 100, we used a half-Cauchy prior with a scale of 25, as in [32].
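One way to read "loosely centered" is through the prior mean. The worked equation below assumes the standard inverse-Wishart parameterization with degrees of freedom $s_{0}$ and scale matrix $S_{0}$ (the text does not spell out the parameterization) and uses $q = 4$ outcomes:

```latex
\mathbb{E}[Σ_{Y}] = \frac{S_{0}}{s_{0} - q - 1} = \frac{S_{0}}{6 - 4 - 1} = S_{0},
\qquad \text{valid for } s_{0} > q + 1 .
```

Under that assumption the prior mean equals the sample covariance, and $s_{0} = 6$ is the smallest integer number of degrees of freedom for which this mean exists, so the prior stays diffuse around that center.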

We selected the number of factors K and the number of basis functions H using grid search. We chose the combination of K and H, from the options $K \in \{2, 3, 4\}$ and $H \in \{1, 2\}$ with $H \leq K$, that had the smallest 6-fold cross-validated MPSE. We considered small values for H because of our prior belief that the effects of the exposures on all outcomes were similar. The final model for males had $K = 3, H = 1$, and the one for females had $K = 3, H = 2$. We fixed the length scale parameter $κ = 6$, since we observed from our preliminary analysis that there were likely no highly variable effects. Sensitivity analysis was also performed for models that infer the length scale parameter during MCMC.
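The selection logic of the grid search can be sketched as follows. The candidate sets and the constraint $H \leq K$ come from the text; the function names and the toy scoring function are hypothetical and exist only to exercise the code. A real run would replace the toy score with the 6-fold cross-validated MPSE of the fitted model.

```python
from itertools import product

K_OPTIONS = (2, 3, 4)   # candidate numbers of latent factors (from the text)
H_OPTIONS = (1, 2)      # candidate numbers of basis functions (from the text)

# Keep only combinations that satisfy the constraint H <= K.
grid = [(k, h) for k, h in product(K_OPTIONS, H_OPTIONS) if h <= k]


def select_by_cv(grid, cv_mpse):
    """Return the (K, H) pair with the smallest cross-validated MPSE.

    `cv_mpse` is a hypothetical callable (K, H) -> float; in practice it
    would refit the model on each training fold and score the held-out fold.
    """
    return min(grid, key=lambda kh: cv_mpse(*kh))


# Toy scoring function used ONLY to exercise the selection logic;
# its values are not results from the paper.
def toy_cv_mpse(k, h):
    return abs(k - 2) + 0.1 * h


best = select_by_cv(grid, toy_cv_mpse)
print(len(grid), best)
```

With these candidate sets every pair already satisfies $H \leq K$, so the grid contains six combinations.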

5.4. Results of Analysis via BMFR

For out-of-sample predictive performance comparison, we calculated the MPSE of a baseline mean model that returned outcome-specific means. For the male model, the 6-fold cross-validated MPSE of our proposed method is 0.93, compared to 0.98 for the baseline mean model. For the female model, our proposed method’s MPSE is 0.82, compared to 1.04 for the baseline mean model. On the whole data set (combining males and females), our MPSE is 0.88, compared to 0.92 for PCA-LMMs with linear interactions with age and 1.03 for the baseline mean model.
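The baseline comparison can be illustrated with a short sketch. It assumes that MPSE denotes the mean squared prediction error over held-out observations, handles a single outcome, and splits observations at random without regard to repeated measures within a child; these simplifications and all function names are illustrative and are not taken from the paper's code.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mpse_mean_model(y, k=6, seed=0):
    """Cross-validated MPSE of a model that predicts the training-fold mean."""
    folds = kfold_indices(len(y), k, seed)
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        pred = sum(train) / len(train)          # the baseline "model"
        sq_errors.extend((y[i] - pred) ** 2 for i in held_out)
    return sum(sq_errors) / len(sq_errors)


# Synthetic standardized outcome, for illustration only (not study data).
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(120)]
print(round(cv_mpse_mean_model(y), 3))
```

For an outcome standardized to unit variance this baseline is expected to land near 1, which gives a reference level for reading the MPSE values reported above.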

For interpretation of the results, we resolved rotational and label-switching ambiguity in the factors using the MatchAlign algorithm proposed by [46]. Figure 4 shows the post-processed factor loading matrix and time-varying effects of each latent factor from the model fitted for male children. Time-varying effects were transformed to represent the effects of a one-unit increase in the latent factors. We identified two main latent factors corresponding to the non-DEHP group and the DEHP group of chemicals. This is consistent with previous studies on phthalates [20]. Though both groups of chemicals seem to be associated with lower values of adiposity outcomes at younger ages, their obesity-promoting effects seem to increase over time. The latent factors identified in the model for females also include non-DEHP and DEHP groups, though all effects seem to remain null over time (Figure 5). The sex-specific effects here could be related to phthalates being anti-androgens [47]. Though the effects estimated in the female model are not statistically significant, the effects of latent factors on WHR seem to differ from their effects on the other three outcomes. This could be related to the fact that WHR is less correlated with the other outcomes than they are with each other.

We included results from sensitivity analysis of inferring the length scale parameter during MCMC in Appendix B. Overall, the sensitivity analysis results were similar to the results in Figure 4 and Figure 5.

6. Discussion

This paper assesses the time-varying health effects of prenatal phthalate exposures on adiposity outcomes measured in children from ages 4 to 9 from the MSCEHS cohort study using the BMFR approach we propose. BMFR represents phthalate mixtures as latent factors (a DEHP and a non-DEHP factor), borrows information across highly correlated adiposity outcomes to improve estimation precision, models potentially non-linear time-varying effects of the latent factors on adiposity outcomes, and fully quantifies uncertainty using state-of-the-art prior specifications. We find that in boys, at younger ages (4–6 years), all phthalate latent factors (DEHP and non-DEHP) show negative associations with adiposity outcomes. After age 7, these associations begin to become positive. In girls, there is no evidence of associations between phthalate components and outcomes. We also find these time-varying effects to be similar across all adiposity outcomes (BMIz, fat mass percentage, waist-to-hip ratio, and waist circumference). Our introduction of a new Bayesian mixture method for estimating time-varying effects with full uncertainty quantification and our finding of sex-specific time-varying associations of prenatal phthalate exposures with childhood obesity between ages 4 and 9 are novel.

We were able to estimate sex-specific time-varying effects not previously identified in analyses of the MSCEHS cohort because BMFR is customized to this research question and includes several custom specifications for the data. BMFR applies prior specifications to infer modeling decisions and avoid the artificially narrow confidence intervals that are an unintended consequence of fixing choices in traditional statistical models or multistage analyses. The model includes variable selection priors for covariates, avoiding fixed subsets; Gaussian process priors for flexible modeling of time-varying effects without assuming a functional form; and half-t priors for robust between-subject variances. BMFR represents highly correlated exposures with a small number of latent factors that are independent of each other but correlated with the outcomes, while fully quantifying uncertainty in the number of components through the MGP prior. BMFR also jointly models multiple outcomes of adiposity—BMIz, waist circumference, waist-hip ratio, and fat mass percentage—available in our cohort to improve estimation precision. Finally, BMFR quantifies the uncertainty in imputing missing data and values below the limit of detection (LOD) within the MCMC sampling, propagating this uncertainty into the credible intervals for the effects of interest. An R package for BMFR is freely available at https://github.com/phuchonguyen/famr (accessed on 20 August 2023). The package implements the MCMC algorithm in Appendix A in C++ for optimal computational speed.

Our finding of time-varying effects is similar to results from [25], which reported that exposure to phthalates in the first trimester was associated with lower BMI six months after birth but higher BMI at older ages, although they did not observe sex-specific effects. Other studies have also found a somewhat similar time-varying effect: higher maternal urinary phthalate concentrations associated with lower fetal growth and birth weight, followed by higher growth trajectories later on [48,49], with sex-specific effects [48]. Our results also align with [20], which found non-DEHP metabolites associated with lower BMI, fat mass, and waist circumference in boys aged 5 and 7, and with [50], which reported DEHP associated with higher outcomes among boys between ages 8 and 10, and provide a potential explanation for seemingly inconsistent results between the two studies. The time-varying effect we identify in boys may appear null when aggregating across all ages, which is consistent with a previous analysis of this cohort [42] that estimated average effects over time and did not observe associations with percent fat mass or sex-specific modification. However, there are other studies inconsistent with ours, including reports of DEHP associated with a lower BMI in girls [14] and a higher weight gain in early childhood that stabilized during puberty in girls [24]. Our findings may have important implications for pregnancy care guidelines and child health. The time-varying effects of prenatal exposure suggest that these exposures may influence childhood obesity years after the time of exposure. They further suggest that the third trimester of pregnancy may be a vulnerable window of exposure, making interventions to reduce phthalate exposure during pregnancy important.

This study also has limitations and opportunities for future improvements. BMFR does not model nonlinear dose-response relationships. As a result, we did not investigate different effects at different exposure levels, though this could be performed by stratifying the analysis by tertiles of exposure. This may be important, as mixture effects could vary in nonlinear ways across exposure levels [26,29]. In this cohort, exposures were measured from a single urine sample, with collection times ranging from 25 to 40 weeks of gestation. Given the short half-lives of urinary phthalate metabolites [51] and the likely episodic nature of exposures [42], this limits the precision of exposure measurement. Future research should consider multiple exposure measurements and aim to identify the most critical window of vulnerability during the fetal period to growth trajectories. Data on adiposity outcomes before age 4 and after age 9 are not available in this cohort. Stratification by sex also reduces sample size and power to detect small effects in our analysis. Our findings would be strengthened by replication across a longer time window, from birth through puberty, and with data from a larger cohort study. Finally, residual confounding may remain, as we could not account for child calorie intake, or maternal consumer product preferences, which could influence exposure levels [42].

7. Conclusions

This paper presents a Bayesian multivariate factor regression approach to assessing the time-varying health effects of prenatal phthalate exposures, as measured in maternal urine samples collected during the third trimester of pregnancy, on adiposity outcomes measured in young children from ages 4 to 9 using data from the MSCEHS cohort study. BMFR addresses challenges in analyzing mixture exposures by representing them as latent factors that predict the outcomes. It also allows non-linear time-varying effects of exposure mixtures to be estimated with full uncertainty quantification, while improving estimation precision by borrowing information across correlated adiposity outcomes. The results show that in boys, at younger ages (4–6), all phthalate components show an association with lower adiposity outcomes; however, after age 7, they begin to show an association with higher outcomes. In girls, there is no evidence of associations between phthalate components and adiposity outcomes.

Author Contributions

Conceptualization, all authors; methodology, P.H.N. and A.H.H.; implementation, formal analysis, and validation, P.H.N.; data curation, A.H.H. and S.M.E.; writing—original draft preparation, P.H.N.; writing—review and editing, all authors; visualization, P.H.N.; supervision, A.H.H.; funding acquisition, A.H.H. All authors have read and agreed to the published version of the manuscript.

Funding

Phuc H. Nguyen and Amy H. Herring were funded by grants R01ES027498 and R01ES028804 of the National Institute of Environmental Health Sciences of the United States National Institutes of Health. Stephanie M. Engel was partially supported by research funding from grants R01ES035625, P30ES010126, RD-84021901, R01ES033518, and R01ES027498.

Institutional Review Board Statement

The study was a secondary analysis of de-identified data from the Mount Sinai Children’s Environmental Health Study and has received Administrative Review from the ethics committee of the University of North Carolina at Chapel Hill Office of Human Research Ethics (study number 11-0317, Reference ID 447995, and annual administrative review acknowledgment date of 14 October 2024).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request.

Acknowledgments

We want to thank David Dunson, Joseph Mathews, Youngsoo Baek, and Raphaël Morsomme for the helpful discussions about this work.

Conflicts of Interest

Author Phuc H. Nguyen was employed by the company LinkedIn Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NHANES	National Health and Nutrition Examination Surveys
MSCEHS	Mount Sinai Children’s Environmental Health Study
DEHP	di-2-ethylhexyl phthalate
LOD	limit of detection
BMI	body mass index
FMP	fat mass percentage
WHR	waist-to-hip ratio
WC	waist circumference
BVCKMR	Bayesian varying coefficient kernel machine regression
BMFR	Bayesian multivariate factor regression
PCA	principal component analysis

Appendix A. MCMC Algorithm

Let X be an $n \times p$ matrix of mixtures data, $Y_{i t}$ be a vector of q outcomes at time t for subject i, and K be a bound on the number of latent factors.

Sample idiosyncratic noise variances for the mixtures $Σ_{X}$:

$σ_{X, j}^{-2} \mid \cdot \sim G\left(\frac{2.5 + n}{2}, \frac{2.5 (0.084) + \sum_{i = 1}^{n} (X_{i j} - θ_{j}^{T} η_{i})^{2}}{2}\right)$

(A1)

where $θ_{j}$ is the $j^{\text{th}}$ row of the loading matrix $Θ$, for $j = 1, \dots, p$. We use the prior $σ_{X, j}^{-2} \sim G\left(\frac{2.5}{2}, \frac{2.5 (0.084)}{2}\right)$ since the columns of X have been standardized to have variance 1.
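The update in (A1) amounts to adding the sample size and the residual sum of squares to the prior hyperparameters. A minimal sketch follows, assuming that $G(a, b)$ denotes a Gamma distribution with shape $a$ and rate $b$ (the parameterization is not stated explicitly in the text); the function names and toy inputs are hypothetical.

```python
import random


def noise_precision_posterior(x_col, fitted, prior_df=2.5, prior_scale=0.084):
    """Shape and rate of the Gamma full conditional for one precision.

    x_col  : observed values X_{ij} for exposure j, i = 1..n
    fitted : current values of theta_j^T eta_i for the same subjects
    """
    n = len(x_col)
    ssr = sum((x - f) ** 2 for x, f in zip(x_col, fitted))
    shape = (prior_df + n) / 2.0
    rate = (prior_df * prior_scale + ssr) / 2.0
    return shape, rate


def draw_noise_variance(shape, rate, rng=random):
    """Draw a precision from Gamma(shape, rate) and return its inverse."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    precision = rng.gammavariate(shape, 1.0 / rate)
    return 1.0 / precision


# Toy inputs for illustration only (not study data).
shape, rate = noise_precision_posterior([0.5, -1.0, 0.25, 0.0], [0.0, 0.0, 0.0, 0.0])
print(shape, rate, draw_noise_variance(shape, rate, random.Random(0)))
```

With four observations and a residual sum of squares of 1.3125, the shape is (2.5 + 4)/2 = 3.25 and the rate is (0.21 + 1.3125)/2 = 0.76125.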
Sample the mixtures loading matrix $Θ$:

$θ_{j} \mid \cdot \sim N_{K}\left(\left(D_{j}^{-1} + \frac{η^{T} η}{σ_{X, j}^{2}}\right)^{-1} \frac{η^{T} X_{\cdot j}}{σ_{X, j}^{2}}, \left(D_{j}^{-1} + \frac{η^{T} η}{σ_{X, j}^{2}}\right)^{-1}\right)$

(A2)

where $X_{\cdot j}$ is the $j^{\text{th}}$ column of X, and $θ_{j}$ is the $j^{\text{th}}$ row of $Θ$.

$ϕ_{j k} \mid \cdot \sim G\left(\frac{v + 1}{2}, \frac{v + τ_{k} θ_{j k}^{2}}{2}\right)$

for $k = 1, \dots, K$ and $v = 3$ as suggested in [33].

$δ_{1} \mid \cdot \sim G\left(a_{1} + \frac{p K}{2}, 1 + \frac{1}{2} \sum_{l = 1}^{K} τ_{l}^{(1)} \sum_{j = 1}^{p} ϕ_{j l} θ_{j l}^{2}\right)$

$δ_{k} \mid \cdot \sim G\left(a_{2} + \frac{p (K - k + 1)}{2}, 1 + \frac{1}{2} \sum_{l = k}^{K} τ_{l}^{(k)} \sum_{j = 1}^{p} ϕ_{j l} θ_{j l}^{2}\right)$

for $k \geq 2$, where $τ_{l}^{(k)} = \prod_{t = 1, t \neq k}^{l} δ_{t}$, for $k = 1, \dots, K$.
Set $τ_{k} = \prod_{t = 1}^{k} δ_{t}$.
We set $a_{1} = 2.1$ and $a_{2} = 3.1$ following the note by [45].
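The two products that appear in this step are simple cumulative products. The sketch below computes $τ_{k}$ and the leave-one-out version $τ_{l}^{(k)}$ for toy values of $δ$; the function names are hypothetical and the numbers are not posterior draws.

```python
from itertools import accumulate
from operator import mul


def global_shrinkage(deltas):
    """tau_k = delta_1 * ... * delta_k for k = 1..K."""
    return list(accumulate(deltas, mul))


def tau_without(deltas, k):
    """tau_l^{(k)}: cumulative products with delta_k left out (k is 1-based)."""
    masked = [1.0 if t == k else d for t, d in enumerate(deltas, start=1)]
    return list(accumulate(masked, mul))


deltas = [2.0, 3.0, 4.0]          # toy values, not posterior draws
print(global_shrinkage(deltas))   # [2.0, 6.0, 24.0]
print(tau_without(deltas, 2))     # [2.0, 2.0, 8.0]
```

When the $δ_{t}$ tend to exceed 1, $τ_{k}$ grows with the column index, so later columns of the loading matrix are shrunk more strongly toward zero.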
Sample latent factors for each subject using adaptive Metropolis-within-Gibbs:
Sample a proposal $η_{i}^{*} \sim N(η_{i}^{(s)}, \tilde{s}_{i}^{(s)})$, where $η_{i}^{(s)}$ is the value for subject i at iteration s.
Up to an additive constant, the log posterior density at $η_{i}$ (after integrating out $ξ_{i}$) is

$l(η_{i}) = l_{X_{i}}(η_{i}) + l_{η}(η_{i}) + \sum_{t = 1}^{T_{i}} l_{Y_{i t}}(η_{i})$

(A3)

$l_{X_{i}}(η_{i}) = -\frac{1}{2} η_{i}^{T} Θ^{T} Σ_{X}^{-1} Θ η_{i}; \quad l_{η}(η_{i}) = -\frac{1}{2} η_{i}^{T} η_{i}$

(A4)

$l_{Y_{i t}}(η_{i}) = -\frac{1}{2} (1 + ν^{2})^{-1} m_{i t}(η_{i})^{T} Σ_{Y}^{-1} m_{i t}(η_{i})$

(A5)

$m_{i t}(η_{i}) = Y_{i t} - \left(B(t) η_{i} + \sum_{l} z_{i t l} B_{l}^{(in)} η_{i} + B^{(c)} Z_{i t}\right)$

(A6)

With probability $\min\left(1, \exp\{l(η_{i}^{*}) - l(η_{i}^{(s)})\}\right)$, accept the proposal and set $η_{i}^{(s + 1)} = η_{i}^{*}$. Otherwise, set $η_{i}^{(s + 1)} = η_{i}^{(s)}$.
Update the proposal scaling $\tilde{s}_{i}^{(s)}$ every 50 iterations according to Chapter 4, Section 4.3 in [52].
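The accept/reject rule and the periodic rescaling can be sketched for a scalar parameter as follows. The adaptation rule shown (a target acceptance rate of 0.44 and log-scale steps of min(0.01, b^{-1/2}) after batch b) is a commonly used batch-wise scheme for adaptive Metropolis-within-Gibbs and is an assumption here; the package may use different constants. The sketch adapts the proposal standard deviation, and the target density is a toy standard normal rather than the posterior in (A3).

```python
import math
import random


def accept_prob(log_post_new, log_post_old):
    """min(1, exp(l_new - l_old)), computed without overflow."""
    diff = log_post_new - log_post_old
    return 1.0 if diff >= 0 else math.exp(diff)


def adaptive_rw_metropolis(log_post, x0, n_iter=5000, batch=50,
                           target=0.44, seed=0):
    """Scalar random-walk Metropolis with batch-wise scale adaptation."""
    rng = random.Random(seed)
    x, log_scale = x0, 0.0
    draws, accepted_in_batch = [], 0
    for s in range(1, n_iter + 1):
        proposal = rng.gauss(x, math.exp(log_scale))
        if rng.random() < accept_prob(log_post(proposal), log_post(x)):
            x = proposal
            accepted_in_batch += 1
        draws.append(x)
        if s % batch == 0:
            # Nudge the log proposal s.d. up if accepting too often,
            # down otherwise; the step size shrinks with the batch count.
            step = min(0.01, (s // batch) ** -0.5)
            rate = accepted_in_batch / batch
            log_scale += step if rate > target else -step
            accepted_in_batch = 0
    return draws


# Toy target: standard normal log density up to a constant.
draws = adaptive_rw_metropolis(lambda z: -0.5 * z * z, x0=0.0)
print(sum(draws) / len(draws))
```

Working on the log scale keeps the acceptance probability numerically stable when the two log densities differ by a large amount.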
Sample subject-level random effects:

$ξ_{i} \mid \cdot \sim N\left(\left(\frac{1}{ν^{2}} + T_{i}\right)^{-1} \sum_{t = 1}^{T_{i}} \tilde{Y}_{i t}, \left(\frac{1}{ν^{2}} + T_{i}\right)^{-1} Σ_{Y}\right)$

(A7)

for $i = 1, \dots, n$, where $T_{i}$ is the number of follow-up times of subject i, and $\tilde{Y}_{i t} = Y_{i t} - B(t) η_{i} - \sum_{l = 1}^{L} z_{i t l} B_{l}^{(in)} η_{i} - B^{(c)} Z_{i t}$.
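The conditional mean in (A7) can be rewritten to show how the random intercept is shrunk toward zero. The symbol $\bar{Y}_{i} = T_{i}^{-1} \sum_{t = 1}^{T_{i}} \tilde{Y}_{i t}$, the average residual for subject i, is introduced here only for illustration:

```latex
\left(\frac{1}{ν^{2}} + T_{i}\right)^{-1} \sum_{t = 1}^{T_{i}} \tilde{Y}_{i t}
  = \frac{ν^{2}}{1 + T_{i} ν^{2}} \, T_{i} \bar{Y}_{i}
  = \frac{T_{i} ν^{2}}{1 + T_{i} ν^{2}} \, \bar{Y}_{i}
```

For example, with $T_{i} = 3$ follow-up visits and $ν^{2} = 1$, the weight on the average residual is 3/4; the weight approaches 1 as either $T_{i}$ or $ν^{2}$ grows.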
Sample $ν^{2} | .$ using Metropolis–Hastings.
Sample basis function loading matrix $Λ$:

$λ_{\cdot h} \mid \cdot \sim N_{q}(V_{\cdot h} m_{\cdot h}, V_{\cdot h})$

(A8)

$m_{\cdot h} = \sum_{i, t} u_{h \cdot}(t)^{T} η_{i} Y_{i t}$

(A9)

$V_{\cdot h} = \left[D_{h}^{-1} + Σ^{-1} \left(\sum_{i, t} u_{h \cdot}(t)^{T} η_{i}\right)^{2}\right]^{-1}$

(A10)

where $D_{h} = τ_{h}^{*} \operatorname{diag}(ϕ_{1 h}^{*}, \dots, ϕ_{q h}^{*})$ contains MGP shrinkage parameters. Sample these in the same way as the shrinkage parameters for $Θ$ in Step 2.
Sample basis functions $U(t)$:

$D_{i}^{(h k)} = \begin{bmatrix} λ_{\cdot h} η_{i k} O_{i 1} & \dots & 0 \\ 0 & \dots & 0 \\ 0 & \dots & λ_{\cdot h} η_{i k} O_{i T} \end{bmatrix}$

(A11)

$v_{i t}^{(h k)} = \begin{bmatrix} \sum_{l \neq h} λ_{1 l} η_{i k} u(t)_{l k} \\ \vdots \\ \sum_{l \neq h} λ_{q l} η_{i k} u(t)_{l k} \end{bmatrix}$

(A12)

$v_{i}^{(h k)} = \begin{bmatrix} v_{i 1}^{(h k)} \\ \vdots \\ v_{i T}^{(h k)} \end{bmatrix}$

(A13)

$u_{(h k)} \sim N(V^{(h k)} m^{(h k)}, V^{(h k)})$

(A14)

$V^{(h k)} = \left[C^{-1} + \sum_{i} (D_{i}^{(h k)})^{T} Σ^{-1} D_{i}^{(h k)}\right]^{-1}$

(A15)

$m^{(h k)} = \sum_{i} (D_{i}^{(h k)})^{T} Σ^{-1} (y_{i}^{(k)} - v_{i}^{(h k)})$

(A16)
Sample Gaussian Process bandwidth $κ$ from a set of plausible values:
Sample one option $κ^{*}$ with probability $\frac{l(κ^{*}; U)}{\sum_{\tilde{κ}} l(\tilde{κ}; U)}$, where $l(κ^{*}; U)$ is the likelihood of $κ^{*}$ and the sum runs over all candidate values $\tilde{κ}$.
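Because likelihood values can underflow, a draw of this kind is usually computed from log-likelihoods. A minimal sketch follows, with a hypothetical candidate grid and made-up log-likelihood values that serve only to exercise the code:

```python
import math
import random


def normalized_weights(log_liks):
    """Turn log-likelihoods into probabilities via the log-sum-exp trick."""
    top = max(log_liks)
    unnorm = [math.exp(v - top) for v in log_liks]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_kappa(candidates, log_liks, rng=random):
    """Draw one candidate with probability proportional to its likelihood."""
    weights = normalized_weights(log_liks)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical candidate grid and log-likelihood values, for illustration only.
candidates = [2.0, 4.0, 6.0, 8.0]
log_liks = [-1050.0, -1047.5, -1046.0, -1046.8]
probs = normalized_weights(log_liks)
print([round(p, 3) for p in probs],
      sample_kappa(candidates, log_liks, random.Random(0)))
```

Subtracting the largest log-likelihood before exponentiating leaves the normalized probabilities unchanged while avoiding underflow.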
Sample matrix of main effects of covariates:

$B^{(c)} \mid \cdot \sim N_{q \times L}\left(\frac{\tilde{Y}^{T} Z}{1 + ν^{2}} \left(\frac{Z^{T} Z}{1 + ν^{2}} + {Ψ^{(c)}}^{-1}\right)^{-1}, Σ_{Y}, \left(\frac{Z^{T} Z}{1 + ν^{2}} + {Ψ^{(c)}}^{-1}\right)^{-1}\right)$

(A17)

where $\tilde{Y} = [\tilde{Y}_{1 1}, \dots, \tilde{Y}_{1 T_{1}}, \dots, \tilde{Y}_{n T_{n}}]^{T}$ is a $\sum_{i} T_{i} \times q$ matrix, and $\tilde{Y}_{i t} = Y_{i t} - B(t) η_{i} - \sum_{l = 1}^{L} z_{i t l} B_{l}^{(in)} η_{i}$.
Sample shrinkage parameters for the linear regression coefficient matrix $B^{(c)}$:

$ψ_{l}^{(c)} \mid \cdot \sim \mathrm{GIG}\left(u - \frac{q}{2}, 2 ζ_{l}^{(c)}, {B_{l}^{(c)}}^{T} Σ^{-1} B_{l}^{(c)}\right)$

(A18)

$ζ_{l}^{(c)} \mid \cdot \sim G(v, r + ψ_{l}^{(c)})$

(A19)

where $B_{l}^{(c)}$ is the $l^{\text{th}}$ column of $B^{(c)}$, for $l = 1, \dots, L$.
Sample the outcomes’ noise variance parameters:

$Σ \mid \cdot \sim \mathrm{IW}\left(\sum_{i = 1}^{n} T_{i} + K T + K L + L + s_{0}, SS + s_{1} I_{q}\right)$

(A20)

$SS = Y^{†} {Y^{†}}^{T} (1 + ν^{2})^{-1} + \sum_{k = 1}^{K} B_{k} (ψ_{k} C)^{-1} B_{k}^{T}$

(A21)

$\quad + \sum_{l = 1}^{L} B_{l}^{(in)} {Ψ^{(in)}}^{-1} {B_{l}^{(in)}}^{T} + B^{(c)} {Ψ^{(c)}}^{-1} {B^{(c)}}^{T}$

(A22)

where $Y^{†} = [Y_{1 1}^{†}, \dots, Y_{1 T_{1}}^{†}, \dots, Y_{n T_{n}}^{†}]$ is a $q \times \sum_{i = 1}^{n} T_{i}$ matrix, and
$Y_{i t}^{†} = Y_{i t} - B(t) η_{i} - \sum_{l = 1}^{L} z_{i t l} B_{l}^{(in)} η_{i} - B^{(c)} Z_{i t}$.

We use the following parametrization of the generalized inverse Gaussian distribution: $y \sim \mathrm{GIG}(p, a, b)$ if $p(y) \propto y^{p - 1} e^{-(a y + b / y) / 2}$ for $y > 0$.

Appendix B. Analysis of Mount Sinai Birth Cohort Data

Appendix B.1. Preliminary Analysis

Figure A1. PCA varimax factor loading matrix of phthalate metabolites.

Table A1. Model comparison for including none, linear, or polynomial degree 2 interactions with time to predict BMIz.

Interaction	AIC	BIC	Chisq	p-Value	MPSE
none	727	809	-	-	1.05
linear	736	845	4.87	0.67	1.03
quadratic	747	883	3.24	0.86	1.13

Table A2. Model comparison for including none, linear, or polynomial degree 2 interactions with time to predict FMP.

Interaction	AIC	BIC	Chisq	p-Value	MPSE
none	859	941	-	-	1.03
linear	743	852	130	< $2 \times 10^{- 16}$	0.90
quadratic	749	885	8	0.33	1.32

Table A3. Model comparison for including none, linear, or polynomial degree 2 interactions with time to predict WHR.

Interaction	AIC	BIC	Chisq	p-Value	MPSE
none	1004	1086	-	-	0.94
linear	988	1097	30	$7 \times 10^{- 5}$	0.98
quadratic	986	1123	15	0.03	2.61

Table A4. Model comparison for including none, linear, or polynomial degree 2 interactions with time to predict WC.

Interaction	AIC	BIC	Chisq	p-Value	MPSE
none	939	1021	-	-	1.1
linear	752	861	201	< $2 \times 10^{- 16}$	0.80
quadratic	761	898	4.4	0.73	1.73

Appendix B.2. Sensitivity Analysis

Figure A2. MatchAligned factor loading matrix and time-varying effects of each latent factor from the model fitted for males with an inferred $κ$.

Figure A3. MatchAligned factor loading matrix and time-varying effects of each latent factor from the model fitted for females with an inferred $κ$.

References

1. Fryar, C.; Carroll, M.; Afful, J. Prevalence of overweight, obesity, and severe obesity among children and adolescents aged 2–19 years: United States, 1963–1965 through 2017–2018. NCHS Health E-Stats 2020.
2. Committee on Accelerating Progress in Obesity Prevention; Food and Nutrition Board; Institute of Medicine. Accelerating Progress in Obesity Prevention: Solving the Weight of the Nation; National Academies Press: Washington, DC, USA, 2012.
3. Gollust, S.E.; Niederdeppe, J.; Barry, C.L. Framing the consequences of childhood obesity to increase public support for obesity prevention policy. Am. J. Public Health 2013, 103, e96–e102.
4. Narayan, K.V.; Boyle, J.P.; Thompson, T.J.; Sorensen, S.W.; Williamson, D.F. Lifetime risk for diabetes mellitus in the United States. JAMA 2003, 290, 1884–1890.
5. Wu, B.; Jiang, Y.; Jin, X.; He, L. Using three statistical methods to analyze the association between exposure to 9 compounds and obesity in children and adolescents: NHANES 2005–2010. Environ. Health 2020, 19, 94.
6. Seo, M.Y.; Moon, S.; Kim, S.H.; Park, M.J. Associations of Phthalate Metabolites and Bisphenol A Levels with Obesity in Children: The Korean National Environmental Health Survey (KoNEHS) 2015 to 2017. Endocrinol. Metab. 2022, 37, 249–260.
7. Newbold, R.R. Impact of environmental endocrine disrupting chemicals on the development of obesity. Hormones 2010, 9, 206–217.
8. Amato, A.A.; Wheeler, H.B.; Blumberg, B. Obesity and endocrine-disrupting chemicals. Endocr. Connect. 2021, 10, R87–R105.
9. Hauser, R.; Calafat, A. Phthalates and human health. Occup. Environ. Med. 2005, 62, 806–818.
10. Encarnação, T.; Pais, A.A.; Campos, M.G.; Burrows, H.D. Endocrine disrupting chemicals: Impact on human health, wildlife and the environment. Sci. Prog. 2019, 102, 3–42.
11. Kim, S.; Park, M. Phthalate exposure and childhood obesity. Ann. Pediatr. Endocrinol. Metab. 2014, 19, 69–75.
12. Roundtable on Environmental Health Sciences, Research, and Medicine; Board on Population Health and Public Health Practice; Health and Medicine Division; National Academies of Sciences, Engineering, and Medicine. The Interplay Between Environmental Chemical Exposures and Obesity: Proceedings of a Workshop; National Academies Press: Washington, DC, USA, 2016.
13. Engel, S.M.; Berkowitz, G.S.; Barr, D.B.; Teitelbaum, S.L.; Siskind, J.; Meisel, S.J.; Wetmur, J.G.; Wolff, M.S. Prenatal organophosphate metabolite and organochlorine levels and performance on the Brazelton Neonatal Behavioral Assessment Scale in a multiethnic pregnancy cohort. Am. J. Epidemiol. 2007, 165, 1397–1404.
14. Buckley, J.P.; Engel, S.M.; Braun, J.M.; Whyatt, R.M.; Daniels, J.L.; Mendez, M.A.; Richardson, D.B.; Xu, Y.; Calafat, A.M.; Wolff, M.S.; et al. Prenatal phthalate exposures and body mass index among 4 to 7 year old children: A pooled analysis. Epidemiology 2016, 27, 449.
15. Golestanzadeh, M.; Riahi, R.; Kelishadi, R. Association of exposure to phthalates with cardiometabolic risk factors in children and adolescents: A systematic review and meta-analysis. Environ. Sci. Pollut. Res. 2019, 26, 35670–35686.
16. Berger, K.; Hyland, C.; Ames, J.L.; Mora, A.M.; Huen, K.; Eskenazi, B.; Holland, N.; Harley, K.G. Prenatal exposure to mixtures of phthalates, parabens, and other phenols and obesity in five-year-olds in the CHAMACOS cohort. Int. J. Environ. Res. Public Health 2021, 18, 1796.
17. Vafeiadi, M.; Myridakis, A.; Roumeliotaki, T.; Margetaki, K.; Chalkiadaki, G.; Dermitzaki, E.; Venihaki, M.; Sarri, K.; Vassilaki, M.; Leventakou, V.; et al. Association of early life exposure to phthalates with obesity and cardiometabolic traits in childhood: Sex specific associations. Front. Public Health 2018, 6, 327.
18. Gao, H.; Geng, M.l.; Gan, H.; Huang, K.; Zhang, C.; Zhu, B.b.; Sun, L.; Wu, X.; Zhu, P.; Tao, F.b.; et al. Prenatal single and combined exposure to phthalates associated with girls’ BMI trajectory in the first six years. Ecotoxicol. Environ. Saf. 2022, 241, 113837.
19. Botton, J.; Philippat, C.; Calafat, A.M.; Carles, S.; Charles, M.A.; Slama, R.; Eden Mother-Child Cohort Study Group. Phthalate pregnancy exposure and male offspring growth from the intra-uterine period to five years of age. Environ. Res. 2016, 151, 601–609.
20. Maresca, M.M.; Hoepner, L.A.; Hassoun, A.; Oberfield, S.E.; Mooney, S.J.; Calafat, A.M.; Ramirez, J.; Freyer, G.; Perera, F.P.; Whyatt, R.M.; et al. Prenatal exposure to phthalates and childhood body size in an urban cohort. Environ. Health Perspect. 2016, 124, 514–520.
21. Harley, K.G.; Berger, K.; Rauch, S.; Kogut, K.; Claus Henn, B.; Calafat, A.M.; Huen, K.; Eskenazi, B.; Holland, N. Association of prenatal urinary phthalate metabolite concentrations and childhood BMI and obesity. Pediatr. Res. 2017, 82, 405–415.
22. Yang, T.C.; Peterson, K.E.; Meeker, J.D.; Sánchez, B.N.; Zhang, Z.; Cantoral, A.; Solano, M.; Tellez-Rojo, M.M. Exposure to Bisphenol A and phthalates metabolites in the third trimester of pregnancy and BMI trajectories. Pediatr. Obes. 2018, 13, 550–557.
23. Ye’elah, E.B.; Doherty, D.A.; Main, K.M.; Frederiksen, H.; Keelan, J.A.; Newnham, J.P.; Hart, R.J. The influence of prenatal exposure to phthalates on subsequent male growth and body composition in adolescence. Environ. Res. 2021, 195, 110313.
24. Heggeseth, B.C.; Holland, N.; Eskenazi, B.; Kogut, K.; Harley, K.G. Heterogeneity in childhood body mass trajectories in relation to prenatal phthalate exposure. Environ. Res. 2019, 175, 22–33.
25. Sol, C.M.; Delgado, G.; Kannan, K.; Jaddoe, V.W.; Trasande, L.; Santos, S. Fetal exposure to phthalates and body mass index from infancy to adolescence. The Generation R study. Environ. Res. 2025, 274, 121253.
26. Bobb, J.F.; Valeri, L.; Claus Henn, B.; Christiani, D.C.; Wright, R.O.; Mazumdar, M.; Godleski, J.J.; Coull, B.A. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics 2015, 16, 493–508.
27. Keil, A.P.; Buckley, J.P.; O’Brien, K.M.; Ferguson, K.K.; Zhao, S.; White, A.J. A quantile-based g-computation approach to addressing the effects of exposure mixtures. Environ. Health Perspect. 2020, 128, 047004.
28. Czarnota, J.; Gennings, C.; Wheeler, D.C. Assessment of weighted quantile sum regression for modeling chemical mixtures and cancer risk. Cancer Inform. 2015, 14, CIN–S17295.
29. Liu, S.H.; Bobb, J.F.; Claus Henn, B.; Gennings, C.; Schnaas, L.; Tellez-Rojo, M.; Bellinger, D.; Arora, M.; Wright, R.O.; Coull, B.A. Bayesian varying coefficient kernel machine regression to assess neurodevelopmental trajectories associated with exposure to complex mixtures. Stat. Med. 2018, 37, 4680–4694.
30. Bai, R.; Ghosh, M. High-dimensional Multivariate Posterior Consistency under Global-local shrinkage priors. J. Multivar. Anal. 2018, 167, 157–170.
31. Bernardo, J.; Berger, J.; Dawid, A.; Smith, A. Regression and classification using Gaussian process priors. Bayesian Stat. 1998, 6, 475.
32. Gelman, A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 2006, 1, 515–534.
33. Bhattacharya, A.; Dunson, D.B. Sparse Bayesian infinite factor models. Biometrika 2011, 98, 291–306.
34. Hoyle, R.H. The structural equation modeling approach: Basic concepts and fundamental issues. In Structural Equation Modeling: Concepts, Issues, and Applications; Sage Publications, Inc.: Thousand Oaks, CA, USA, 1995.
35. Bowman, A.; Peterson, K.E.; Dolinoy, D.C.; Meeker, J.D.; Sánchez, B.N.; Mercado-Garcia, A.; Téllez-Rojo, M.M.; Goodrich, J.M. Phthalate exposures, DNA methylation and adiposity in Mexican children through adolescence. Front. Public Health 2019, 7, 162.
36. CDC. Updated Tables, February 2012 What’s New and Different? 2012. Available online: https://www.cdc.gov/environmental-exposure-report/whats-new/whats_new_022012.html (accessed on 1 April 2015).
37. O’Brien, K.M.; Upson, K.; Buckley, J.P. Lipid and creatinine adjustment to evaluate health effects of environmental exposures. Curr. Environ. Health Rep. 2017, 4, 44–50.
38. CDC. A SAS Program for the CDC Growth Charts (Ages 0 to <20 Years). 2004. Available online: https://www.cdc.gov/growth-chart-training/hcp/computer-programs/sas.html (accessed on 7 January 2013).
39. Yaktine, A.L.; Rasmussen, K.M. Weight Gain During Pregnancy: Reexamining the Guidelines; National Academies Press: Washington, DC, USA, 2010.
40. Fox, E.B.; Dunson, D.B. Bayesian nonparametric covariance regression. J. Mach. Learn. Res. 2015, 16, 2501–2542.
41. Armagan, A.; Dunson, D.B.; Clyde, M. Generalized Beta Mixtures of Gaussians. Adv. Neural Inf. Process. Syst. 2011, 24, 523–531.
42. Buckley, J.P.; Engel, S.M.; Mendez, M.A.; Richardson, D.B.; Daniels, J.L.; Calafat, A.M.; Wolff, M.S.; Herring, A.H. Prenatal phthalate exposures and childhood fat mass in a New York City cohort. Environ. Health Perspect. 2016, 124, 507–513.
43. Armbruster, D.A.; Pry, T. Limit of blank, limit of detection and limit of quantitation. Clin. Biochem. Rev. 2008, 29, S49–S52.
44. Ferrari, F.; Dunson, D.B. Bayesian factor analysis for inference on interactions. J. Am. Stat. Assoc. 2021, 116, 1521–1532.
45. Durante, D. A note on the multiplicative gamma process. Stat. Probab. Lett. 2017, 122, 198–204.
46. Poworoznek, E.; Ferrari, F.; Dunson, D. Efficiently resolving rotational ambiguity in Bayesian matrix sampling with matching. arXiv 2021, arXiv:2107.13783.
47. Fisher, J.S. Environmental anti-androgens and male reproductive health: Focus on phthalates and testicular dysgenesis syndrome. Reproduction 2004, 127, 305–315.
48. Li, J.; Qian, X.; Zhou, Y.; Li, Y.; Xu, S.; Xia, W.; Cai, Z. Trimester-specific and sex-specific effects of prenatal exposure to di (2-ethylhexyl) phthalate on fetal growth, birth size, and early-childhood growth: A longitudinal prospective cohort study. Sci. Total Environ. 2021, 777, 146146.
49. Ferguson, K.K.; Bommarito, P.A.; Arogbokun, O.; Rosen, E.M.; Keil, A.P.; Zhao, S.; Barrett, E.S.; Nguyen, R.H.; Bush, N.R.; Trasande, L.; et al. Prenatal phthalate exposure and child weight and adiposity from in utero to 6 years of age. Environ. Health Perspect. 2022, 130, 047006.
50. Montes, J.O.A.; Villarreal, A.B.; Romieu, I.; Barr, D.B.; Martínez, K.C.; Cadena, L.H. Modification of the association by sex between the prenatal exposure to di (2-ethylhexyl) phthalate and fat percentage in a cohort of Mexicans schoolchildren. Int. J. Obes. 2022, 46, 121–128.
51. Samandar, E.; Silva, M.J.; Reidy, J.A.; Needham, L.L.; Calafat, A.M. Temporal stability of eight phthalate metabolites and their glucuronide conjugates in human urine. Environ. Res. 2009, 109, 641–646.
52. Brooks, S.; Gelman, A.; Jones, G.; Meng, X.L. Handbook of Markov Chain Monte Carlo; CRC Press: Boca Raton, FL, USA, 2011.

Figure 1. Loss to follow-up patterns for WC, BMIz, and FMP in the study population. Each row on the y-axis corresponds to one of 382 babies in MSCEHS (randomly ordered), and each column on the x-axis represents an outcome. Cells are shaded maroon if the value is observed and shown in light yellow if the value is missing.

Figure 2. (Left): Sample Pearson’s correlations of phthalate metabolite concentrations. (Right): Sample Pearson’s correlations of adiposity outcomes.

Figure 3. Histogram of the children’s ages at the time of adiposity outcome measurement.

Figure 4. MatchAligned factor loading matrix and time-varying effects of each latent factor from the model fitted for males. The blue band shows the 95% posterior credible interval, and the solid black line shows the posterior mean.

Figure 5. MatchAligned factor loading matrix and time-varying effects of each latent factor from the model fitted for females. The blue band shows the 95% posterior credible interval, and the solid black line shows the posterior mean.

Table 1. Sample characteristics of participants with at least one follow-up, stratified by child sex, in the Mount Sinai Children’s Environmental Health Study 1998–2002.

Characteristics	Study Sample	Male Sample	Female Sample
	n (%) or Mean ± SD	n (%) or Mean ± SD	n (%) or Mean ± SD
Total (n)	180	97	83
Race/ethnicity
Non-Hispanic white	33 (18.3)	20 (20.6)	13 (15.7)
Non-Hispanic black	51 (28.3)	27 (27.8)	24 (28.9)
Hispanic or other	96 (53.3)	50 (51.5)	46 (55.4)
Maternal age at delivery (years)	24.4 ± 6.4	24.6 ± 6.7	24.2 ± 6.1
Maternal education (≥college degree)	39 (21.7)	21 (21.6)	18 (21.7)
Maternal prepregnancy BMI (kg/m²)	23.9 ± 4.7	24.0 ± 5.0	23.8 ± 4.3
Missing	0	0	1 (1.2)
Maternal gestational weight gain (lbs)	40.8 ± 18.4	39.0 ± 18.0	42.8 ± 18.7
Missing	22 (12.2)	13 (13.4)	9 (10.8)
Maternal smoking during pregnancy
Ever	31 (17.3)	18 (18.6)	13 (15.7)
Never	149 (82.7)	79 (81.4)	70 (84.3)
Breastfed
Ever	113 (62.8)	58 (59.8)	55 (66.3)
Never	66 (36.7)	39 (40.2)	27 (32.5)
Missing	1 (0.6)	0	1 (1.2)
Child’s birthweight (g)	3296 ± 458	3352 ± 475	3229 ± 430

Table 2. Average MPSE of 100 simulations per scenario. The best performance for each scenario is given in boldface.

Model	Scenario 1	Scenario 2	Scenario 3
Oracle	1.51 (0.05)	1.51 (0.08)	1.30 (0.02)
Mean predictor	12.0 (2.65)	7.57 (4.6)	3.93 (0.25)
PCA-LMM	7.14 (1.95)	5.93 (2.81)	3.23 (0.28)
BVCKMR	4.44 (1.17)	5.69 (2.55)	1.60 (0.04)
BMFR (our model)	5.33 (1.46)	2.92 (1.31)	2.63 (0.22)
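As a reading aid for Table 2, the following is a minimal sketch of how a per-scenario entry of the form "mean (spread)" could be computed from simulation replicates. It assumes that MPSE denotes a mean squared prediction error on held-out observations and that the value in parentheses is a standard deviation across the 100 replicates; the caption states neither explicitly. The function names `mspe` and `summarize` and the toy data are illustrative only and do not reproduce the simulation design of Section 4.

```python
import random
import statistics


def mspe(y_true, y_pred):
    """Mean squared prediction error over held-out observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(per_replicate_mspe):
    """Average and sample standard deviation across simulation replicates."""
    return (statistics.mean(per_replicate_mspe),
            statistics.stdev(per_replicate_mspe))


# Toy replicates (assumption: NOT the paper's data-generating scenarios).
rng = random.Random(0)
scores = []
for _ in range(100):  # 100 replicates, matching the table caption
    y_true = [rng.gauss(0.0, 1.0) for _ in range(50)]
    y_pred = [t + rng.gauss(0.0, 0.5) for t in y_true]  # noisy predictions
    scores.append(mspe(y_true, y_pred))

mean_mspe, sd_mspe = summarize(scores)
print(f"{mean_mspe:.2f} ({sd_mspe:.2f})")  # same layout as a Table 2 cell
```

Under this reading, lower values indicate better predictive performance, and the Oracle row serves as a lower reference for each scenario.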

Table 3. Average Spearman’s correlation of rank of variable importance of 100 simulations per scenario. The best performance for each scenario is given in boldface.

Model	Scenario 1	Scenario 2	Scenario 3
Oracle	1	1	1
Mean predictor	-	-	-
PCA-LMM	0.81 (0.07)	0.63 (0.22)	0.29 (0.14)
BVCKMR	0.76 (0.08)	0.54 (0.19)	0.59 (0.1)
BMFR (our model)	0.89 (0.06)	0.89 (0.08)	0.55 (0.17)
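The metric in Table 3 can be illustrated with a short sketch that compares a true and an estimated ranking of variable importance through Spearman's correlation, computed as Pearson's correlation of the ranks (with average ranks for ties). The importance scores below are hypothetical, and how the paper derives importance scores from each fitted model is not shown here; `average_ranks`, `pearson`, and `spearman` are illustrative helper names.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical importance scores for nine exposures (not the paper's values).
true_importance = [0.9, 0.8, 0.1, 0.0, 0.3, 0.7, 0.6, 0.5, 0.2]
estimated_importance = [0.7, 0.9, 0.2, 0.1, 0.1, 0.6, 0.8, 0.4, 0.3]
rho = spearman(true_importance, estimated_importance)
print(round(rho, 2))
```

A value of 1 means the estimated ordering of exposures matches the true ordering exactly, which is why the Oracle row is 1 in every scenario.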

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
