1. Introduction
The increasing prevalence of childhood obesity is a significant public health concern. The 2017–2018 National Health and Nutrition Examination Survey (NHANES) estimated that 19.3% of children aged 2–19 years have obesity and another 16.1% are overweight [1]. The prevalence of obesity among children and adolescents has more than tripled in the last 30 years [2]. Obese children are at increased risk for several health conditions, including obesity in adulthood, type 2 diabetes, heart disease, arthritis, and various cancers, as well as reduced life expectancy. Studies suggest that approximately 70% of obese children face a significant risk of heart disease in adulthood [3]. Some estimates suggest that one-third of children born today (and half of Latino and Black children) are expected to develop type 2 diabetes at some point in their lives [4]. Childhood obesity also has negative implications for mental health, as it is associated with an increased risk of being bullied [3].
Environmental chemical exposures, particularly exposure to phthalates, have been identified as risk factors for childhood obesity [5,6], potentially through interference with the body's endocrine system [7,8]. Phthalates are a group of chemicals with known endocrine-disrupting properties that are widely used in consumer products such as toys, food packaging, and cosmetics [9,10]. In vivo and in vitro animal studies suggest that fetal development is a potential critical window of vulnerability to phthalate exposures, which may promote obesity, and that the effect may depend on sex [8,11,12]. Observational studies in humans, including birth cohort studies, further support these findings by showing associations between prenatal phthalate exposure and childhood adiposity outcomes [13,14,15,16], with some reporting sex-specific effects [14,17,18].
In this paper, we investigate the time-varying health effects of prenatal phthalate exposures on childhood obesity using data from the Mount Sinai Children's Environmental Health Study (MSCEHS) [13]. The study measured urinary phthalate metabolites in mothers during the third trimester of pregnancy and followed children's health indices, including adiposity outcomes, between ages 4 and 9. We aim to identify groups of phthalate metabolites critical to these adiposity outcomes and to estimate their time-varying effects, including any sex-specific effects, using a new modeling framework: the Bayesian multivariate factor regression model (BMFR). Our approach fully quantifies the uncertainty in estimating the time-varying effects of the mixture and offers several advantages tailored to our cohort data.
Many studies have examined the time-varying effects of phthalate exposure on adiposity outcomes or growth trajectories, but the results remain inconsistent [14,18,19,20,21,22,23,24,25]. A key limitation is that traditional statistical models, and even mixture methods not designed to estimate time-varying effects, lack full uncertainty quantification. In traditional statistical models, key modeling choices, such as the functional form of the time-varying effects, are chosen or estimated first and then treated as fixed in the subsequent outcome model. For example, linear mixed-effects models or generalized estimating equations typically rely on interaction terms between a single exposure and age to model time-varying effects [21,22,23,25]. These approaches cannot consider all phthalate metabolites simultaneously because of multicollinearity, and the assumed functional form of the time-varying relationship (e.g., linear or polynomial) is treated as fixed. Fixing these choices underestimates uncertainty, leading to confidence intervals that are too narrow and p-values that are artificially small. More advanced methods, such as growth mixture models, latent class growth models, or functional principal component analysis, can capture nonlinear growth trajectories, but they require a second-stage model to estimate associations between mixture exposures and outcome trajectories [18,24]. In such multistage analyses, outputs from earlier stages (e.g., the number of selected trajectories or factors) are treated as fixed in later stages, again leading to underestimated uncertainty. Other advanced mixture methods, such as Bayesian kernel machine regression [26], quantile g-computation [27], and weighted quantile sum regression [28], do not model nonlinear time-varying effects. Bayesian varying-coefficient kernel machine regression (BVCKMR) is another advanced mixture method that can model nonlinear effects across exposure levels on outcome trajectories [29]. However, at fixed exposure levels, BVCKMR's estimated effects over time are restricted to a prespecified functional form.
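The fixed functional form in the interaction-based approaches described above can be made concrete with a generic linear mixed-effects specification. The notation here is our own illustration and is not taken from the cited studies: exposure $x_i$, age $t_{ij}$ at visit $j$, covariates $\mathbf{z}_{ij}$, and random intercept $b_i$. The exposure effect it implies is linear in age by construction:

```latex
y_{ij} = \alpha + \beta_0 x_i + \beta_1 x_i t_{ij} + \gamma t_{ij}
         + \mathbf{z}_{ij}^{\top}\boldsymbol{\theta} + b_i + \varepsilon_{ij},
\qquad
\frac{\partial\,\mathbb{E}[y_{ij} \mid x_i, t_{ij}, \mathbf{z}_{ij}, b_i]}{\partial x_i}
  = \beta_0 + \beta_1 t_{ij}.
```

Adding an $x_i t_{ij}^2$ term makes the implied effect quadratic in age, but in either case the form is chosen before fitting, so uncertainty about that choice never enters the reported intervals.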
To address these limitations, BMFR uses state-of-the-art prior specifications to infer modeling decisions and fully quantify their uncertainty. The model includes variable selection priors for covariates, avoiding fixed subsets [30]; Gaussian process priors for flexible modeling of time-varying health effects without assuming a specific functional form [31]; and half-t priors for robust between-subject variances [32]. To model the exposure mixture, BMFR assumes that structured variation in the correlated exposures can be attributed to a small number of latent factors that also explain part of the variation in the outcomes, while fully quantifying the uncertainty in the number of factors through the multiplicative gamma process (MGP) prior [33]. This contrasts with principal component analysis (PCA) [20], which maximizes the variation explained in the exposures without regard to outcome relevance, and with structural equation models [34], which require a fixed number of factors.
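Two of these priors can be illustrated with prior draws. The sketch below assumes the standard multiplicative gamma process construction (column-wise global precisions formed as cumulative products of gamma variables, times local gamma precisions per loading) and a squared-exponential kernel for the Gaussian process. The function names, kernel, and hyperparameter values are illustrative assumptions, not BMFR's actual settings or the API of the famr package.

```python
import numpy as np


def sample_mgp_loadings(p, k_max, a1=2.0, a2=3.0, nu=3.0, rng=None):
    """Draw a p x k_max loadings matrix from a multiplicative gamma process prior.

    Column h has global precision tau_h = delta_1 * ... * delta_h, with
    delta_1 ~ Ga(a1, 1) and delta_l ~ Ga(a2, 1) for l >= 2. With a2 > 1 the
    precisions tend to grow with h, so later columns are shrunk toward zero.
    """
    rng = np.random.default_rng(rng)
    delta = np.concatenate([
        rng.gamma(a1, 1.0, size=1),          # shape a1, scale 1 (rate 1)
        rng.gamma(a2, 1.0, size=k_max - 1),  # shape a2, scale 1 (rate 1)
    ])
    tau = np.cumprod(delta)                            # global column precisions
    phi = rng.gamma(nu / 2, 2 / nu, size=(p, k_max))   # local precisions ~ Ga(nu/2, rate nu/2)
    sd = 1.0 / np.sqrt(phi * tau[np.newaxis, :])
    return rng.normal(0.0, sd)                         # lambda_jh ~ N(0, 1 / (phi_jh * tau_h))


def sample_gp_effect(ages, length_scale=2.0, variance=1.0, jitter=1e-6, rng=None):
    """Draw one coefficient curve beta(t) at the given ages from a zero-mean GP
    with a squared-exponential kernel (an assumed kernel choice)."""
    rng = np.random.default_rng(rng)
    t = np.asarray(ages, dtype=float)
    diff = t[:, np.newaxis] - t[np.newaxis, :]
    cov = variance * np.exp(-0.5 * (diff / length_scale) ** 2)
    chol = np.linalg.cholesky(cov + jitter * np.eye(t.size))  # jitter keeps it positive definite
    return chol @ rng.standard_normal(t.size)


# Usage: 10 exposures, up to 6 candidate factors, and an effect curve on ages 4 to 9.
lam = sample_mgp_loadings(p=10, k_max=6, rng=1)
beta_t = sample_gp_effect(np.linspace(4, 9, 11), rng=2)
```

Because the shrinkage grows with the column index, redundant columns are pushed toward zero rather than removed, which is how an MGP prior lets the data inform the effective number of factors instead of fixing it in advance.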
Our implementation of BMFR also has custom features specific to our cohort. BMFR jointly models multiple adiposity outcomes, including BMI z-scores, waist circumference, waist-to-hip ratio, and fat mass percentage, to improve estimation precision. Previous studies have generally examined these outcomes individually [14,16,17,20,21,23,35], hypothesized that mixtures affect all outcomes similarly [20], or searched for consistent results across outcomes [16,21,23]. Because these outcomes are highly positively correlated and likely reflect shared mechanisms through which phthalates influence obesity, modeling them jointly allows information to be borrowed across outcomes. Finally, BMFR quantifies uncertainty in imputing missing data and measurements below the limit of detection (LOD). Instead of relying on standard fixed-value imputations (e.g., the mean or one-half of the LOD), BMFR performs imputation within the Markov chain Monte Carlo (MCMC) algorithm, propagating uncertainty in the imputed values into the credible intervals for the effects of interest. An R package for BMFR, optimized in C++, is freely available at https://github.com/phuchonguyen/famr (accessed on 20 August 2023).
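The idea of imputing below-LOD values inside the sampler can be sketched as follows, assuming that log-transformed metabolite concentrations are normal given the current parameter values. The function name, the normal model, and the parameter values are hypothetical illustrations, not the famr implementation. At every MCMC iteration, each below-LOD value is redrawn from the current conditional distribution truncated at the LOD, so its variability carries into the posterior instead of being fixed at, e.g., one-half of the LOD.

```python
import math
import random
from statistics import NormalDist


def impute_below_lod(mu, sigma, lod, rng):
    """Draw a log-concentration from N(mu, sigma^2) truncated to (-inf, log(lod)].

    Inverse-CDF method: draw u uniformly on (0, F(log lod)] and return F^{-1}(u).
    Assumes F(log lod) is not numerically zero, i.e., the LOD is not far below mu.
    """
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(math.log(lod))
    u = upper * (1.0 - rng.random())   # uniform on (0, upper]
    u = min(u, 1.0 - 1e-12)            # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)


# Usage: in each (hypothetical) MCMC iteration, redraw every below-LOD value given
# the current mean and SD of the log-concentrations, instead of fixing it at log(LOD/2).
rng = random.Random(2023)
lod = 0.5
current_mu, current_sigma = 0.2, 1.1       # illustrative "current" parameter values
n_below_lod = 4
imputed_trace = []
for iteration in range(1000):
    imputed = [impute_below_lod(current_mu, current_sigma, lod, rng) for _ in range(n_below_lod)]
    imputed_trace.append(imputed)
    # A full sampler would now update current_mu, current_sigma, and the model parameters.
```

Keeping the mean and SD fixed inside the loop is a simplification for illustration; in a full sampler these would change each iteration, and the imputed values would change with them.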
Section 2 describes in detail the data from the MSCEHS.
Section 3 describes our proposed model BMFR and its prior specifications.
Section 4 validates BMFR’s utility through simulation studies.
Section 5 describes the analysis of time-varying health effects of prenatal phthalate exposures on childhood obesity using the MSCEHS data.
4. Simulations
We compare our proposed method, BMFR, with the following approaches: (1) two-stage univariate regressions, (2) BVCKMR [29], and (3) a baseline mean model. In the two-stage approach (PCA-LMM), we reduce the dimension of the exposures using PCA, keeping the first few principal components (PCs), and then fit a linear mixed-effects model (LMM) for each outcome separately. The baseline mean model returns the mean of each outcome. We simulate data from the following three scenarios to demonstrate our method's utility compared with existing approaches. For all scenarios, we generate data for $n$ subjects. We generate the exposures $\mathbf{x}_i$ according to a factor model, $\mathbf{x}_i = \Lambda\boldsymbol{\eta}_i + \boldsymbol{\epsilon}_i$, creating correlated exposures with group structures similar to those of the observed phthalate metabolites. We generate a sparse loadings matrix $\Lambda$ so that every five metabolites load onto one factor, for $p$ metabolites and $K$ latent factors, and sample the non-zero entries of $\Lambda$ at random. We generate $q$ outcomes measured at $T$ time points for all subjects.
We use different exposure-response functions, including time-varying effects, and different distributions for the random intercepts and random errors in each scenario. The scenarios are as follows:
- Scenario 1:
Linear exposure-response function in which the first two factors are important and the responses are independent. The outcomes depend on the exposures through a linear latent factor-response function, and the induced effects of X range from 0 to 1. Under this setting, all assumptions of PCA-LMM are satisfied, although PCA-LMM does not propagate the uncertainty from its first stage.
- Scenario 2:
Non-linear time-varying exposure-response function in which the first two factors are important and the responses are positively correlated. The time-varying effect is defined through $\phi(t)$, the standard normal probability density function evaluated at $t$. The two covariance matrices across outcomes have a compound symmetry structure with a high correlation of 0.7. Under this setting, all assumptions of our BMFR model are satisfied.
- Scenario 3:
Quadratic exposure-response function in which three metabolites are important and the responses are independent; this is the most favorable scenario for BVCKMR.
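The shared data-generating ingredients of these scenarios can be sketched as follows. The sample size, number of factors, Uniform(0.5, 1) distribution for the non-zero loadings, noise level, and function names are all hypothetical choices for illustration, since the exact values are not reproduced here. The sketch builds block-sparse loadings in which every five metabolites load onto one factor, draws exposures from the factor model, and builds a compound-symmetry covariance with correlation 0.7 for outcome-level random effects.

```python
import numpy as np


def sparse_block_loadings(n_factors, per_factor=5, low=0.5, high=1.0, rng=None):
    """p x K loadings in which metabolites k*5 .. k*5+4 load only on factor k.

    The Uniform(low, high) distribution for the non-zero entries is an assumption.
    """
    rng = np.random.default_rng(rng)
    p = n_factors * per_factor
    lam = np.zeros((p, n_factors))
    for k in range(n_factors):
        rows = slice(k * per_factor, (k + 1) * per_factor)
        lam[rows, k] = rng.uniform(low, high, size=per_factor)
    return lam


def simulate_exposures(n, lam, noise_sd=0.5, rng=None):
    """x_i = Lambda eta_i + eps_i, with eta_i ~ N(0, I_K) and eps_i ~ N(0, noise_sd^2 I_p)."""
    rng = np.random.default_rng(rng)
    p, k = lam.shape
    eta = rng.standard_normal((n, k))
    x = eta @ lam.T + noise_sd * rng.standard_normal((n, p))
    return x, eta


def compound_symmetry(q, rho=0.7, variance=1.0):
    """q x q covariance with a common variance and a common correlation rho."""
    return variance * ((1.0 - rho) * np.eye(q) + rho * np.ones((q, q)))


# Usage (sizes are hypothetical): 500 subjects, 4 factors -> 20 metabolites, 4 outcomes.
rng = np.random.default_rng(0)
lam = sparse_block_loadings(n_factors=4, rng=rng)
x, eta = simulate_exposures(n=500, lam=lam, rng=rng)
sigma = compound_symmetry(q=4, rho=0.7)
intercepts = rng.multivariate_normal(np.zeros(4), sigma, size=500)  # correlated across outcomes
```

Metabolites loading on the same factor are strongly correlated and those on different factors are nearly uncorrelated, which mimics the grouped correlation structure of the observed phthalate metabolites.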
We choose the number of PCs for PCA and the number of factors for our method to be close to, but not exactly equal to, the true latent dimension, resembling real analyses in which the true dimension is unknown. To evaluate predictive performance, we calculate the mean predictive squared error (MPSE) on a test set of 200 subjects at 10 time points. Additionally, to evaluate how well the methods recover the relative importance of each chemical, we calculate the Spearman correlation between the true and inferred importance ranks of the chemicals in X. The rank is based on the absolute value of the effect at each time point, summed over all time points. For our model, we first calculate the induced effects on the original predictors X at each time point as shown in [44]. Similarly, we calculate the effects from the PCA-LMM analysis on X as $\mathbf{U}_K\boldsymbol{\beta}$, where $\mathbf{U}_K$ contains the first $K$ left singular vectors and $\boldsymbol{\beta}$ is the vector of regression coefficients for the PCs.
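The two evaluation metrics and the PCA-LMM back-transformation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the reported tables. It assumes subjects are stored as rows of an n × p exposure matrix, so the principal axes are its right singular vectors (equivalently, the left singular vectors of the p × n transpose); it computes Spearman correlation as the Pearson correlation of tie-averaged ranks, since the true importances contain ties at zero. All inputs and names are hypothetical, and the BMFR induced-effect calculation of [44] is not reproduced.

```python
import numpy as np


def pca_loadings(x, k):
    """First k principal axes (p x k) of the column-centered n x p exposure matrix."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return vt[:k].T


def average_ranks(values):
    """Ranks 1..m, with tied values sharing the average of their ranks."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(v.size)
    i = 0
    while i < v.size:
        j = i
        while j + 1 < v.size and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def importance(effects):
    """Sum over time points of |effect|; effects is a p x T array."""
    return np.abs(effects).sum(axis=1)


def mpse(y_true, y_pred):
    """Mean squared difference between held-out outcomes and predictions."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# Usage with hypothetical inputs: 10 exposures, K = 3 PCs, 5 time points.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 10))
u_k = pca_loadings(x, k=3)
beta_pcs = rng.normal(size=(3, 5))     # PC coefficients at each time point (hypothetical)
effects = u_k @ beta_pcs                # p x T effects on the original exposures
true_effects = np.zeros((10, 5))
true_effects[:3] = 1.0                  # hypothetical truth: three important exposures
rho = spearman(importance(true_effects), importance(effects))
```

The back-transformation works because predictions from the PC scores equal predictions from the centered exposures multiplied by the mapped effects, so ranking exposures by the mapped effects is a fair comparison with methods that estimate effects on X directly.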
Table 2 shows the MPSE results. PCA-LMM performs worst in all scenarios, even when all of its assumptions are met. Our method performs best when the data are generated according to its model, and when the data are more favorable to the other models, it still outperforms PCA-LMM.
Table 3 shows the Spearman correlation results. Here, our method performs best in the first two scenarios and is very close to the best in the third.
6. Discussion
This paper assesses the time-varying health effects of prenatal phthalate exposures on adiposity outcomes measured in children aged 4 to 9 in the MSCEHS cohort, using the BMFR approach we propose. BMFR represents phthalate mixtures as latent factors (a DEHP and a non-DEHP factor), borrows information across highly correlated adiposity outcomes to improve estimation precision, models potentially non-linear time-varying effects of the latent factors on adiposity outcomes, and fully quantifies uncertainty using state-of-the-art prior specifications. We find that in boys, at younger ages (4–6 years), all phthalate latent factors (DEHP and non-DEHP) show negative associations with adiposity outcomes, and after age 7 these associations begin to turn positive. In girls, there is no evidence of associations between phthalate components and outcomes. We also find these time-varying effects to be similar across all adiposity outcomes (BMIz, fat mass percentage, waist-to-hip ratio, and waist circumference). Both the new Bayesian mixture method for estimating time-varying effects with full uncertainty quantification and the finding of sex-specific time-varying associations of prenatal phthalate exposures with childhood obesity between ages 4 and 9 are novel contributions.
We were able to estimate sex-specific time-varying effects not previously identified in analyses of the MSCEHS cohort because BMFR is tailored to this research question and includes several specifications customized to the data. BMFR uses prior specifications to infer modeling decisions, avoiding the artificially narrow confidence intervals that are an unintended consequence of fixing choices in traditional statistical models or multistage analyses. The model includes variable selection priors for covariates, avoiding fixed subsets; Gaussian process priors for flexible modeling of time-varying effects without assuming a functional form; and half-t priors for robust between-subject variances. BMFR represents highly correlated exposures with a small number of latent factors that are independent of each other but correlated with the outcomes, while fully quantifying uncertainty in the number of factors through the MGP prior. BMFR also jointly models the multiple adiposity outcomes available in our cohort (BMIz, waist circumference, waist-to-hip ratio, and fat mass percentage) to improve estimation precision. Finally, BMFR quantifies the uncertainty in imputing missing data and values below the limit of detection (LOD) within the MCMC sampling, propagating this uncertainty into the credible intervals for the effects of interest. An R package for BMFR is freely available at https://github.com/phuchonguyen/famr (accessed on 20 August 2023). The package implements the MCMC algorithm in Appendix A in C++ for computational speed.
Our finding of time-varying effects is similar to results from [25], which reported that first-trimester phthalate exposure was associated with lower BMI six months after birth but higher BMI at older ages, although that study did not observe sex-specific effects. Other studies have also found a broadly similar time-varying pattern, with higher maternal urinary phthalate concentrations associated with lower fetal growth and birth weight, followed by higher growth trajectories later on [48,49], and with sex-specific effects reported in [48]. Our results also align with [20], which found non-DEHP metabolites associated with lower BMI, fat mass, and waist circumference in boys aged 5 and 7, and with [50], which reported DEHP associated with higher adiposity outcomes among boys between ages 8 and 10. Our results also provide a potential explanation for the seemingly inconsistent findings of these two studies. The time-varying effect we identify in boys may appear null when aggregated across all ages, which is consistent with a previous analysis of this cohort [42] that estimated average effects over time and did not observe associations with percent fat mass or sex-specific modification. However, other studies are inconsistent with ours, including reports of DEHP associated with lower BMI in girls [14] and with higher weight gain in early childhood that stabilized during puberty in girls [24]. Our findings may have important implications for pregnancy care guidelines and child health. The time-varying effects of prenatal exposure suggest that these exposures may influence childhood obesity years after the time of exposure. They further suggest that the third trimester of pregnancy may be a vulnerable window of exposure, making interventions to reduce phthalate exposure during pregnancy important.
This study also has limitations and opportunities for future improvements. BMFR does not model nonlinear dose-response relationships. As a result, we did not investigate different effects at different exposure levels, although this could be explored by stratifying the analysis by tertiles of exposure. This may be important, as mixture effects could vary nonlinearly across exposure levels [26,29]. In this cohort, exposures were measured in a single urine sample, with collection times ranging from 25 to 40 weeks of gestation. Given the short half-lives of urinary phthalate metabolites [51] and the likely episodic nature of exposures [42], a single sample limits the precision of exposure measurement. Future research should consider multiple exposure measurements and aim to identify the most critical fetal window of vulnerability with respect to growth trajectories. Data on adiposity outcomes before age 4 and after age 9 are not available in this cohort. Stratification by sex also reduces the sample size and the power to detect small effects in our analysis. Our findings would be strengthened by replication over a longer time window, from birth through puberty, and with data from a larger cohort study. Finally, residual confounding may remain, as we could not account for child calorie intake or maternal consumer product preferences, which could influence exposure levels [42].