# Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods

## Abstract

## 1. Introduction

## 2. Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Initiative

## 3. Results: Key Advancements Offered by New and Expanded Methods

#### 3.1. Overall Effect Estimation

#### 3.2. Toxic Agent Identification (Variable Selection)

#### 3.3. Pattern Identification

Bayesian non-parametric non-negative matrix factorization (BN^{2}MF) also aims to robustly identify exposure patterns. BN^{2}MF also estimates the number of exposure patterns as one of the model parameters. Furthermore, BN^{2}MF provides uncertainty estimates for the estimated parameters, quantifying the model’s confidence in them [7].
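To make the matrix-factorization idea concrete, the following minimal sketch factorizes a simulated, non-negative exposure matrix into subject scores and chemical loadings. It is not BN^{2}MF: it uses classical fixed-rank NMF with Lee-Seung multiplicative updates as a stand-in, so the number of patterns (`rank`) is supplied by the analyst and no uncertainty estimates are produced, whereas BN^{2}MF estimates both [7]. The function names (`nmf`, `matmul`, `transpose`, `frobenius_error`), the simulated data, and all parameter values are illustrative assumptions, and the code is written in plain Python only so that it runs without dependencies.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, wh) for xv, rv in zip(xr, rr))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-12):
    """Fixed-rank NMF via multiplicative updates (a stand-in, not BN2MF).

    x: subjects-by-chemicals matrix with non-negative entries.
    Returns scores W (n x rank), loadings H (rank x p), and the error trace.
    """
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    # Strictly positive starting values keep every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(p)] for _ in range(rank)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(p)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


# Simulated exposures (assumption): 30 subjects, 6 chemicals, 2 patterns.
sim = random.Random(1)
true_loadings = [[1.0, 0.8, 0.6, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0, 0.7, 0.5]]
true_scores = [[sim.random(), sim.random()] for _ in range(30)]
x = matmul(true_scores, true_loadings)

scores, loadings, errors = nmf(x, rank=2)
```

Each row of `loadings` can be read as one exposure pattern, and each row of `scores` gives a subject's non-negative contribution from each pattern; `errors` traces the squared reconstruction error across iterations.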

#### 3.4. A Priori Defined Groups

#### 3.5. Interactions and Non-Linearities

## 4. Other Statistical Advancements for Mixtures

#### 4.1. Data Science and Data Preparation Strategies

#### 4.2. Estimation of the Exposure-Response Surface

#### 4.3. Timing of Exposures and Periods of Susceptibility

#### 4.4. Epidemiological Methods and Causal Models

#### 4.5. Toxicity and Related Chemical Information

#### 4.6. Spatiotemporal Methods

#### 4.7. Risk Assessment and Regulatory Relevance

#### 4.8. Model Performance, Efficiency, and Interpretation

## 5. Software

## 6. Discussion

## 7. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Taylor, K.W.; Joubert, B.R.; Braun, J.M.; Dilworth, C.; Gennings, C.; Hauser, R.; Heindel, J.J.; Rider, C.V.; Webster, T.F.; Carlin, D.J. Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology: Lessons from an Innovative Workshop. Environ. Health Perspect. **2016**, 124, A227–A229.
- Carlin, D.J.; Rider, C.V.; Woychik, R.; Birnbaum, L.S. Unraveling the health effects of environmental mixtures: An NIEHS priority. Environ. Health Perspect. **2013**, 121, A6–A8.
- NIEHS. Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME). Available online: https://grants.nih.gov/grants/guide/rfa-files/RFA-ES-17-001.html (accessed on 16 November 2021).
- Devick, K.L.; Bobb, J.F.; Mazumdar, M.; Henn, B.C.; Bellinger, D.C.; Christiani, D.C.; Wright, R.O.; Williams, P.L.; Coull, B.A.; Valeri, L. Bayesian kernel machine regression-causal mediation analysis. arXiv **2018**, arXiv:1811.10453.
- McGee, G.; Wilson, A.; Webster, T.F.; Coull, B.A. Bayesian Multiple Index Models for Environmental Mixtures. arXiv **2021**, arXiv:2101.05352.
- Weisskopf, M.G.; Seals, R.M.; Webster, T.F. Bias Amplification in Epidemiologic Analysis of Exposure to Mixtures. Environ. Health Perspect. **2018**, 126, 047003.
- Gibson, E.A.; Rowland, S.T.; Goldsmith, J.; Paisley, J.; Herbstman, J.B.; Kioumourtzoglou, M.-A. Bayesian non-parametric non-negative matrix factorization for pattern identification in environmental mixtures. arXiv **2021**, arXiv:2109.12164.
- Gibson, E.A.; Zhang, J.; Yan, J.; Chillrud, L.; Benavides, J.; Nunez, Y.; Herbstman, J.B.; Goldsmith, J.; Wright, J.; Kioumourtzoglou, M.-A. Principal Component Pursuit for Pattern Identification in Environmental Mixtures. arXiv **2021**, arXiv:2111.00104.
- Jin, B.; Peruzzi, M.; Dunson, D.B. Bag of DAGs: Flexible & Scalable Modeling of Spatiotemporal Dependence. arXiv **2021**, arXiv:2112.11870.
- Jin, B.; Dunson, D.B.; Rager, J.E.; Reif, D.; Engel, S.M.; Herring, A.H. Bayesian Matrix Completion for Hypothesis Testing. arXiv **2020**, arXiv:2009.08405.
- Moran, K.R.; Dunson, D.; Wheeler, M.W.; Herring, A.H. Bayesian joint modeling of chemical structure and dose response curves. Ann. Appl. Stat. **2021**, 15, 1405–1430.
- Ferrari, F.; Dunson, D.B. Bayesian factor analysis for inference on interactions. J. Am. Stat. Assoc. **2020**, 116, 1521–1532.
- Schiavon, L.; Canale, A.; Dunson, D.B. Generalized infinite factorization models. arXiv **2021**, arXiv:2103.10333.
- Dunson, D.B.; Wu, H.-T.; Wu, N. Diffusion based gaussian processes on restricted domains. arXiv **2020**, arXiv:2010.07242.
- Peruzzi, M.; Banerjee, S.; Dunson, D.B.; Finley, A.O. Grid-Parametrize-Split (GriPS) for improved scalable inference in spatial big data analysis. arXiv **2021**, arXiv:2101.03579.
- Ferrari, F.; Dunson, D.B. Identifying main effects and interactions among exposures using Gaussian processes. Ann. Appl. Stat. **2020**, 14, 1743–1758.
- Dunson, D.B.; Wu, N. Inferring Manifolds From Noisy Data Using Gaussian Processes. arXiv **2021**, arXiv:2110.07478.
- Roy, A.; Lavine, I.; Herring, A.H.; Dunson, D.B. Perturbed factor analysis: Accounting for group differences in exposure profiles. Ann. Appl. Stat. **2021**, 15, 1386–1404.
- Poworoznek, E.; Ferrari, F.; Dunson, D. Efficiently resolving rotational ambiguity in Bayesian matrix sampling with matching. arXiv **2021**, arXiv:2107.13783.
- Peruzzi, M.; Dunson, D.B. Spatial multivariate trees for big data Bayesian regression. arXiv **2020**, arXiv:2012.00943.
- Gennings, C.; Shu, H.; Rudén, C.; Öberg, M.; Lindh, C.; Kiviranta, H.; Bornehag, C.-G. Incorporating regulatory guideline values in analysis of epidemiology data. Environ. Int. **2018**, 120, 535–543.
- Antonelli, J.; Wilson, A.; Coull, B. Multiple exposure distributed lag models with variable selection. arXiv **2021**, arXiv:2107.14567.
- Wilson, A.; Hsu, H.H.L.; Chiu, Y.H.M.; Wright, R.O.; Wright, R.J.; Coull, B.A. Kernel Machine and Distributed Lag Models for Assessing Windows of Susceptibility to Environmental Mixtures in Children’s Health Studies. Ann. Appl. Stat. **2021**, in press.
- Liu, J.Z.; Lee, J.; Lin, P.I.D.; Valeri, L.; Christiani, D.C.; Bellinger, D.C.; Wright, R.O.; Mazumdar, M.M.; Coull, B.A. A Cross-validated Ensemble Approach to Robust Hypothesis Testing of Continuous Nonlinear Interactions: Application to Nutrition-Environment Studies. arXiv **2019**, arXiv:1904.10918.
- Mork, D.; Wilson, A. Estimating perinatal critical windows of susceptibility to environmental mixtures via structured Bayesian regression tree pairs. arXiv **2021**, arXiv:2102.09071.
- Mork, D.; Wilson, A. Treed distributed lag nonlinear models. arXiv **2021**, arXiv:2010.06147.
- Mork, D.; Kioumourtzoglou, M.-A.; Weisskopf, M.; Coull, B.A.; Wilson, A. Heterogeneous Distributed Lag Models to Estimate Personalized Effects of Maternal Exposures to Air Pollution. arXiv **2021**, arXiv:2109.13763.
- Gennings, C.; Curtin, P.; Bello, G.; Wright, R.; Arora, M.; Austin, C. Lagged WQS regression for mixtures with many components. Environ. Res. **2020**, 186, 109529.
- Antonelli, J.; Mazumdar, M.; Bellinger, D.; Christiani, D.; Wright, R.; Coull, B. Estimating the health effects of environmental mixtures using Bayesian semiparametric regression and sparsity inducing priors. Ann. Appl. Stat. **2020**, 14, 257–275.
- Tanner, E.M.; Bornehag, C.-G.; Gennings, C. Repeated holdout validation for weighted quantile sum regression. MethodsX **2019**, 6, 2855–2860.
- Sonabend, A.; Zhang, J.; Schwartz, J.; Coull, B.A.; Lu, J. Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects. Available online: https://slideslive.com/38940890/scalable-gaussian-process-regression-via-median-posterior-inference-for-estimating-multipollutant-mixture-health-effects (accessed on 16 November 2021).
- Feldman, J.; Kowal, D. A Bayesian Framework for Generation of Fully Synthetic Mixed Datasets. arXiv **2021**, arXiv:2102.08255.
- Kowal, D. Bayesian subset selection and variable importance for interpretable prediction and classification. arXiv **2021**, arXiv:2104.10150.
- Kowal, D.R.; Bravo, M.; Leong, H.; Bui, A.; Griffin, R.J.; Ensor, K.B.; Miranda, M.L. Bayesian variable selection for understanding mixtures in environmental exposures. Stat. Med. **2021**, 40, 4850–4871.
- Kowal, D.R. Fast, Optimal, and Targeted Predictions Using Parameterized Decision Analysis. J. Am. Stat. Assoc. **2021**, 1–12.
- Schedler, J.C.; Ensor, K.B. A spatiotemporal case-crossover model of asthma exacerbation in the City of Houston. Stat **2021**, 10, e357.
- Actkinson, B.; Ensor, K.; Griffin, R.J. SIBaR: A new method for background quantification and removal from mobile air pollution measurements. Atmos. Meas. Tech. **2021**, 14, 5809–5821.
- Li, H. Mvnimpute. Available online: https://github.com/hli226/mvnimpute (accessed on 16 November 2021).
- Chen, H.Y. Statistical inference on explained variation in high-dimensional linear model with dense effects. arXiv **2022**, arXiv:2201.08723.
- Chen, H.Y. Semiparametric Odds Ratio Model and Its Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2022.
- Chen, H.Y.; Li, H.; Argos, M.; Persky, V.; Turyk, M. Statistical methods for assessing explained variations of a health outcome by mixtures of exposures. Prep. Spec. Issue Int. J. Environ. Res. Public Health **2022**.
- Tanner, E.; Lee, A.; Colicino, E. Environmental mixtures and children’s health: Identifying appropriate statistical approaches. Curr. Opin. Pediatr. **2020**, 32, 315–320.
- Hamra, G.B.; Buckley, J.P. Environmental exposure mixtures: Questions and methods to address them. Curr. Epidemiol. Rep. **2018**, 5, 160–165.
- Gibson, E.A.; Nunez, Y.; Abuawad, A.; Zota, A.R.; Renzetti, S.; Devick, K.L.; Gennings, C.; Goldsmith, J.; Coull, B.A.; Kioumourtzoglou, M.A. An overview of methods to address distinct research questions on environmental mixtures: An application to persistent organic pollutants and leukocyte telomere length. Environ. Health **2019**, 18, 76.
- Davalos, A.D.; Luben, T.J.; Herring, A.H.; Sacks, J.D. Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures. Ann. Epidemiol. **2017**, 27, 145–153.e1.
- Carrico, C.; Gennings, C.; Wheeler, D.C.; Factor-Litvak, P. Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting. J. Agric. Biol. Environ. Stat. **2015**, 20, 100–120.
- Keil, A.P.; Buckley, J.P.; O’Brien, K.M.; Ferguson, K.K.; Zhao, S.; White, A.J. A Quantile-Based g-Computation Approach to Addressing the Effects of Exposure Mixtures. Environ. Health Perspect. **2020**, 128, 47004.
- Bobb, J.F.; Valeri, L.; Claus Henn, B.; Christiani, D.C.; Wright, R.O.; Mazumdar, M.; Godleski, J.J.; Coull, B.A. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics **2015**, 16, 493–508.
- Wood, S. Generalized Additive Models: An Introduction with R; CRC Press: Boca Raton, FL, USA, 2017.
- Yang, J.; Benyamin, B.; McEvoy, B.P.; Gordon, S.; Henders, A.K.; Nyholt, D.R.; Madden, P.A.; Heath, A.C.; Martin, N.G.; Montgomery, G.W.; et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. **2010**, 42, 565–569.
- Janson, L.; Barber, R.F.; Candes, E. EigenPrism: Inference for high dimensional signal-to-noise ratios. J. R. Stat. Soc. Ser. B Stat. Methodol. **2017**, 79, 1037–1065.
- De Vito, R.; Bellio, R.; Trippa, L.; Parmigiani, G. Bayesian multi-study factor analysis for high-throughput biological data. arXiv **2018**, arXiv:1806.09896.
- De Vito, R.; Bellio, R. Multi-Study Factor Analysis (MSFA). Biometrics **2019**, 75, 337–346.
- Bhattacharya, A.; Dunson, D.B. Sparse Bayesian infinite factor models. Biometrika **2011**, 98, 291–306.
- Tikhonov, G.; Opedal, O.H.; Abrego, N.; Lehikoinen, A.; de Jonge, M.M.J.; Oksanen, J.; Ovaskainen, O. Joint species distribution modelling with the r-package Hmsc. Methods Ecol. Evol. **2020**, 11, 442–447.
- Papastamoulis, P.; Ntzoufras, I. On the identifiability of Bayesian factor analytic models. arXiv **2020**, arXiv:2004.05105.
- Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press: San Diego, CA, USA, 1979.
- Wang, C.; Jiang, B.; Zhu, L. Penalized interaction estimation for ultrahigh dimensional quadratic regression. arXiv **2019**, arXiv:1901.07147.
- Hao, N.; Feng, Y.; Zhang, H.H. Model Selection for High-Dimensional Quadratic Regression via Regularization. J. Am. Stat. Assoc. **2018**, 113, 615–625.
- Haris, A.; Witten, D.; Simon, N. Convex Modeling of Interactions with Strong Heredity. J. Comput. Graph. Stat. **2016**, 25, 981–1004.
- Bien, J.; Taylor, J.; Tibshirani, R. A Lasso for Hierarchical Interactions. Ann. Stat. **2013**, 41, 1111–1141.
- Bravo, M.A.; Miranda, M.L. Effects of accumulated environmental, social and host exposures on early childhood educational outcomes. Environ. Res. **2021**, 198, 111241.
- McGee, G.; Wilson, A.; Coull, B.A.; Webster, T.F. Incorporating biological knowledge in analyses of environmental mixtures. arXiv **2022**, forthcoming.
- Buckley, J.P.; Hamra, G.B.; Braun, J.M. Statistical Approaches for Investigating Periods of Susceptibility in Children’s Environmental Health Research. Curr. Environ. Health Rep. **2019**, 6, 1–7.
- Gasparrini, A. Distributed Lag Linear and Non-Linear Models in R: The Package dlnm. J. Stat. Softw. **2011**, 43, 1–20.
- Sanchez, B.N.; Hu, H.; Litman, H.J.; Tellez-Rojo, M.M. Statistical methods to study timing of vulnerability with sparsely sampled data on environmental toxicants. Environ. Health Perspect. **2011**, 119, 409–415.
- Bello, G.A.; Arora, M.; Austin, C.; Horton, M.K.; Wright, R.O.; Gennings, C. Extending the Distributed Lag Model framework to handle chemical mixtures. Environ. Res. **2017**, 156, 253–264.
- Liu, S.H.; Bobb, J.F.; Lee, K.H.; Gennings, C.; Claus Henn, B.; Bellinger, D.; Austin, C.; Schnaas, L.; Tellez-Rojo, M.M.; Hu, H.; et al. Lagged kernel machine regression for identifying time windows of susceptibility to exposures of complex mixtures. Biostatistics **2018**, 19, 325–341.
- Webster, T.F.; Weisskopf, M.G. Epidemiology of exposure to mixtures: We can’t be casual about causality when using or testing methods. arXiv **2020**, arXiv:2007.01370.
- Bobb, J.F.; Claus Henn, B.; Valeri, L.; Coull, B.A. Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ. Health **2018**, 17, 67.
- Peruzzi, M. Multi-Outcome Meshed Gaussian Processes on Projected Inputs for Scalable Inference with Exposome Data. Available online: https://www.isglobal.org/documents/10179/9117539/1_Topic5_Michele_Peruzzi.pdf/30c5e893-f2da-4452-b2c4-a3e075425b77 (accessed on 16 December 2021).
- Environmental Mixtures Workshop: Applications in Environmental Health Studies. Available online: http://www.publichealth.columbia.edu/research/precision-prevention/workshop-analyzing-mixtures-environmental-health-studies (accessed on 16 November 2021).

**Figure 1.** Mixtures Methods x Research Questions ^{1}: Methods Preceding PRIME.

^{1}Research questions following Gibson et al., 2019 review: (1) Overall effect estimation: What is the overall effect of the mixture and what is the magnitude of association? (2) Toxic agent identification: Which congeners or chemicals are associated with the outcome? What congeners/chemicals are most important? (3) Pattern identification: Are there specific exposure patterns in the data? (4) A priori defined groups: What are the associations between an outcome and a priori defined groups of exposures? (5) Interactions and non-linearities: Are there interactions between exposures? Is the exposure-response surface non-linear? (6) Exposure-response relationship: What is the exposure-response relationship between each chemical and the outcome? Because almost all methods that investigate interactions also characterize potentially nonlinear exposure-response functions, we group questions #5 and #6 into a single bubble in this figure.

**Figure 2.** Mixtures Methods x Research Questions ^{1}: Highlighted Methods from PRIME.

^{1}Research questions following Gibson et al., 2019 review: (1) Overall effect estimation: What is the overall effect of the mixture and what is the magnitude of association? (2) Toxic agent identification: Which congeners or chemicals are associated with the outcome? What congeners/chemicals are most important? (3) Pattern identification: Are there specific exposure patterns in the data? These can be managed with clustering and dimension reduction methods. (4) A priori defined groups: What are the associations between an outcome and a priori defined groups of exposures? (5) Interactions and non-linearities: Are there interactions between exposures? Is the exposure-response surface non-linear? (6) Exposure-response relationship: What is the exposure-response relationship between each chemical and the outcome? Because almost all methods that investigate interactions also characterize potentially nonlinear exposure-response functions, we group questions #5 and #6 into a single bubble in this figure.

Project (Institution(s)) ^{1} | Summary | Exposures ^{2} | Study Populations ^{3} |
---|---|---|---|
Development and testing of response surface methods for investigating the epidemiology of exposure to mixtures (BU/Harvard) | Combines aspects of response surface modeling with index methods into the Bayesian Multiple Index Method (BMIM) and incorporates toxicological information. Special cases are a single index model and a full response surface of all exposures as in BKMR. | Dioxin-like compounds, PCBs, phthalates, parabens, bisphenols, triclosan, UV filters, BFRs, PBDEs | RCC, EARTH |
Principal Component Pursuit to assess exposure to environmental mixtures in epidemiologic studies (Columbia) | Adapts the method Principal Component Pursuit (PCP), used in computer vision applications, to the epidemiologic setting of mixtures of environmental pollutants. | PCBs, metals, air pollution | CHDS, CCCEH, SHS, SPARCS |
Structured nonparametric methods for mixtures of exposures (Duke) | Incorporates chemical structure data and mechanistic constraints into nonparametric Bayesian regression methods to improve stability, performance, and interpretation in estimating dose response. Supplemental funding develops Bayesian modeling frameworks for including exposures in epidemiological models of infectious disease spread, as well as flexible spatiotemporal modeling with applications to study exposure effects on COVID-19 hospitalizations. | Phenols, OPs, perchlorate, PFCs, phthalates, BFRs, PAHs, pyrethroids, air pollutants | MSSM, NHANES, CHAMACOS, CLEAR, CDC COVID Data Tracker, NYTimes COVID Data, State Population by Characteristics |
Methods for data integration and risk assessment for environmental mixtures (MSSM/Harvard) | Integrates temporally resolved exposure into models, evaluates how early (“priming” or “protective”) exposures can impact susceptibility to later exposures, and estimates regulatory guideline values for mixtures. | Tooth metal biomarkers; EDCs, dietary data | Colorado birth data; SELMA |
Bringing Modern Data Science Tools to Bear on Environmental Mixtures (Notre Dame/Rice) | Develops data architecture to capture complex spatial location data for families, environmental exposures, and social stressors that vary over time. Leverages modern data science by applying rapidly evolving techniques for architecting data combined with hierarchical Bayesian models with variable selection, spatial models, and machine learning algorithms to large-scale environmental mixture and social exposure datasets of direct importance to child outcomes. | Air pollution, lead, social stressors | Aggregate North Carolina birth records, blood lead surveillance data, and educational system data to social and environmental exposures |
Innovative Methodologic Advances for Mixtures Research in Epidemiology (UI Chicago) | Adapts genomics approaches to evaluate the total main effects and interactions of chemical exposures. Applies novel multivariate models for analyzing the complex relationship between health outcomes, biological intermediates, and environmental pollutants. | POPs, PCBs, OCPs, BFRs, PFCs, dioxins, heavy metals | NHANES, GLFCS, HCHS/SOL |

^{1}Listed in alphabetical order, by institution. Project details available at NIH RePORTER: https://reporter.nih.gov/, accessed on 21 December 2021. Institutions: Columbia University Mailman School of Public Health, University of Illinois Chicago, Icahn School of Medicine at Mount Sinai, Harvard T.H. Chan School of Public Health, University of Notre Dame, Rice University, Boston University School of Public Health, Duke University.

^{2}BFRs: Brominated Flame Retardants; EDCs: Endocrine Disrupting Chemicals; OCPs: Organochlorine Pesticides; OPs: Organophosphorus Pesticides; PAHs: Polycyclic Aromatic Hydrocarbons; PBDEs: Polybrominated Diphenyl Ethers; PCBs: Polychlorinated Biphenyls; PFCs: Perfluorinated Chemicals; POPs: Persistent Organic Pollutants; UV: Ultraviolet.

^{3}CCCEH: Columbia Center for Children’s Environmental Health; CDC COVID Data Tracker: https://covid.cdc.gov/covid-data-tracker/#variant-proportions and https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc, accessed on 21 December 2021; CHAMACOS: Center for the Health Assessment of Mothers and Children of Salinas; CHDS: Child Health and Development Studies; CLEAR: Climate Change, Environmental Contaminants and Reproductive Health; EARTH: Environment And Reproductive Health cohort; GLFCS: Great Lakes Fish Consumption Study; HCHS/SOL: Hispanic Community Health Study/Study of Latinos; MSSM: Mount Sinai Children’s Environmental Health Study; NHANES: National Health and Nutrition Examination Survey; NYTimes COVID Data: https://github.com/nytimes/covid-19-data, accessed on 21 December 2021; RCC: Russian Children’s Cohort; SELMA: Swedish Environmental Longitudinal Mother and child, Asthma and allergy study; SHS: Strong Heart Study; SPARCS: NY Statewide Planning and Research Cooperative System; State Population by Characteristics: U.S. Census Bureau breakdown of 2019 U.S. state populations by age, from Single Year of Age and Sex Population Estimates: 1 April 2010 to 1 July 2019 - CIVILIAN (SC-EST2019-AGESEX-CIV), https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-detail.html, accessed on 21 December 2021; WAS: Wisconsin Angler Study.

Project ^{1} | Method Acronym | Method Title | Summary | Reference |
---|---|---|---|---|

BU/ Harvard | BKMR-CMA | Bayesian Kernel Machine Regression-Causal Mediation Analysis | Performs a causal mediation analysis when exposure within the mediation framework is a mixture. Estimates a multivariate exposure response surface in a model for the mediator given exposure, and another for the outcome given the mediator and the outcome, both using BKMR. | [4] |

BU/ Harvard | BMIM | Bayesian Multiple Index Model | Unifies exposure index models with the response surface method BKMR, allowing a spectrum of intermediate models of multiple indices. Models non-linear, non-additive relationships between indices and an outcome. Special cases are a single exposure index and a response surface of all exposures. | [5] |

BU/ Harvard | DAG analysis | Use of causal methods for determining which exposures to include in a model | Applies directed acyclic graphs (DAGs) to determine inclusion of exposure variables. In some circumstances, including an exposure variable can increase bias. Determines causal relationships between exposures (or groups of exposures) and a health outcome. | [6] |

Columbia | BN2MF | Bayesian Non-parametric non-negative Matrix Factorization | Matrix factorization that provides non-negative (and more interpretable) solutions for factors and loadings and uncertainty estimates for the estimated parameters. Used for exposure pattern identification, similar to PCP. | [7] |

Columbia | PCP | Principal Component Pursuit | Unsupervised robust exposure pattern identification. Decomposes exposure matrix into a low-rank matrix (consistent patterns) and a sparse matrix (unique exposure events). Robust exposure pattern identification. | [8] |

Duke | BAG | Bag of DAGs | A computationally efficient method to construct a class of non-stationary spatiotemporal processes in point-referenced geostatistical models. Accounts for uncertainty in directions of association over space and time by considering a mixture of direct acyclic graphs (DAGs) | [9] |

Duke | BMC | Bayesian Matrix Completion for hypothesis testing | Bayesian inference about chemical activity on mean and variance of dose-response measurements accounting for sparsity of data. Used to characterize chemical activity and its uncertainty. | [10] |

Duke | BS3FA | Bayesian partially supervised sparse and smooth factor analysis | Bayesian inference on how chemical structure relates to variation in dose-response measurements. Addresses how to jointly model structural variability in molecular features of a chemical and its dose-response profile. | [11] |

Duke | FIN | Factor analysis for interactions | Bayesian factor analysis for inference on interactions. Estimates interactions between highly correlated chemical exposures and effect on health outcomes. | [12] |

Duke | GIF-SIS | Generalized infinite factor model | Shrinkage prior to the loadings matrix of infinite factor models that incorporate meta covariates to inform the sparsity structure and has desirable shrinkage properties. Addresses how to incorporate a priori known structure among variables when fitting a member of the broad class of factorization models. | [13] |

Duke | GL-GPs | Graph Laplacian based Gaussian Process | Gaussian process model with a covariance function that respects the geometry of highly restricted or nonlinear domains. Develops a covariance function for nonparametric regression that respects the intrinsic geometry of the domain without sacrificing computational tractability. | [14] |

Duke | GriPS | Computational improvements for Bayesian multivariate regression models based on latent meshed gaussian processes | Computational improvements for Bayesian multivariate regression models based on latent Meshed Gaussian Processes. Addresses how to efficiently solve the big-n problem for GPs when the number of outcomes is large. | [15] |

Duke | MixSelect | Identifying main effects and interactions among exposures using Gaussian processes | Identifies main effects and interactions among exposures using Gaussian processes. Addresses how to model potentially non-linear effects and high-order interactions of chemical exposures on health outcomes. | [16] |

Duke | MrGap | Manifold Reconstruction via Gaussian Process | Local covariance Gaussian process model for estimating a manifold in high dimensional space from noisy data. Conducts inference on a low-dimensional, nonlinear manifold in high dimensional space when data are subject to measurement error. | [17] |

Duke | PFA | Perturbed factor analysis | Factor analysis that captures common structure among groups of related observations. Distinguishes shared and group-specific covariance structure and expresses shared structure via a set of shared factors. | [18] |

Duke | MatchAlign | Resolving rotational ambiguity in matrix sampling | Efficiently resolving rotational ambiguity in Bayesian matrix sampling with matching. Does inference on unidentifiable random matrices. | [19] |

Duke | SPAMTREE | Spatial Multivariate Trees | Bayesian multivariate regression methods for big data using sparse treed Gaussian processes. Jointly models several imbalanced variables flexibly and scalably via GPs | [20] |

MSSM/ Harvard | ACR | Acceptable Concentration Range model | New class of nonlinear statistical models for human data that incorporates and evaluates regulatory guideline values into analyses of health effects of exposure to chemical mixtures. Allows for human data to suggest points of departure for comparison to in vivo estimates from single chemicals. | [21] |

MSSM/ Harvard | Mult DLAG | Multiple exposure distributed lag models with variable selection | A method to identify the presence of time-dependent interactions (interactions among chemical exposures experienced during different exposure windows) in a critical windows analysis. Identifies critical windows of exposure to multiple chemicals, and whether exposures experienced at different developmental windows interact with one another on a health outcome. | [22] |

MSSM/ Harvard | BKMR-DLM | Bayesian Kernel Machine Regression-Distributed Lag Model | Develops distributed lag models for assessing critical windows of exposure associated with a mixture. The model simultaneously estimates a time-weighted combination of each exposure and estimates a multivariate exposures-response surface of these time-weighted exposures using BKMR. | [23] |

MSSM/ Harvard | CVEK | Cross-validated kernel ensemble | Performs tests of interaction between two sets of exposures (i.e., two mixtures) while placing minimal assumptions on the main effects of each mixture. Asks whether one mixture (e.g., a collection of nutrients) modifies the effect of another (e.g., a metal mixture) as a whole. | [24] |

MSSM/ Harvard | Bayes Tree Pairs | Bayesian Regression Tree Pairs | Estimates critical windows of susceptibility to an environmental mixture. Uses an additive ensemble of tree pairs to estimate main effects and interactions between time-resolved predictors with variable selection. | [25] |

MSSM/ Harvard | DLMtree | Bayesian Treed Distributed Lab Models | Distributed lag linear and non-linear models. Method to improve the precision of critical window identification compared to methods that use spline or penalized spline basis functions. Interest focuses on identifying critical windows of exposure using data on a single exposure measured over time. | [26] |

MSSM/ Harvard | Het-DLM | Heterogeneous distributed lag models | Methods for precision children’s environmental health—that is, methods to identify subject characteristics (child sex, maternal age, etc.) that modify distributed lag effects of exposure. Addresses which subjects exhibit the strongest associations with an exposure measured over multiple developmental windows, and whether the critical windows of exposure vary among subgroups. | [27] |

MSSM/ Harvard | LWQS | Lagged Weighted Quantile Sum (WQS) regression | Uses a reverse distributed lag model for assessing critical windows of exposure associated with a mixture when the exposure temporal pattern differs across subjects. Can also incorporate strata-specific associations. Useful for identifying time-varying associations between a mixture effect and later-life health/developmental outcomes. | [28] |

MSSM/ Harvard | NLinteraction | Bayesian semiparametric regression with sparsity inducing priors | Estimates effects of environmental mixtures while allowing for interactions of any order. Provides variable importance measures for both main effects and interactions among exposures within a mixture, while making minimal assumptions on the forms of those effects. | [29] |

MSSM/ Harvard | RH-WQS | Repeated holdout Weighted Quantile Sum (WQS) regression | Generalizes WQS regression to include repeated holdout random data splits. Estimates a mixture effect using an empirically estimated weighted index. | [30] |

MSSM/ Harvard | SGP-MPI | Scalable Gaussian Process regression via Median Posterior Inference | Takes a split-and-conquer strategy to fitting BKMR to big data. Yields summaries of the multivariate exposure-response surface, as well as variable importance measures of each individual exposure. | [31] |

ND/Rice | BDS | Bayesian Data Synthesis | A Bayesian framework used to simulate fully synthetic datasets of mixed data types. The dataset may comprise mixed categorical, binary, count, and continuous data types. Can handle missing data and has customized metrics for attributing risk disclosure and other privacy concerns. | [32] |

ND/Rice | BSSVI | Bayesian subset selection and variable importance for interpretable prediction and classification | Used to collect and summarize all near-optimal subset models to provide a complete predictive picture. Useful in the presence of correlated covariates, weak signals, and/or small sample sizes, where different subsets may be indistinguishable in their predictive accuracy. | [33] |

ND/Rice | BVSM | Bayesian variable selection for understanding mixtures in environmental exposures | Variable selection via sparse summaries of a linear regression model. Given a Bayesian regression model with social and environmental covariates, addresses which variables matter most for predicting educational outcomes. | [34] |

ND/Rice | FOTP | Fast, optimal, and targeted predictions using parameterized decision analysis | Computes targeted summaries and predictions for specific decision tasks. Given a target (or functional) of interest and a Bayesian model, constructs accurate, simple, and efficient predictions of future values or functionals of future values. Model summaries can be customized for each functional. | [35] |

ND/Rice | SCC | Spatiotemporal case-crossover | Presents a strategy for the case-crossover study design in a spatial-temporal setting. Incorporates a temporal case-crossover and a geometrically aware spatial random effect based on the Hausdorff distance. | [36] |

ND/Rice | SiBAR | State Informed Background Removal | Computational technique to quantify ‘background’ versus ‘source influenced’ contributions to air pollutant time series. Addresses whether a hidden Markov model can be used for this purpose and what the ‘background’ levels of pollutants measured across an urban area are. | [37] |

UI Chicago | MVNimpute | Imputation of multivariate data by normal model | Applies multiple imputation to data with missing and/or censored values. | [38] |

UI Chicago | SPORM | Semi-Parametric Odds Ratio Model | Flexible semiparametric model for estimating complex relationships among multiple variables. Associations are modeled by odds ratio functions. | [14,39] |

UI Chicago | TEV | Estimation and inference on the explained variation parameter | Estimates the explained variation of an outcome by a set of mixture pollutants. | [40,41] |

^{1}Listed in alphabetical order, by institution. Project details available at NIH RePORTER: https://reporter.nih.gov/, accessed on 21 December 2021. Institutions: Columbia University Mailman School of Public Health, University of Illinois Chicago, Icahn School of Medicine at Mount Sinai, Harvard T.H. Chan School of Public Health, University of Notre Dame, Rice University, Boston University School of Public Health, Duke University.
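
Several methods in the table above build on the distributed lag idea: the outcome is regressed on an exposure measured at each of several time windows, and the windows with non-zero coefficients are read as critical windows. The block below is a minimal sketch of that idea only, assuming an unconstrained linear model fit by ordinary least squares on simulated data. It is not an implementation of DLMtree, BKMR-DLM, LWQS, or any other method listed here, all of which add smoothing, variable selection, or Bayesian inference. The function names, the simulated effect size, and the 0.25 threshold are hypothetical choices made for illustration.

```python
import numpy as np


def fit_distributed_lag(exposure, outcome):
    """Fit outcome ~ intercept + sum_t beta_t * exposure[:, t] by least squares.

    exposure: (n_subjects, n_windows) array, one column per exposure window.
    outcome:  (n_subjects,) array.
    Returns (intercept, lag_coefficients).
    """
    n_subjects = exposure.shape[0]
    design = np.column_stack([np.ones(n_subjects), exposure])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[0], coef[1:]


def simulate(n_subjects=2000, n_windows=10, seed=0):
    """Simulate independent exposures with an effect only in windows 3 to 5."""
    rng = np.random.default_rng(seed)
    exposure = rng.normal(size=(n_subjects, n_windows))
    true_lag = np.zeros(n_windows)
    true_lag[3:6] = 0.5  # hypothetical critical window (illustration only)
    noise = rng.normal(scale=1.0, size=n_subjects)
    outcome = 1.0 + exposure @ true_lag + noise
    return exposure, outcome, true_lag


exposure, outcome, true_lag = simulate()
intercept, lag_hat = fit_distributed_lag(exposure, outcome)

# Flag windows whose estimated coefficient exceeds an arbitrary threshold.
critical_windows = np.flatnonzero(lag_hat > 0.25)
print("estimated lag coefficients:", np.round(lag_hat, 2))
print("flagged windows:", critical_windows)
```

In real data, exposures at adjacent windows are usually highly correlated, which makes unconstrained coefficients unstable; that is the motivation for the spline, tree-based, and Bayesian constraints used by the methods in the table.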

Method Acronym ^{2} | Overall Effect Estimation | Toxic Agent Identification (Variable Selection) | Pattern Identification | A Priori Defined Groups | Interactions and Non-Linearities |
---|---|---|---|---|---|

FIN | X | X | X | X | |

BSSVI | X | X | X | X | |

SGP-MPI | X | X | X | ||

RH-WQS | X | X | |||

Mult DLAG | X | X | X | ||

MatchAlign | X | X | X | ||

LWQS | X | X | |||

GriPS | X | X | X | ||

DLMtree | X | X | X | ||

DAG analysis | X | X | |||

BVSM | X | X | X | ||

BMIM | X | X | X | X | |

BKMR-DLM | X | X | X | ||

BKMR-CMA | X | X | X | X | |

Bayes Tree Pairs | X | X | X | ||

ACR | X | X | |||

SPAMTREE | X | X | X | ||

FOTP | X | X | X | ||

BAG | X | X | X | ||

TEV | X | X | |||

SCC | X | ||||

GL-GPs | X | X | |||

BDS | X | X | |||

SPORM | X | X | X | X | |

SiBAR | X | X | |||

BS3FA | X | X | |||

NLinteraction | X | X | |||

Het-DLM | X | ||||

BMC | X | ||||

PFA | X | ||||

PCP | X | ||||

MrGap | X | ||||

MixSelect | X | ||||

GIF-SIS | X | X | |||

BN2MF | X | ||||

CVEK | X | X |

^{1}Research questions follow the Gibson et al., 2019 review: (1) Overall effect estimation: What is the overall effect of the mixture and what is the magnitude of association? (2) Toxic agent identification: Which congeners or chemicals are associated with the outcome? Which congeners/chemicals are most important? (3) Pattern identification: Are there specific exposure patterns in the data? These can be addressed with clustering and dimension reduction methods. (4) A priori defined groups: What are the associations between an outcome and a priori defined groups of exposures? (5) Interactions and non-linearities: Are there interactions between exposures? (6) Exposure-response relationship: What is the exposure-response relationship between each chemical and the outcome? Because almost all methods that investigate interactions also characterize potentially nonlinear exposure-response functions, we group questions #5 and #6 into a single column in this Table.

^{2}Method acronyms: ACR: Acceptable Concentration Range model; Bayes Tree Pairs: Bayesian Regression Tree Pairs; BAG: Bag of DAGs; BDS: Bayesian Data Synthesis; BKMR-CMA: Bayesian Kernel Machine Regression Causal Mediation Analysis; BKMR-DLM: Bayesian Kernel Machine Regression-Distributed Lag Model; BMC: Bayesian Matrix Completion for hypothesis testing; BMIM: Bayesian Multiple Index Model; BN2MF: Bayesian Non-parametric non-negative Matrix Factorization; BS3FA: Bayesian partially supervised sparse and smooth factor analysis; BSSVI: Bayesian subset selection and variable importance for interpretable prediction and classification; BVSM: Bayesian variable selection for understanding mixtures in environmental exposures; CVEK: Cross-validated kernel ensemble; DAG analysis: Directed Acyclic Graphs Analysis; DLMtree: Bayesian Treed Distributed Lag Models; FIN: Factor analysis for interactions; FOTP: Fast, optimal, and targeted predictions using parameterized decision analysis; GIF-SIS: Generalized infinite factor model; GL-GPs: Graph Laplacian based Gaussian Process; GriPS: Computational improvements for Bayesian multivariate regression models based on latent meshed Gaussian processes; Het-DLM: Heterogeneous distributed lag models; LWQS: Lagged Weighted Quantile Sum (WQS) regression; MatchAlign: Resolving rotational ambiguity in matrix sampling; MixSelect: Identifying main effects and interactions among exposures using Gaussian processes; MrGap: Manifold Reconstruction via Gaussian Process; Mult DLAG: Multiple exposure distributed lag models with variable selection; MVNimpute: Imputation of multivariate data by normal model; NLinteraction: Bayesian semiparametric regression with sparsity inducing priors; PCP: Principal Component Pursuit; PFA: Perturbed factor analysis; RH-WQS: Repeated holdout Weighted Quantile Sum (WQS) regression; SCC: Spatiotemporal case-crossover; SGP-MPI: Scalable Gaussian Process regression via Median Posterior Inference; SiBAR: State Informed Background Removal; SPAMTREE: Spatial Multivariate Trees; SPORM: Estimating complex relationship among outcome, biomarkers, and exposures; TEV: Estimation and inference on the explained variation parameter.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Joubert, B.R.; Kioumourtzoglou, M.-A.; Chamberlain, T.; Chen, H.Y.; Gennings, C.; Turyk, M.E.; Miranda, M.L.; Webster, T.F.; Ensor, K.B.; Dunson, D.B.; Coull, B.A. Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods. *Int. J. Environ. Res. Public Health* **2022**, *19*, 1378.
https://doi.org/10.3390/ijerph19031378
