Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications
Abstract
1. Introduction
 Guidance for Bayesian workflow to solve a real-life problem is provided for domain experts to facilitate efficient collaboration with quantitative researchers;
 Recently developed prior distributions and Bayesian computation techniques for a basic model and its extensions are illustrated for statisticians to develop more complex models built on the basic model;
 The illustrated methodologies can be directly exploited by quantitative researchers, modeling scientists, and professional programmers working in diverse industries, in applications ranging from small-data to big-data problems.
2. Trends and Workflow of Bayesian Nonlinear Mixed Effects Models
2.1. Rise in the Use of Bayesian Approaches for the Nonlinear Mixed Effects Models
2.2. Bayesian Workflow
3. Applications of Bayesian Nonlinear Mixed Effects Model in Real-Life Problems
3.1. The Setting
3.2. Example 1: Pharmacokinetics Analysis
3.3. Example 2: Decline Curve Analysis
3.4. Example 3: Yield Curve Modeling
3.5. Example 4: Early Stage of Epidemic
3.6. Statistical Problem
 (1) There exist repeated measures of a continuous response over time for each subject;
 (2) There exists variation of individual observations over time;
 (3) There exists subject-to-subject variation in trajectories;
 (4) There exist covariates measured at baseline for each subject.
4. The Model
4.1. Basic Model
 Stage 1: Individual-Level Model$$y_{ij}=f(t_{ij};\boldsymbol{\theta}^{i})+\epsilon_{ij},\quad \epsilon_{ij}\sim \mathcal{N}(0,\sigma^{2}),\quad (i=1,\cdots,N;\; j=1,\cdots,M_{i}).$$In (2), the conditional mean $\mathbb{E}[y_{ij}\mid \boldsymbol{\theta}^{i},\sigma^{2}]=f(t_{ij};\boldsymbol{\theta}^{i})$ is a known function governing within-individual temporal behavior, dictated by a K-dimensional parameter $\boldsymbol{\theta}^{i}=(\theta_{1i},\theta_{2i},\cdots,\theta_{li},\cdots,\theta_{Ki})^{\top}\in \mathbb{R}^{K}$ specific to subject i. We assume that the residuals $\epsilon_{ij}$ are normally distributed with mean zero and unknown variance $\sigma^{2}$.
 Stage 2: Population Model$$\theta_{li}=\alpha_{l}+\mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{l}+\eta_{li},\quad \eta_{li}\sim \mathcal{N}(0,\omega_{l}^{2}),\quad (i=1,\cdots,N;\; l=1,\cdots,K).$$In (3), the l-th model parameter $\theta_{li}$ is used as the response of an ordinary linear regression with predictor $\mathbf{x}_{i}$, intercept $\alpha_{l}\in \mathbb{R}$, and coefficient vector $\boldsymbol{\beta}_{l}=(\beta_{l1},\beta_{l2},\cdots,\beta_{lP})^{\top}\in \mathbb{R}^{P}$. Letting $\boldsymbol{\eta}_{i}=(\eta_{1i},\eta_{2i},\cdots,\eta_{li},\cdots,\eta_{Ki})^{\top}\in \mathbb{R}^{K}$, we assume that $\boldsymbol{\eta}_{i}$ is distributed according to a K-dimensional Gaussian distribution $\mathcal{N}(\mathbf{0},\Omega)$ with covariance matrix $\Omega=\mathrm{diag}(\omega_{1}^{2},\omega_{2}^{2},\cdots,\omega_{l}^{2},\cdots,\omega_{K}^{2})\in \mathbb{R}^{K\times K}$. The diagonality of $\Omega$ implies that the model parameters are uncorrelated across l.
 Stage 3: Prior$$\sigma^{2}\sim \pi(\sigma^{2}),\quad \alpha_{l}\sim \pi(\alpha_{l}),\quad \boldsymbol{\beta}_{l}\sim \pi(\boldsymbol{\beta}_{l}),\quad \omega_{l}^{2}\sim \pi(\omega_{l}^{2}),\quad (l=1,\cdots,K).$$
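As a concrete illustration, the three stages above can be simulated forward. The sketch below is hypothetical: the survey leaves $f$, the dimensions, and the parameter values generic, so an exponential-decay curve and arbitrary values of $\boldsymbol{\alpha}$, $\mathbf{B}$, $\omega_{l}$, and $\sigma$ are assumed here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical within-individual curve f(t; theta); the model keeps f generic.
def f(t, theta):
    # theta = (initial level, decay rate) -- an illustrative K = 2 choice
    return theta[0] * np.exp(-theta[1] * t)

N, P, K = 50, 3, 2                   # subjects, covariates, curve parameters
alpha = np.array([4.0, 0.8])         # intercepts alpha_l (assumed values)
B = rng.normal(0, 0.2, size=(K, P))  # K-by-P coefficient matrix
omega = np.array([0.3, 0.1])         # between-subject SDs omega_l
sigma = 0.2                          # residual SD

X = rng.normal(size=(N, P))          # baseline covariates x_i
data = []
for i in range(N):
    # Stage 2: theta^i = alpha + B x_i + eta^i,  eta_li ~ N(0, omega_l^2)
    theta_i = alpha + B @ X[i] + rng.normal(0.0, omega)
    t_ij = np.sort(rng.uniform(0, 5, size=8))   # M_i = 8 observation times
    # Stage 1: y_ij = f(t_ij; theta^i) + eps_ij,  eps_ij ~ N(0, sigma^2)
    y_ij = f(t_ij, theta_i) + rng.normal(0.0, sigma, size=t_ij.shape)
    data.append((t_ij, y_ij))

print(len(data), data[0][1].shape)
```

Each subject contributes its own trajectory around the population curve, which is exactly the between-subject variation that Stage 2 encodes.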
4.2. Vectorized Form of the Basic Model
 Stage 1: Individual-Level Model$$\mathbf{y}_{i}=\mathbf{f}_{i}(\mathbf{t}_{i},\boldsymbol{\theta}^{i})+\boldsymbol{\epsilon}_{i},\quad \boldsymbol{\epsilon}_{i}\sim \mathcal{N}_{M_{i}}(\mathbf{0},\sigma^{2}\mathbf{I}),\quad (i=1,\cdots,N).$$In (6), $\mathbf{f}_{i}(\mathbf{t}_{i},\boldsymbol{\theta}^{i})$ is an $M_{i}$-dimensional vector whose elements are temporally stacked: $\mathbf{f}_{i}(\mathbf{t}_{i},\boldsymbol{\theta}^{i})=(f(t_{i1};\boldsymbol{\theta}^{i}),f(t_{i2};\boldsymbol{\theta}^{i}),\cdots,f(t_{iM_{i}};\boldsymbol{\theta}^{i}))^{\top}$ for subject i. The vector $\boldsymbol{\epsilon}_{i}$ is distributed according to the $M_{i}$-dimensional Gaussian distribution with mean $\mathbf{0}$ and covariance matrix $\sigma^{2}\mathbf{I}$.
 Stage 2: Population Model (l-indexing)$$\boldsymbol{\theta}_{l}=\mathbf{1}\alpha_{l}+\mathbf{X}\boldsymbol{\beta}_{l}+\boldsymbol{\eta}_{l},\quad \boldsymbol{\eta}_{l}\sim \mathcal{N}_{N}(\mathbf{0},\omega_{l}^{2}\mathbf{I}),\quad (l=1,\cdots,K).$$In (7), for each l, the N-dimensional model parameter vector $\boldsymbol{\theta}_{l}$ is used as the response vector of an ordinary linear regression with (i) N-by-P design matrix $\mathbf{X}=[\mathbf{x}_{1}\;\mathbf{x}_{2}\;\cdots\;\mathbf{x}_{N}]^{\top}$; (ii) intercept $\alpha_{l}$; (iii) coefficient vector $\boldsymbol{\beta}_{l}$; and (iv) isotropic Gaussian error vector $\boldsymbol{\eta}_{l}=(\eta_{l1},\eta_{l2},\cdots,\eta_{lN})^{\top}$ with variance $\omega_{l}^{2}$. (The notation $\mathbf{1}$ in (7) represents an all-ones vector.)
 Stage 2${}^{\prime}$: Population Model (i-indexing)$$\boldsymbol{\theta}^{i}=\boldsymbol{\alpha}+\mathbf{B}\mathbf{x}_{i}+\boldsymbol{\eta}^{i},\quad \boldsymbol{\eta}^{i}\sim \mathcal{N}_{K}(\mathbf{0},\Omega),\quad (i=1,\cdots,N).$$Equation (8) is derived by incorporating each of the N columns of the model matrix (5). Here, $\boldsymbol{\alpha}$ represents the K-dimensional vector $\boldsymbol{\alpha}=(\alpha_{1},\alpha_{2},\cdots,\alpha_{K})^{\top}$, and $\mathbf{B}$ represents a K-by-P matrix with rows $\boldsymbol{\beta}_{l}^{\top}$ ($l=1,\cdots,K$). The K-dimensional vector $\mathbf{B}\mathbf{x}_{i}$ on the right-hand side of (8) is mathematically identical to $\mathbf{X}_{i}\boldsymbol{\beta}$, where $\mathbf{X}_{i}=\mathbf{I}_{K}\otimes \mathbf{x}_{i}^{\top}\in \mathbb{R}^{K\times KP}$ and $\boldsymbol{\beta}=(\boldsymbol{\beta}_{1}^{\top},\boldsymbol{\beta}_{2}^{\top},\cdots,\boldsymbol{\beta}_{K}^{\top})^{\top}\in \mathbb{R}^{KP}$ ($\mathbf{I}_{K}$ is the K-by-K identity matrix and ⊗ denotes the Kronecker matrix product). The error vector $\boldsymbol{\eta}^{i}=(\eta_{1i},\eta_{2i},\cdots,\eta_{Ki})^{\top}$ is distributed according to a K-dimensional Gaussian distribution with mean $\mathbf{0}$ and covariance matrix $\Omega=\mathrm{diag}(\omega_{1}^{2},\omega_{2}^{2},\cdots,\omega_{K}^{2})$.
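The identity $\mathbf{B}\mathbf{x}_{i}=(\mathbf{I}_{K}\otimes \mathbf{x}_{i}^{\top})\boldsymbol{\beta}$ used above is easy to verify numerically; the dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
K, P = 3, 4
B = rng.normal(size=(K, P))   # K-by-P coefficient matrix with rows beta_l
x_i = rng.normal(size=P)      # baseline covariate vector for subject i

# beta stacks the rows beta_1, ..., beta_K of B into a KP-vector
beta = B.reshape(-1)
# X_i = I_K ⊗ x_i^T is K-by-KP; row l places x_i^T in block l
X_i = np.kron(np.eye(K), x_i[None, :])

# The two parameterizations of the population mean agree: B x_i == X_i beta
assert np.allclose(B @ x_i, X_i @ beta)
```

The i-indexed form is convenient for subject-wise Gibbs updates, while the equivalent $\mathbf{X}_{i}\boldsymbol{\beta}$ form exposes a single stacked regression coefficient $\boldsymbol{\beta}$.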
 Stage 3: Prior$$\sigma^{2}\sim \pi(\sigma^{2}),\quad \boldsymbol{\alpha}\sim \pi(\boldsymbol{\alpha}),\quad \mathbf{B}\sim \pi(\mathbf{B}),\quad \Omega\sim \pi(\Omega).$$Each of the parameter blocks in $(\sigma^{2},\boldsymbol{\alpha},\mathbf{B},\Omega)$ is assumed to be independent a priori.
5. Likelihood
5.1. Outline
5.2. Likelihood Based on Stage 1
5.3. Likelihood Based on Stages 1 and 2 from Vector Form (a)
5.4. Likelihood Based on Stages 1 and 2${}^{\prime}$ from Vector Form (b)
6. Bayesian Inference and Implementation
6.1. Bayesian Inference
6.2. Gibbs Sampling Algorithm
6.3. Parallel Computation for Model Matrix
6.4. Elliptical Slice Sampler
Algorithm 1: ESS to sample from $\pi \left({\theta}_{li}\right)$ (22) 
Goal: Sampling from the full conditional posterior distribution
$$\pi\left({\theta}_{li}\right)\propto \mathcal{L}\left({\theta}_{li}\right)\cdot \mathcal{N}\left({\theta}_{li}\mid {\mu}_{li},{\omega}_{l}^{2}\right),$$
Input: Current state ${\theta}_{li}^{\left(s\right)}$. Output: A new state ${\theta}_{li}^{(s+1)}$.
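A minimal scalar implementation of this update can be sketched as follows; `log_lik` stands in for $\log \mathcal{L}(\theta_{li})$ and `mu`, `omega` for the conditional prior mean $\mu_{li}$ and scale $\omega_{l}$. This is a sketch of the standard elliptical slice sampler adapted to a nonzero prior mean, not the paper's exact pseudocode.

```python
import numpy as np

def ess_step(theta, log_lik, mu, omega, rng):
    """One elliptical slice sampling update for a scalar parameter whose
    full conditional is proportional to L(theta) * N(theta | mu, omega^2)."""
    nu = rng.normal(mu, omega)                       # auxiliary draw defining the ellipse
    log_y = log_lik(theta) + np.log(rng.uniform())   # slice level under L(theta)
    phi = rng.uniform(0.0, 2.0 * np.pi)              # initial angle on the ellipse
    phi_min, phi_max = phi - 2.0 * np.pi, phi        # shrinking bracket
    while True:
        prop = mu + (theta - mu) * np.cos(phi) + (nu - mu) * np.sin(phi)
        if log_lik(prop) > log_y:
            return prop                              # point on the ellipse above the slice
        # shrink the bracket toward the current state (phi = 0) and retry
        if phi < 0.0:
            phi_min = phi
        else:
            phi_max = phi
        phi = rng.uniform(phi_min, phi_max)
```

A useful property, matching the algorithm above, is that the update has no tuning parameter: the bracket-shrinking loop always terminates because the current state (angle 0) lies above the slice.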

6.5. Metropolis Adjusted Langevin Algorithm
Algorithm 2: MALA to sample from $\pi \left({\theta}_{li}\right)$ (22) 
Goal: Sampling from the full conditional posterior distribution
$$\pi\left({\theta}_{li}\right)\propto \exp\left(-U\left({\theta}_{li}\right)\right),$$
where $U\left({\theta}_{li}\right)={\parallel {\mathbf{y}}_{i}-{\mathbf{f}}_{i}({\mathbf{t}}_{i};{\theta}_{1i},\cdots,{\theta}_{li},\cdots,{\theta}_{Ki})\parallel}_{2}^{2}/\left(2{\sigma}^{2}\right)+{({\theta}_{li}-{\alpha}_{l}-{\mathbf{x}}_{i}^{\top}{\boldsymbol{\beta}}_{l})}^{2}/\left(2{\omega}_{l}^{2}\right).$ Input: Current state ${\theta}_{li}^{\left(s\right)}$ and step size $\delta$. Output: A new state ${\theta}_{li}^{(s+1)}$.
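A generic scalar MALA step targeting $\pi \propto \exp(-U)$ can be sketched as below; `U` and `grad_U` stand in for the potential above and its derivative, which would be supplied by the model. This is a textbook-style sketch, not the paper's exact pseudocode.

```python
import numpy as np

def mala_step(theta, U, grad_U, delta, rng):
    """One Metropolis-adjusted Langevin update targeting pi ∝ exp(-U),
    with step size delta, for a scalar parameter theta_li."""
    # Langevin proposal: drift down the gradient of U, plus Gaussian noise
    mean_fwd = theta - delta * grad_U(theta)
    prop = mean_fwd + np.sqrt(2.0 * delta) * rng.normal()
    mean_rev = prop - delta * grad_U(prop)
    # Metropolis-Hastings correction for the asymmetric proposal
    log_q_fwd = -(prop - mean_fwd) ** 2 / (4.0 * delta)
    log_q_rev = -(theta - mean_rev) ** 2 / (4.0 * delta)
    log_alpha = -U(prop) + U(theta) + log_q_rev - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return prop        # accept the proposal
    return theta           # reject: keep the current state
```

The gradient drift is what distinguishes MALA from a random-walk Metropolis update; the accept/reject step corrects the discretization error of the Langevin dynamics.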

6.6. Hamiltonian Monte Carlo
 (a) Preservation of total energy: $H({\theta}_{li}\left(t\right),{\varphi}_{li}\left(t\right))=H({\theta}_{li}\left(0\right),{\varphi}_{li}\left(0\right))$ for all $t\in [a,b]$;
 (b) Preservation of volume: $d{\theta}_{li}\left(t\right)d{\varphi}_{li}\left(t\right)=d{\theta}_{li}\left(0\right)d{\varphi}_{li}\left(0\right)$ for all $t\in [a,b]$;
 (c) Time reversibility: The mapping ${T}_{s}$ from the state at time t, $({\theta}_{li}\left(t\right),{\varphi}_{li}\left(t\right))$, to the state at time $t+s$, $({\theta}_{li}(t+s),{\varphi}_{li}(t+s))$, is one-to-one, and hence has an inverse ${T}_{-s}$.
Algorithm 3: HMC to sample from $\pi \left({\theta}_{li}\right)=\int \pi ({\theta}_{li},{\varphi}_{li})d{\varphi}_{li}$ (22) 
Goal: Sampling from the full conditional posterior distribution
$$\pi\left({\theta}_{li}\right)=\int \pi({\theta}_{li},{\varphi}_{li})\,d{\varphi}_{li}\propto \exp\left(-U\left({\theta}_{li}\right)\right)$$
Input: Current state ${\theta}_{li}^{\left(s\right)}$, step size $\delta $, number of steps L, and mass ${m}_{li}$. Output: A new state ${\theta}_{li}^{(s+1)}$.
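The inputs above translate directly into a scalar HMC update: resample the momentum $\varphi_{li}$, run L leapfrog steps of size $\delta$ under mass $m_{li}$, and accept or reject on the change in total energy $H=U(\theta_{li})+\varphi_{li}^{2}/(2m_{li})$. The sketch below assumes a generic potential `U` and gradient `grad_U`.

```python
import numpy as np

def hmc_step(theta, U, grad_U, delta, L, m, rng):
    """One HMC update for scalar theta_li: leapfrog integration followed
    by a Metropolis accept/reject on the total energy H."""
    phi = rng.normal(0.0, np.sqrt(m))        # resample momentum, phi ~ N(0, m)
    th, p = theta, phi
    # Leapfrog: half momentum step, L position steps, half momentum step
    p = p - 0.5 * delta * grad_U(th)
    for step in range(L):
        th = th + delta * p / m
        if step < L - 1:
            p = p - delta * grad_U(th)
    p = p - 0.5 * delta * grad_U(th)
    # Accept with probability min(1, exp(H_current - H_proposed))
    H_prop = U(th) + p ** 2 / (2.0 * m)
    H_curr = U(theta) + phi ** 2 / (2.0 * m)
    if np.log(rng.uniform()) < H_curr - H_prop:
        return th
    return theta
```

Because the leapfrog integrator nearly preserves H (properties (a)-(c) above), the acceptance rate stays high even for distant proposals, which is the practical advantage of HMC over random-walk schemes.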

7. Prior Options
7.1. Priors for Variance
7.2. Priors for Intercept and Coefficient Vector
 Spike-and-slab priors. Each component of the coefficient vector ${\boldsymbol{\beta}}_{l}$ is assumed to be drawn from$${\beta}_{lb}\mid {\tau}_{l}\sim (1-{\tau}_{l})\cdot {\delta}_{0}\left({\beta}_{lb}\right)+{\tau}_{l}\cdot f\left({\beta}_{lb}\right),\quad (l=1,\cdots,K;\;b=1,\cdots,P),$$
 Continuous shrinkage priors. Each component of the coefficient vector ${\boldsymbol{\beta}}_{l}$ is assumed to be drawn from$${\beta}_{lb}\mid {\lambda}_{lb},{\tau}_{l},{\omega}_{l}^{2}\sim \mathcal{N}(0,{\lambda}_{lb}^{2}{\tau}_{l}^{2}{\omega}_{l}^{2}),\quad (l=1,\cdots,K;\;b=1,\cdots,P),$$$${\lambda}_{lb}\sim f\left({\lambda}_{lb}\right),\quad {\tau}_{l}\sim g\left({\tau}_{l}\right),\quad {\omega}_{l}^{2}\sim h\left({\omega}_{l}^{2}\right),\quad (l=1,\cdots,K;\;b=1,\cdots,P),$$
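For instance, taking both f and g to be half-Cauchy densities yields the horseshoe prior. Drawing from this hierarchy is straightforward; the sketch below fixes ${\omega}_{l}^{2}=1$ and P = 5 purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
P = 5
omega2 = 1.0                                  # omega_l^2, fixed for illustration

# Global and local scales: half-Cauchy draws (|t| with 1 df is half-Cauchy)
tau = np.abs(rng.standard_cauchy())           # global shrinkage tau_l ~ g
lam = np.abs(rng.standard_cauchy(size=P))     # local scales lambda_lb ~ f

# beta_lb | lambda_lb, tau_l, omega_l^2 ~ N(0, lambda_lb^2 tau_l^2 omega_l^2)
beta_l = rng.normal(0.0, lam * tau * np.sqrt(omega2))
print(beta_l)
```

The global scale shrinks every coefficient toward zero, while the heavy-tailed local scales let individual signals escape the shrinkage.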
7.3. Priors for Covariance Matrix
 Jeffreys prior. The common noninformative prior has been the Jeffreys improper prior$$\pi\left(\Omega\right)\propto {\left(\mathrm{det}\,\Omega\right)}^{-(K+1)/2}.$$
 Inverse-Wishart prior. The common informative prior is the inverse-Wishart prior [256]$$\pi\left(\Omega\right)=\mathcal{IW}(\mathbf{V},d)=\frac{{\left(\mathrm{det}\,\mathbf{V}\right)}^{d/2}}{{2}^{dK/2}{\Gamma}_{K}(d/2)}{\left(\mathrm{det}\,\Omega\right)}^{-(d+K+1)/2}\exp\left(-\frac{1}{2}\mathrm{tr}\left[{\Omega}^{-1}\mathbf{V}\right]\right),$$
 LKJ prior. The LKJ prior is supported over the correlation matrix space ${\mathcal{R}}^{K}$, or equivalently over the set of $K\times K$ Cholesky factors of real symmetric positive definite matrices$$\pi\left(\mathbf{R}\right)=\left[{2}^{{\sum}_{q=1}^{Q-1}(2\gamma-2+Q-q)(Q-q)}\prod_{q=1}^{Q-1}\mathcal{B}{\left(\gamma+\frac{Q-q-1}{2},\,\gamma+\frac{Q-q-1}{2}\right)}^{Q-q}\right]{\left(\mathrm{det}\,\mathbf{R}\right)}^{\gamma-1},$$
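The inverse-Wishart prior above is available off the shelf; the sketch below draws from it with `scipy.stats.invwishart`, whose `df`/`scale` parameterization matches $(d,\mathbf{V})$ in the density shown earlier. The values of K, V, and d are illustrative choices only.

```python
import numpy as np
from scipy.stats import invwishart

K = 3
V = np.eye(K)    # scale matrix V (illustrative choice)
d = 8            # degrees of freedom; d > K + 1 so the prior mean exists

prior = invwishart(df=d, scale=V)
Omega = prior.rvs(random_state=0)             # one draw of the covariance Omega
assert np.allclose(Omega, Omega.T)            # draws are symmetric ...
assert np.all(np.linalg.eigvalsh(Omega) > 0)  # ... and positive definite

# Monte Carlo check of the prior mean E[Omega] = V / (d - K - 1)
draws = prior.rvs(size=20000, random_state=1)
print(np.round(draws.mean(axis=0), 2))
```

Every draw is a valid covariance matrix, which is why the inverse-Wishart is a convenient conjugate choice for $\Omega$ in the Gibbs sampler of Section 6.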
8. Model Selection
8.1. Setting
8.2. Deviance Information Criterion
8.3. Widely Applicable Information Criterion
8.4. Posterior Predictive Loss Criterion
9. Extensions and Recent Developments
9.1. Residual Error Models
9.2. Bayesian Nonparametric Methods
9.3. Software Development
9.4. Future Research Topics
10. Discussion
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
 Sterba, S.K. Fitting nonlinear latent growth curve models with individually varying time points. Struct. Equ. Model. Multidiscip. J. 2014, 21, 630–647. [Google Scholar] [CrossRef]
 McArdle, J.J. Latent variable growth within behavior genetic models. Behav. Genet. 1986, 16, 163–200. [Google Scholar] [CrossRef] [PubMed]
 Cook, N.R.; Ware, J.H. Design and analysis methods for longitudinal research. Annu. Rev. Public Health 1983, 4, 1–23. [Google Scholar] [CrossRef] [PubMed]
 Mehta, P.D.; West, S.G. Putting the individual back into individual growth curves. Psychol. Methods 2000, 5, 23. [Google Scholar] [CrossRef] [PubMed]
 Zeger, S.L.; Liang, K.Y. An overview of methods for the analysis of longitudinal data. Stat. Med. 1992, 11, 1825–1839. [Google Scholar] [CrossRef] [PubMed]
 Diggle, P.; Diggle, P.J.; Heagerty, P.; Liang, K.Y.; Zeger, S. Analysis of Longitudinal Data; Oxford University Press: Oxford, UK, 2002. [Google Scholar]
 Demidenko, E. Mixed Models: Theory and Applications with R; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
 Snijders, T.A.; Bosker, R.J. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling; Sage: Los Angeles, CA, USA, 2011. [Google Scholar]
 Goldstein, H. Multilevel Statistical Models; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 922. [Google Scholar]
 Raudenbush, S.W.; Bryk, A.S. Hierarchical Linear Models: Applications and Data Analysis Methods; Sage: Thousand Oaks, CA, USA, 2002; Volume 1. [Google Scholar]
 Efron, B. The future of indirect evidence. Stat. Sci. A Rev. J. Inst. Math. Stat. 2010, 25, 145. [Google Scholar] [CrossRef] [PubMed]
 Sheiner, L.B.; Rosenberg, B.; Melmon, K.L. Modelling of individual pharmacokinetics for computer-aided drug dosage. Comput. Biomed. Res. 1972, 5, 441–459. [Google Scholar] [CrossRef]
 Lindstrom, M.J.; Bates, D.M. Nonlinear mixed effects models for repeated measures data. Biometrics 1990, 46, 673–687. [Google Scholar] [CrossRef]
 Davidian, M.; Giltinan, D.M. Nonlinear models for repeated measurement data: An overview and update. J. Agric. Biol. Environ. Stat. 2003, 8, 387–419. [Google Scholar] [CrossRef]
 Davidian, M.; Giltinan, D.M. Nonlinear Models for Repeated Measurement Data; Routledge: New York, NY, USA, 1995. [Google Scholar]
 Beal, S. The NONMEM System. 1980. Available online: https://iconplc.com/innovation/nonmem/ (accessed on 20 February 2022).
 Stan Development Team. RStan: The R Interface to Stan. R Package Version 2.21.3. 2021. Available online: https://mcstan.org/rstan/ (accessed on 20 February 2022).
 Fidler, M.; Wilkins, J.J.; Hooijmaijers, R.; Post, T.M.; Schoemaker, R.; Trame, M.N.; Xiong, Y.; Wang, W. Nonlinear mixed-effects model development and simulation using nlmixr and related R open-source packages. CPT Pharmacometr. Syst. Pharmacol. 2019, 8, 621–633. [Google Scholar] [CrossRef] [Green Version]
 Wang, W.; Hallow, K.; James, D. A tutorial on RxODE: Simulating differential equation pharmacometric models in R. CPT Pharmacometr. Syst. Pharmacol. 2016, 5, 3–10. [Google Scholar] [CrossRef] [PubMed]
 Stegmann, G.; Jacobucci, R.; Harring, J.R.; Grimm, K.J. Nonlinear mixedeffects modeling programs in R. Struct. Equ. Model. Multidiscip. J. 2018, 25, 160–165. [Google Scholar] [CrossRef]
 Vonesh, E.; Chinchilli, V.M. Linear and Nonlinear Models for the Analysis of Repeated Measurements; CRC Press: Boca Raton, FL, USA, 1996. [Google Scholar]
 Lee, S.Y. Structural Equation Modeling: A Bayesian Approach; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
 Dellaportas, P.; Smith, A.F. Bayesian inference for generalized linear and proportional hazards models via Gibbs sampling. J. R. Stat. Soc. Ser. C 1993, 42, 443–459. [Google Scholar] [CrossRef]
 Bush, C.A.; MacEachern, S.N. A semiparametric Bayesian model for randomised block designs. Biometrika 1996, 83, 275–285. [Google Scholar] [CrossRef]
 Zeger, S.L.; Karim, M.R. Generalized linear models with random effects; a Gibbs sampling approach. J. Am. Stat. Assoc. 1991, 86, 79–86. [Google Scholar] [CrossRef]
 Brooks, S.P. Bayesian computation: A statistical revolution. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 2003, 361, 2681–2697. [Google Scholar] [CrossRef] [PubMed]
 Bennett, J.; Wakefield, J. A comparison of a Bayesian population method with two methods as implemented in commercially available software. J. Pharmacokinet. Biopharm. 1996, 24, 403–432. [Google Scholar] [CrossRef]
 Wakefield, J. The Bayesian analysis of population pharmacokinetic models. J. Am. Stat. Assoc. 1996, 91, 62–75. [Google Scholar] [CrossRef]
 Gelman, A.; Bois, F.; Jiang, J. Physiological pharmacokinetic analysis using population modeling and informative prior distributions. J. Am. Stat. Assoc. 1996, 91, 1400–1412. [Google Scholar] [CrossRef]
 Lee, S.Y.; Lei, B.; Mallick, B. Estimation of COVID-19 spread curves integrating global data and borrowing information. PLoS ONE 2020, 15, e0236860. [Google Scholar] [CrossRef] [PubMed]
 Lee, S.Y.; Mallick, B.K. Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas. Sankhya B 2021, 1–43. [Google Scholar] [CrossRef]
 Hammersley, J. Monte Carlo Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
 Green, P.J.; Łatuszyński, K.; Pereyra, M.; Robert, C.P. Bayesian computation: A summary of the current state, and samples backwards and forwards. Stat. Comput. 2015, 25, 835–862. [Google Scholar] [CrossRef] [Green Version]
 Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria, 20–22 March 2003; Volume 124, pp. 1–10. [Google Scholar]
 Lunn, D.; Spiegelhalter, D.; Thomas, A.; Best, N. The BUGS project: Evolution, critique and future directions. Stat. Med. 2009, 28, 3049–3067. [Google Scholar] [CrossRef] [PubMed]
 Beal, S.L.; Sheiner, L.B. Estimating population kinetics. Crit. Rev. Biomed. Eng. 1982, 8, 195–222. [Google Scholar] [PubMed]
 Wolfinger, R. Laplace’s approximation for nonlinear mixed models. Biometrika 1993, 80, 791–795. [Google Scholar] [CrossRef]
 Delyon, B.; Lavielle, M.; Moulines, E. Convergence of a stochastic approximation version of the EM algorithm. Ann. Stat. 1999, 27, 94–128. [Google Scholar] [CrossRef]
 Lee, S.Y. Gibbs sampler and coordinate ascent variational inference: A set-theoretical review. Commun. Stat. Theory Methods 2021, 1–21. [Google Scholar] [CrossRef]
 Robert, C.P.; Casella, G. The Metropolis–Hastings algorithm. In Monte Carlo Statistical Methods; Springer: Berlin/Heidelberg, Germany, 1999; pp. 231–283. [Google Scholar]
 Neal, R.M. MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo 2011, 2, 2. [Google Scholar]
 Hoffman, M.D.; Gelman, A. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014, 15, 1593–1623. [Google Scholar]
 Dwivedi, R.; Chen, Y.; Wainwright, M.J.; Yu, B. Log-concave sampling: Metropolis–Hastings algorithms are fast! In Proceedings of the Conference on Learning Theory, Stockholm, Sweden, 6–9 July 2018; pp. 793–797. [Google Scholar]
 Ma, Y.A.; Chen, Y.; Jin, C.; Flammarion, N.; Jordan, M.I. Sampling can be faster than optimization. Proc. Natl. Acad. Sci. USA 2019, 116, 20881–20885. [Google Scholar] [CrossRef] [Green Version]
 Neal, R.M. Slice sampling. Ann. Stat. 2003, 31, 705–767. [Google Scholar] [CrossRef]
 SAS Institute. SAS OnlineDoc, Version 8. 1999. Available online: http://v8doc.sas.com/sashtml/main.htm (accessed on 20 February 2022).
 Beal, S.L.; Sheiner, L.B.; Boeckmann, A.; Bauer, R.J. NONMEM Users Guides; NONMEM Project Group, University of California: San Francisco, CA, USA, 1992. [Google Scholar]
 Lavielle, M. Monolix User Guide Manual. 2005. Available online: https://monolix.lixoft.com/ (accessed on 20 February 2022).
 Lunn, D.J.; Thomas, A.; Best, N.; Spiegelhalter, D. WinBUGS – a Bayesian modelling framework: Concepts, structure, and extensibility. Stat. Comput. 2000, 10, 325–337. [Google Scholar] [CrossRef]
 Bürkner, P.C. brms: An R package for Bayesian multilevel models using Stan. J. Stat. Softw. 2017, 80, 1–28. [Google Scholar] [CrossRef] [Green Version]
 Chernoff, H. Large-sample theory: Parametric case. Ann. Math. Stat. 1956, 27, 1–22. [Google Scholar] [CrossRef]
 Wand, M. Fisher information for generalised linear mixed models. J. Multivar. Anal. 2007, 98, 1412–1416. [Google Scholar] [CrossRef] [Green Version]
 Kang, D.; Bae, K.S.; Houk, B.E.; Savic, R.M.; Karlsson, M.O. Standard error of empirical Bayes estimate in NONMEM® VI. Korean J. Physiol. Pharmacol. 2012, 16, 97–106. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Breslow, N.E.; Clayton, D.G. Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 1993, 88, 9–25. [Google Scholar]
 Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall/CRC: London, UK, 2004. [Google Scholar]
 Smid, S.C.; McNeish, D.; Miočević, M.; van de Schoot, R. Bayesian versus frequentist estimation for structural equation models in small sample contexts: A systematic review. Struct. Equ. Model. Multidiscip. J. 2020, 27, 131–161. [Google Scholar] [CrossRef] [Green Version]
 Rupp, A.A.; Dey, D.K.; Zumbo, B.D. To Bayes or not to Bayes, from whether to when: Applications of Bayesian methodology to modeling. Struct. Equ. Model. 2004, 11, 424–451. [Google Scholar] [CrossRef]
 Bonangelino, P.; Irony, T.; Liang, S.; Li, X.; Mukhi, V.; Ruan, S.; Xu, Y.; Yang, X.; Wang, C. Bayesian approaches in medical device clinical trials: A discussion with examples in the regulatory setting. J. Biopharm. Stat. 2011, 21, 938–953. [Google Scholar] [CrossRef] [PubMed]
 Campbell, G. Bayesian methods in clinical trials with applications to medical devices. Commun. Stat. Appl. Methods 2017, 24, 561–581. [Google Scholar] [CrossRef] [Green Version]
 Hoff, P.D. A First Course in Bayesian Statistical Methods; Springer: Berlin/Heidelberg, Germany, 2009; Volume 580. [Google Scholar]
 O’Hagan, A. Bayesian statistics: Principles and benefits. Frontis 2004, 3, 31–45. [Google Scholar]
 van de Schoot, R.; Depaoli, S.; King, R.; Kramer, B.; Märtens, K.; Tadesse, M.G.; Vannucci, M.; Gelman, A.; Veen, D.; Willemsen, J.; et al. Bayesian statistics and modelling. Nat. Rev. Methods Prim. 2021, 1, 1–26. [Google Scholar] [CrossRef]
 Blaxter, L.; Hughes, C.; Tight, M. How to Research; McGraw-Hill Education: New York, NY, USA, 2010. [Google Scholar]
 Neuman, W.L. Understanding Research; Pearson: New York, NY, USA, 2016. [Google Scholar]
 Pinheiro, J.; Bates, D. Mixed-Effects Models in S and S-PLUS; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
 Gelman, A.; Simpson, D.; Betancourt, M. The prior can often only be understood in the context of the likelihood. Entropy 2017, 19, 555. [Google Scholar] [CrossRef] [Green Version]
 Garthwaite, P.H.; Kadane, J.B.; O’Hagan, A. Statistical methods for eliciting probability distributions. J. Am. Stat. Assoc. 2005, 100, 680–701. [Google Scholar] [CrossRef]
 O’Hagan, A.; Buck, C.E.; Daneshkhah, A.; Eiser, J.R.; Garthwaite, P.H.; Jenkinson, D.J.; Oakley, J.E.; Rakow, T. Uncertain Judgements: Eliciting Experts’ Probabilities; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2006. [Google Scholar]
 Howard, G.S.; Maxwell, S.E.; Fleming, K.J. The proof of the pudding: An illustration of the relative strengths of null hypothesis, meta-analysis, and Bayesian analysis. Psychol. Methods 2000, 5, 315. [Google Scholar] [CrossRef]
 Levy, R. Bayesian data-model fit assessment for structural equation modeling. Struct. Equ. Model. Multidiscip. J. 2011, 18, 663–685. [Google Scholar] [CrossRef]
 Wang, L.; Cao, J.; Ramsay, J.O.; Burger, D.; Laporte, C.; Rockstroh, J.K. Estimating mixedeffects differential equation models. Stat. Comput. 2014, 24, 111–121. [Google Scholar] [CrossRef]
 Botha, I.; Kohn, R.; Drovandi, C. Particle methods for stochastic differential equation mixed effects models. Bayesian Anal. 2021, 16, 575–609. [Google Scholar] [CrossRef]
 Fucik, S.; Kufner, A. Nonlinear Differential Equations; Elsevier: Amsterdam, The Netherlands, 2014. [Google Scholar]
 Verhulst, F. Nonlinear Differential Equations and Dynamical Systems; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
 Cohen, S.D.; Hindmarsh, A.C.; Dubois, P.F. CVODE, a stiff/nonstiff ODE solver in C. Comput. Phys. 1996, 10, 138–143. [Google Scholar] [CrossRef] [Green Version]
 Dormand, J.R.; Prince, P.J. A family of embedded Runge–Kutta formulae. J. Comput. Appl. Math. 1980, 6, 19–26. [Google Scholar] [CrossRef] [Green Version]
 Margossian, C.; Gillespie, B. Torsten: A Prototype Model Library for Bayesian PKPD Modeling in Stan User Manual: Version 0.81. Available online: https://metrumresearchgroup.github.io/Torsten/ (accessed on 20 February 2022).
 Chipman, H.; George, E.I.; McCulloch, R.E.; Clyde, M.; Foster, D.P.; Stine, R.A. The practical implementation of Bayesian model selection. Lect. NotesMonogr. Ser. 2001, 38, 65–134. [Google Scholar]
 Gibaldi, M.; Perrier, D. Pharmacokinetics; M. Dekker: New York, NY, USA, 1982; Volume 15. [Google Scholar]
 Jambhekar, S.S.; Breen, P.J. Basic Pharmacokinetics; Pharmaceutical Press: London, UK, 2009; Volume 76. [Google Scholar]
 Sheiner, L.; Ludden, T. Population pharmacokinetics/dynamics. Annu. Rev. Pharmacol. Toxicol. 1992, 32, 185–209. [Google Scholar] [CrossRef] [PubMed]
 Ette, E.I.; Williams, P.J. Population pharmacokinetics I: Background, concepts, and models. Ann. Pharmacother. 2004, 38, 1702–1706. [Google Scholar] [CrossRef]
 Lewis, J.; Beal, C.H. Some New Methods for Estimating the Future Production of Oil Wells. Trans. AIME 1918, 59, 492–525. [Google Scholar] [CrossRef]
 Fetkovich, M.J. Decline curve analysis using type curves. J. Pet. Technol. 1980, 32, 1065–1077. [Google Scholar] [CrossRef]
 Harris, S.; Lee, W.J. A Study of Decline Curve Analysis in the Elm Coulee Field. In SPE Unconventional Resources Conference; Society of Petroleum Engineers: The Woodlands, TX, USA, 2014. [Google Scholar]
 Nelson, C.R.; Siegel, A.F. Parsimonious modeling of yield curves. J. Bus. 1987, 60, 473–489. [Google Scholar] [CrossRef]
 Diebold, F.X.; Li, C. Forecasting the term structure of government bond yields. J. Econom. 2006, 130, 337–364. [Google Scholar] [CrossRef] [Green Version]
 Svensson, L.E. Estimating and Interpreting forward Interest Rates: Sweden 1992–1994. National Bureau of Economic Research Working Paper, no 4871. 1994. Available online: https://www.nber.org/papers/w4871 (accessed on 20 February 2022).
 Dahlquist, M.; Svensson, L.E. Estimating the term structure of interest rates for monetary policy analysis. Scand. J. Econ. 1996, 98, 163–183. [Google Scholar] [CrossRef]
 Wang, P.; Zheng, X.; Li, J.; Zhu, B. Prediction of epidemic trends in COVID19 with logistic model and machine learning technics. Chaos Solitons Fractals 2020, 139, 110058. [Google Scholar] [CrossRef]
 Wilke, C.O.; Bergstrom, C.T. Predicting an epidemic trajectory is difficult. Proc. Natl. Acad. Sci. USA 2020, 117, 28549–28551. [Google Scholar] [CrossRef]
 Bonate, P.L. Pharmacokinetic-Pharmacodynamic Modeling and Simulation; Springer: Berlin/Heidelberg, Germany, 2011; Volume 20. [Google Scholar]
 Rowland, M.; Tozer, T.N. Clinical Pharmacokinetics/Pharmacodynamics; Lippincott Williams and Wilkins Philadelphia: New York, NY, USA, 2005. [Google Scholar]
 Gabrielsson, J.; Weiner, D. Pharmacokinetic and Pharmacodynamic Data Analysis: Concepts and Applications; CRC Press: Boca Raton, FL, USA, 2001. [Google Scholar]
 Dua, P.; Hawkins, E.; Van Der Graaf, P. A tutorial on target-mediated drug disposition (TMDD) models. CPT Pharmacometr. Syst. Pharmacol. 2015, 4, 324–337. [Google Scholar] [CrossRef]
 Xu, X.S.; Yuan, M.; Zhu, H.; Yang, Y.; Wang, H.; Zhou, H.; Xu, J.; Zhang, L.; Pinheiro, J. Full covariate modelling approach in population pharmacokinetics: Understanding the underlying hypothesis tests and implications of multiplicity. Br. J. Clin. Pharmacol. 2018, 84, 1525–1534. [Google Scholar] [CrossRef] [Green Version]
 Roses, A.D. Pharmacogenetics and the practice of medicine. Nature 2000, 405, 857–865. [Google Scholar] [CrossRef]
 Food and Drug Administration. Population Pharmacokinetics Guidance for Industry. FDA Guidance Page; 1999. Available online: https://www.fda.gov/regulatoryinformation/searchfdaguidancedocuments/populationpharmacokinetics (accessed on 20 February 2022).
 Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans; SIAM: Philadelphia, PA, USA, 1982. [Google Scholar]
 Broeker, A.; Wicha, S.G. Assessing parameter uncertainty in small-n pharmacometric analyses: Value of the log-likelihood profiling-based sampling importance resampling (LLP-SIR) technique. J. Pharmacokinet. Pharmacodyn. 2020, 47, 219–228. [Google Scholar] [CrossRef] [Green Version]
 Bauer, R.J. NONMEM tutorial part I: Description of commands and options, with simple examples of population analysis. CPT Pharmacometr. Syst. Pharmacol. 2019, 8, 525–537. [Google Scholar] [CrossRef] [Green Version]
 Giger, F.; Reiss, L.; Jourdan, A. The reservoir engineering aspects of horizontal drilling. In SPE Annual Technical Conference and Exhibition; OnePetro: Houston, TX, USA, 1984. [Google Scholar]
 AlHaddad, S.; Crafton, J. Productivity of horizontal wells. In Low Permeability Reservoirs Symposium; OnePetro: Denver, CO, USA, 1991. [Google Scholar]
 Mukherjee, H.; Economides, M.J. A parametric comparison of horizontal and vertical well performance. SPE Form. Eval. 1991, 6, 209–216. [Google Scholar] [CrossRef]
 Joshi, S. Cost/benefits of horizontal wells. In SPE Western Regional/AAPG Pacific Section Joint Meeting; OnePetro: Long Beach, CA, USA, 2003. [Google Scholar]
 Valdes, A.; McVay, D.A.; Noynaert, S.F. Uncertainty quantification improves well construction cost estimation in unconventional reservoirs. In SPE Unconventional Resources Conference Canada; OnePetro: Calgary, AB, Canada, 2013. [Google Scholar]
 Bellarby, J. Well Completion Design; Elsevier: Amsterdam, The Netherlands, 2009. [Google Scholar]
 Currie, S.M.; Ilk, D.; Blasingame, T.A. Continuous estimation of ultimate recovery. In SPE Unconventional Gas Conference; OnePetro: Pittsburgh, PA, USA, 2010. [Google Scholar]
 Arps, J.J. Analysis of decline curves. Trans. AIME 1945, 160, 228–247. [Google Scholar] [CrossRef]
 Weibull, W. A statistical distribution function of wide applicability. J. Appl. Mech. 1951, 18, 293–297. [Google Scholar] [CrossRef]
 Ilk, D.; Rushing, J.A.; Perego, A.D.; Blasingame, T.A. Exponential vs. hyperbolic decline in tight gas sands: Understanding the origin and implications for reserve estimates using Arps’ decline curves. In SPE Annual Technical Conference and Exhibition; Society of Petroleum Engineers: Denver, CO, USA, 2008. [Google Scholar]
 Valkó, P.P.; Lee, W.J. A better way to forecast production from unconventional gas wells. In SPE Annual Technical Conference and Exhibition; Society of Petroleum Engineers: Florence, Italy, 2010. [Google Scholar]
 Clark, A.J. Decline Curve Analysis in Unconventional Resource Plays Using Logistic Growth Models. Ph.D. Thesis, The University of Texas at Austin, Austin, TX, USA, 2011. [Google Scholar]
 Duong, A.N. Rate-decline analysis for fracture-dominated shale reservoirs. SPE Reserv. Eval. Eng. 2011, 14, 377–387. [Google Scholar] [CrossRef] [Green Version]
 Ali, T.A.; Sheng, J.J. Production Decline Models: A Comparison Study. In SPE Eastern Regional Meeting; Society of Petroleum Engineers: Morgantown, WV, USA, 2015. [Google Scholar]
 Miao, Y.; Li, X.; Lee, J.; Zhao, C.; Zhou, Y.; Li, H.; Chang, Y.; Lin, W.; Xiao, Z.; Wu, N.; et al. Comparison of Various Rate-Decline Analysis Models for Horizontal Wells with Multiple Fractures in Shale Gas Reservoirs. In SPE Trinidad and Tobago Section Energy Resources Conference; Society of Petroleum Engineers: Port of Spain, Trinidad and Tobago, 2018. [Google Scholar]
 Duffee, G. Forecasting interest rates. In Handbook of Economic Forecasting; Elsevier: Amsterdam, The Netherlands, 2013; Volume 2, pp. 385–426. [Google Scholar]
 Gürkaynak, R.S.; Sack, B.; Wright, J.H. The US Treasury yield curve: 1961 to the present. J. Monet. Econ. 2007, 54, 2291–2304. [Google Scholar] [CrossRef] [Green Version]
 Zaloom, C. How to read the future: The yield curve, affect, and financial prediction. Public Cult. 2009, 21, 245–268. [Google Scholar] [CrossRef] [Green Version]
 Hays, S.; Shen, H.; Huang, J.Z. Functional dynamic factor models with application to yield curve forecasting. Ann. Appl. Stat. 2012, 6, 870–894. [Google Scholar] [CrossRef]
 Chen, Y.; Niu, L. Adaptive dynamic Nelson–Siegel term structure model with applications. J. Econom. 2014, 180, 98–115. [Google Scholar] [CrossRef] [Green Version]
 Bank for International Settlements. Zero-Coupon Yield Curves: Technical Documentation; BIS Papers No. 25; Bank for International Settlements: Basel, Switzerland, 2005; Available online: https://www.bis.org/publ/bppdf/bispap25.htm (accessed on 20 February 2022).
 Hautsch, N.; Yang, F. Bayesian inference in a stochastic volatility Nelson–Siegel model. Comput. Stat. Data Anal. 2012, 56, 3774–3792. [Google Scholar] [CrossRef] [Green Version]
 Diebold, F.X.; Li, C.; Yue, V.Z. Global yield curve dynamics and interactions: A dynamic Nelson–Siegel approach. J. Econom. 2008, 146, 351–363. [Google Scholar] [CrossRef] [Green Version]
 Cruz-Marcelo, A.; Ensor, K.B.; Rosner, G.L. Estimating the term structure with a semiparametric Bayesian hierarchical model: An application to corporate bonds. J. Am. Stat. Assoc. 2011, 106, 387–395. [Google Scholar] [CrossRef] [Green Version]
 Richards, F. A flexible growth function for empirical use. J. Exp. Bot. 1959, 10, 290–301. [Google Scholar] [CrossRef]
 Nelder, J.A. 182. Note: An alternative form of a generalized logistic equation. Biometrics 1962, 18, 614–616. [Google Scholar] [CrossRef]
 Seber, G.A.; Wild, C.J. Nonlinear Regression; John Wiley & Sons: Hoboken, NJ, USA, 2003; Volume 62, p. 63. [Google Scholar]
 Anton, H.; Herr, A. Calculus with Analytic Geometry; Wiley: New York, NY, USA, 1988. [Google Scholar]
 Causton, D. A computer program for fitting the Richards function. Biometrics 1969, 25, 401–409. [Google Scholar] [CrossRef]
 Birch, C.P. A new generalized logistic sigmoid growth equation compared with the Richards growth equation. Ann. Bot. 1999, 83, 713–723. [Google Scholar] [CrossRef] [Green Version]
 Kahm, M.; Hasenbrink, G.; Lichtenberg-Fraté, H.; Ludwig, J.; Kschischo, M. grofit: Fitting biological growth curves with R. J. Stat. Softw. 2010, 33, 1–21. [Google Scholar] [CrossRef] [Green Version]
 Cao, L.; Shi, P.J.; Li, L.; Chen, G. A New Flexible Sigmoidal Growth Model. Symmetry 2019, 11, 204. [Google Scholar] [CrossRef] [Green Version]
 Wang, X.S.; Wu, J.; Yang, Y. Richards model revisited: Validation by and application to infection dynamics. J. Theor. Biol. 2012, 313, 12–19. [Google Scholar] [CrossRef] [PubMed]
 Hsieh, Y.H.; Lee, J.Y.; Chang, H.L. SARS epidemiology modeling. Emerg. Infect. Dis. 2004, 10, 1165. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Hsieh, Y.H. Richards model: A simple procedure for real-time prediction of outbreak severity. In Modeling and Dynamics of Infectious Diseases; World Scientific: London, UK, 2009; pp. 216–236. [Google Scholar]
 Hsieh, Y.H.; Ma, S. Intervention measures, turning point, and reproduction number for dengue, Singapore, 2005. Am. J. Trop. Med. Hyg. 2009, 80, 66–71. [Google Scholar] [CrossRef] [PubMed]
 Hsieh, Y.H.; Chen, C. Turning points, reproduction number, and impact of climatological events for multi-wave dengue outbreaks. Trop. Med. Int. Health 2009, 14, 628–638. [Google Scholar] [CrossRef]
 Hsieh, Y.H. Pandemic influenza A (H1N1) during winter influenza season in the southern hemisphere. Influenza Other Respir. Viruses 2010, 4, 187–197. [Google Scholar] [CrossRef]
 Wu, K.; Darcet, D.; Wang, Q.; Sornette, D. Generalized logistic growth modeling of the COVID-19 outbreak in 29 provinces in China and in the rest of the world. arXiv 2020, arXiv:2003.05681. [Google Scholar] [CrossRef]
 Lee, S.Y.; Munafo, A.; Girard, P.; Goteti, K. Optimization of dose selection using multiple surrogates of toxicity as a continuous variable in phase I cancer trial. Contemp. Clin. Trials 2021, 113, 106657. [Google Scholar] [CrossRef]
 Dugel, P.U.; Koh, A.; Ogura, Y.; Jaffe, G.J.; Schmidt-Erfurth, U.; Brown, D.M.; Gomes, A.V.; Warburton, J.; Weichselberger, A.; Holz, F.G.; et al. HAWK and HARRIER: Phase 3, multicenter, randomized, double-masked trials of brolucizumab for neovascular age-related macular degeneration. Ophthalmology 2020, 127, 72–84. [Google Scholar] [CrossRef]
 Willyard, C. New human gene tally reignites debate. Nature 2018, 558, 354–356. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
 Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: London, UK, 2016. [Google Scholar]
 Boyd, S.; Boyd, S.P.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
 James, W.; Stein, C. Estimation with quadratic loss. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 443–460. [Google Scholar]
 Dawid, A.P. Conditional independence in statistical theory. J. R. Stat. Soc. Ser. B 1979, 41, 1–15. [Google Scholar] [CrossRef]
 Liu, Q.; Pierce, D.A. A note on Gauss–Hermite quadrature. Biometrika 1994, 81, 624–629. [Google Scholar]
 Hedeker, D.; Gibbons, R.D. A random-effects ordinal regression model for multilevel analysis. Biometrics 1994, 50, 933–944. [Google Scholar] [CrossRef]
 Vonesh, E.F.; Wang, H.; Nie, L.; Majumdar, D. Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. J. Am. Stat. Assoc. 2002, 97, 271–283. [Google Scholar] [CrossRef]
 Hinrichs, A.; Novak, E.; Ullrich, M.; Woźniakowski, H. The curse of dimensionality for numerical integration of smooth functions II. J. Complex. 2014, 30, 117–143. [Google Scholar] [CrossRef]
 Vonesh, E.F.; Carter, R.L. Mixed-effects nonlinear regression for unbalanced repeated measures. Biometrics 1992, 48, 1–17. [Google Scholar] [CrossRef]
 Goldstein, H. Nonlinear multilevel models, with an application to discrete response data. Biometrika 1991, 78, 45–51. [Google Scholar] [CrossRef]
 Vonesh, E.F. A note on the use of Laplace's approximation for nonlinear mixed-effects models. Biometrika 1996, 83, 447–452. [Google Scholar] [CrossRef]
 Marsden, J.E.; Hoffman, M.J. Elementary Classical Analysis; Macmillan: New York, NY, USA, 1993. [Google Scholar]
 Lindley, D.V.; Smith, A.F. Bayes estimates for the linear model. J. R. Stat. Soc. Ser. B 1972, 34, 1–18. [Google Scholar] [CrossRef]
 Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22. [Google Scholar]
 Meng, X.L.; Rubin, D.B. Using EM to obtain asymptotic variancecovariance matrices: The SEM algorithm. J. Am. Stat. Assoc. 1991, 86, 899–909. [Google Scholar] [CrossRef]
 Walker, S. An EM algorithm for nonlinear random effects models. Biometrics 1996, 52, 934–944. [Google Scholar] [CrossRef]
 Allassonnière, S.; Chevallier, J. A new class of stochastic EM algorithms. Escaping local maxima and handling intractable sampling. Comput. Stat. Data Anal. 2021, 159, 107159. [Google Scholar] [CrossRef]
 Kuhn, E.; Lavielle, M. Maximum likelihood estimation in nonlinear mixed effects models. Comput. Stat. Data Anal. 2005, 49, 1020–1038. [Google Scholar] [CrossRef]
 Samson, A.; Lavielle, M.; Mentré, F. The SAEM algorithm for group comparison tests in longitudinal data analysis based on nonlinear mixed-effects model. Stat. Med. 2007, 26, 4860–4875. [Google Scholar] [CrossRef]
 Kuhn, E.; Lavielle, M. Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM Probab. Stat. 2004, 8, 115–131. [Google Scholar] [CrossRef] [Green Version]
 Allassonnière, S.; Kuhn, E.; Trouvé, A. Construction of Bayesian deformable models via a stochastic approximation algorithm: A convergence study. Bernoulli 2010, 16, 641–678. [Google Scholar] [CrossRef]
 Bernardo, J.M.; Smith, A.F. Bayesian Theory; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 405. [Google Scholar]
 Lindley, D.V. Bayesian Statistics, a Review; SIAM: Philadelphia, PA, USA, 1972; Volume 2. [Google Scholar]
 Casella, G.; George, E.I. Explaining the Gibbs sampler. Am. Stat. 1992, 46, 167–174. [Google Scholar]
 Murray, I.; Prescott Adams, R.; MacKay, D.J. Elliptical Slice Sampling. In Proceedings of the Thirteenth International Conference on Artificial Intelligence And Statistics, Sardinia, Italy, 13–15 May 2010. [Google Scholar]
 Ranganath, R.; Gerrish, S.; Blei, D. Black box variational inference. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, 22–25 April 2014. [Google Scholar]
 Wang, C.; Blei, D.M. Variational inference in nonconjugate models. J. Mach. Learn. Res. 2013, 14, 1005–1031. [Google Scholar]
 Minka, T.P. Expectation propagation for approximate Bayesian inference. arXiv 2013, arXiv:1301.2294. [Google Scholar]
 Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef] [Green Version]
 Andrieu, C.; De Freitas, N.; Doucet, A.; Jordan, M.I. An introduction to MCMC for machine learning. Mach. Learn. 2003, 50, 5–43. [Google Scholar] [CrossRef] [Green Version]
 Zhang, C.; Bütepage, J.; Kjellström, H.; Mandt, S. Advances in variational inference. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2008–2026. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013; Available online: http://www.R-project.org (accessed on 20 February 2022).
 Lee, A.; Yau, C.; Giles, M.B.; Doucet, A.; Holmes, C.C. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J. Comput. Graph. Stat. 2010, 19, 769–789. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Suchard, M.A.; Wang, Q.; Chan, C.; Frelinger, J.; Cron, A.; West, M. Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. J. Comput. Graph. Stat. 2010, 19, 419–438. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Hastings, W.K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109. [Google Scholar] [CrossRef]
 Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092. [Google Scholar] [CrossRef] [Green Version]
 Robert, C.P. The Metropolis–Hastings Algorithm. In Wiley StatsRef: Statistics Reference Online; John Wiley and Sons, Ltd.: Hoboken, NJ, USA, 2015; pp. 1–15. [Google Scholar]
 Chib, S.; Greenberg, E. Understanding the Metropolis–Hastings algorithm. Am. Stat. 1995, 49, 327–335. [Google Scholar]
 Duane, S.; Kennedy, A.D.; Pendleton, B.J.; Roweth, D. Hybrid Monte Carlo. Phys. Lett. B 1987, 195, 216–222. [Google Scholar] [CrossRef]
 Mengersen, K.L.; Tweedie, R.L. Rates of convergence of the Hastings and Metropolis algorithms. Ann. Stat. 1996, 24, 101–121. [Google Scholar] [CrossRef]
 Chen, T.; Fox, E.; Guestrin, C. Stochastic gradient Hamiltonian Monte Carlo. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1683–1691. [Google Scholar]
 Aicher, C.; Ma, Y.A.; Foti, N.J.; Fox, E.B. Stochastic gradient MCMC for state space models. SIAM J. Math. Data Sci. 2019, 1, 555–587. [Google Scholar] [CrossRef]
 Griewank, A.; Walther, A. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation; SIAM: Philadelphia, PA, USA, 2008. [Google Scholar]
 Øksendal, B. Stochastic differential equations. In Stochastic Differential Equations; Springer: Berlin/Heidelberg, Germany, 2003; pp. 65–84. [Google Scholar]
 Uhlenbeck, G.E.; Ornstein, L.S. On the theory of the Brownian motion. Phys. Rev. 1930, 36, 823. [Google Scholar] [CrossRef]
 Roberts, G.O.; Tweedie, R.L. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 1996, 83, 95–110. [Google Scholar] [CrossRef]
 Asai, Y.; Kloeden, P.E. Numerical schemes for random ODEs via stochastic differential equations. Commun. Appl. Anal. 2013, 17, 521–528. [Google Scholar]
 Casella, G.; Robert, C.P. Monte Carlo Statistical Methods; Springer: New York, NY, USA, 1999. [Google Scholar]
 Carpenter, B.; Gelman, A.; Hoffman, M.D.; Lee, D.; Goodrich, B.; Betancourt, M.; Brubaker, M.; Guo, J.; Li, P.; Riddell, A. Stan: A probabilistic programming language. J. Stat. Softw. 2017, 76, 1–32. [Google Scholar] [CrossRef] [Green Version]
 Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
 Leimkuhler, B.; Reich, S. Simulating Hamiltonian Dynamics; Number 14; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
 Zou, D.; Gu, Q. On the convergence of Hamiltonian Monte Carlo with stochastic gradients. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 13012–13022. [Google Scholar]
 Meza, C.; Osorio, F.; De la Cruz, R. Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Stat. Comput. 2012, 22, 121–139. [Google Scholar] [CrossRef]
 Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A 1946, 186, 453–461. [Google Scholar]
 Makalic, E.; Schmidt, D.F. A simple sampler for the horseshoe estimator. IEEE Signal Process. Lett. 2016, 23, 179–182. [Google Scholar] [CrossRef] [Green Version]
 Castillo, I.; Schmidt-Hieber, J.; Van der Vaart, A. Bayesian linear regression with sparse priors. Ann. Stat. 2015, 43, 1986–2018. [Google Scholar] [CrossRef] [Green Version]
 Lee, S.Y.; Pati, D.; Mallick, B.K. Tail-adaptive Bayesian shrinkage. arXiv 2020, arXiv:2007.02192. [Google Scholar]
 Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
 Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320. [Google Scholar] [CrossRef] [Green Version]
 Fan, J.; Samworth, R.; Wu, Y. Ultrahigh dimensional feature selection: Beyond the linear model. J. Mach. Learn. Res. 2009, 10, 2013–2038. [Google Scholar]
 Lu, Y.; Stuart, A.; Weber, H. Gaussian Approximations for Probability Measures on R^{d}. SIAM/ASA J. Uncertain. Quantif. 2017, 5, 1136–1165. [Google Scholar] [CrossRef] [Green Version]
 Wang, Y.; Blei, D.M. Frequentist consistency of variational Bayes. J. Am. Stat. Assoc. 2019, 114, 1147–1161. [Google Scholar] [CrossRef] [Green Version]
 Johnstone, I.M. High dimensional Bernstein–von Mises: Simple examples. Inst. Math. Stat. Collect. 2010, 6, 87. [Google Scholar]
 Le Cam, L.M.; Yang, G.L. Asymptotics in Statistics: Some Basic Concepts; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2000. [Google Scholar]
 Davidian, M.; Gallant, A.R. Smooth nonparametric maximum likelihood estimation for population pharmacokinetics, with application to quinidine. J. Pharmacokinet. Biopharm. 1992, 20, 529–556. [Google Scholar] [CrossRef] [Green Version]
 Wei, Y.; Higgins, J.P. Bayesian multivariate meta-analysis with multiple outcomes. Stat. Med. 2013, 32, 2911–2934. [Google Scholar] [CrossRef]
 Zellner, A. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti; Elsevier Science Publishers, Inc.: New York, NY, USA, 1986; pp. 233–243. [Google Scholar]
 Pirmohamed, M. Pharmacogenetics and pharmacogenomics. Br. J. Clin. Pharmacol. 2001, 52, 345. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 Weinshilboum, R.M.; Wang, L. Pharmacogenetics and pharmacogenomics: Development, science, and translation. Annu. Rev. Genom. Hum. Genet. 2006, 7, 223–245. [Google Scholar] [CrossRef] [PubMed]
 Arab-Alameddine, M.; Di Iulio, J.; Buclin, T.; Rotger, M.; Lubomirov, R.; Cavassini, M.; Fayet, A.; Décosterd, L.; Eap, C.B.; Biollaz, J.; et al. Pharmacogenetics-based population pharmacokinetic analysis of efavirenz in HIV-1-infected individuals. Clin. Pharmacol. Ther. 2009, 85, 485–494. [Google Scholar] [CrossRef] [PubMed]
 Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; Chapman and Hall/CRC: New York, NY, USA, 2015. [Google Scholar]
 Mitchell, T.J.; Beauchamp, J.J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 1988, 83, 1023–1032. [Google Scholar] [CrossRef]
 George, E.I.; McCulloch, R.E. Stochastic search variable selection. Markov Chain Monte Carlo Pract. 1995, 68, 203–214. [Google Scholar]
 Johnson, V.E.; Rossell, D. On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Stat. Soc. Ser. B 2010, 72, 143–170. [Google Scholar] [CrossRef] [Green Version]
 Yang, Y.; Wainwright, M.J.; Jordan, M.I. On the computational complexity of high-dimensional Bayesian variable selection. Ann. Stat. 2016, 44, 2497–2532. [Google Scholar] [CrossRef]
 Castillo, I.; van der Vaart, A. Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Stat. 2012, 40, 2069–2101. [Google Scholar] [CrossRef]
 Park, T.; Casella, G. The Bayesian lasso. J. Am. Stat. Assoc. 2008, 103, 681–686. [Google Scholar] [CrossRef]
 Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
 Griffin, J.E.; Brown, P.J. Inference with normalgamma prior distributions in regression problems. Bayesian Anal. 2010, 5, 171–188. [Google Scholar]
 Carvalho, C.M.; Polson, N.G.; Scott, J.G. The horseshoe estimator for sparse signals. Biometrika 2010, 97, 465–480. [Google Scholar] [CrossRef] [Green Version]
 Carvalho, C.M.; Polson, N.G.; Scott, J.G. Handling sparsity via the horseshoe. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, Clearwater Beach, FL, USA, 16–18 April 2009. [Google Scholar]
 Polson, N.G.; Scott, J.G. Shrink globally, act locally: Sparse Bayesian regularization and prediction. Bayesian Stat. 2010, 9, 105. [Google Scholar]
 George, E.I.; McCulloch, R.E. Approaches for Bayesian variable selection. Stat. Sin. 1997, 7, 339–373. [Google Scholar]
 Johnstone, I.M.; Silverman, B.W. Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Stat. 2004, 32, 1594–1649. [Google Scholar] [CrossRef] [Green Version]
 Pati, D.; Bhattacharya, A.; Pillai, N.S.; Dunson, D. Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 2014, 42, 1102–1130. [Google Scholar] [CrossRef] [Green Version]
 Song, Q.; Liang, F. Nearly optimal Bayesian shrinkage for high dimensional regression. arXiv 2017, arXiv:1712.08964. [Google Scholar]
 Martin, R.; Mess, R.; Walker, S.G. Empirical Bayes posterior concentration in sparse highdimensional linear models. Bernoulli 2017, 23, 1822–1847. [Google Scholar] [CrossRef] [Green Version]
 Bai, R.; Ghosh, M. Highdimensional multivariate posterior consistency under global–local shrinkage priors. J. Multivar. Anal. 2018, 167, 157–170. [Google Scholar] [CrossRef] [Green Version]
 Zhang, R.; Ghosh, M. Ultra High-dimensional Multivariate Posterior Contraction Rate Under Shrinkage Priors. arXiv 2019, arXiv:1904.04417. [Google Scholar] [CrossRef]
 Lee, S.; Kim, J.H. Exponentiated generalized Pareto distribution: Properties and applications towards extreme value theory. Commun. Stat. Theory Methods 2019, 48, 2014–2038. [Google Scholar] [CrossRef] [Green Version]
 Armagan, A.; Dunson, D.B.; Lee, J. Generalized double Pareto shrinkage. Stat. Sin. 2013, 23, 119. [Google Scholar] [CrossRef] [PubMed] [Green Version]
 O’Hara, R.B.; Sillanpää, M.J. A review of Bayesian variable selection methods: What, how and which. Bayesian Anal. 2009, 4, 85–117. [Google Scholar] [CrossRef]
 Bhadra, A.; Datta, J.; Polson, N.G.; Willard, B. Lasso meets horseshoe: A survey. Stat. Sci. 2019, 34, 405–427. [Google Scholar] [CrossRef] [Green Version]
 Gelman, A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 2006, 1, 515–534. [Google Scholar] [CrossRef]
 Fan, J.; Liao, Y.; Liu, H. An overview of the estimation of large covariance and precision matrices. Econom. J. 2016, 19, C1–C32. [Google Scholar] [CrossRef]
 Bickel, P.J.; Levina, E. Covariance regularization by thresholding. Ann. Stat. 2008, 36, 2577–2604. [Google Scholar] [CrossRef]
 Lam, C.; Fan, J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 2009, 37, 4254. [Google Scholar] [CrossRef]
 El Karoui, N. High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation. Ann. Stat. 2010, 38, 3487–3566. [Google Scholar] [CrossRef]
 Stein, C. Estimation of a covariance matrix, Rietz Lecture. In Proceedings of the 39th Annual Meeting IMS, Atlanta, GA, USA, 1975. [Google Scholar]
 Pourahmadi, M. High-Dimensional Covariance Estimation: With High-Dimensional Data; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 882. [Google Scholar]
 Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411. [Google Scholar] [CrossRef] [Green Version]
 Rajaratnam, B.; Massam, H.; Carvalho, C.M. Flexible covariance estimation in graphical Gaussian models. Ann. Stat. 2008, 36, 2818–2849. [Google Scholar] [CrossRef]
 Won, J.H.; Lim, J.; Kim, S.J.; Rajaratnam, B. Condition-number-regularized covariance estimation. J. R. Stat. Soc. Ser. B 2013, 75, 427–450. [Google Scholar] [CrossRef] [Green Version]
 Liu, C. Bartlett's Decomposition of the Posterior Distribution of the Covariance for Normal Monotone Ignorable Missing Data. J. Multivar. Anal. 1993, 46, 198–206. [Google Scholar] [CrossRef] [Green Version]
 Barnard, J.; McCulloch, R.; Meng, X.L. Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Stat. Sin. 2000, 10, 1281–1311. [Google Scholar]
 Geisser, S. Bayesian estimation in multivariate analysis. Ann. Math. Stat. 1965, 36, 150–159. [Google Scholar] [CrossRef]
 Lin, S.P.; Perlman, M.D. A Monte Carlo comparison of four estimators for a covariance matrix. In Multivariate Analysis VI; Krishnaiah, P.R., Ed.; NorthHolland: Amsterdam, The Netherlands, 1985; pp. 411–429. [Google Scholar]
 Brown, P.J.; Le, N.D.; Zidek, J.V. Inference for a Covariance Matrix. In Aspects of Uncertainty; Freeman, P.R., Smith, A.F.M., Eds.; John Wiley: Chichester, UK, 1994; pp. 77–90. [Google Scholar]
 Jeffreys, H. The Theory of Probability; OUP Oxford: Oxford, UK, 1998. [Google Scholar]
 Geisser, S.; Cornfield, J. Posterior distributions for multivariate normal parameters. J. R. Stat. Soc. Ser. B 1963, 25, 368–376. [Google Scholar] [CrossRef]
 Villegas, C. On the a priori distribution of the covariance matrix. Ann. Math. Stat. 1969, 40, 1098–1099. [Google Scholar] [CrossRef]
 Schervish, M.J. Theory of Statistics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
 James, A.T. Distributions of matrix variates and latent roots derived from normal samples. Ann. Math. Stat. 1964, 35, 475–501. [Google Scholar] [CrossRef]
 Yang, R.; Berger, J.O. Estimation of a covariance matrix using the reference prior. Ann. Stat. 1994, 22, 1195–1211. [Google Scholar] [CrossRef]
 Daniels, M.J.; Kass, R.E. Shrinkage estimators for covariance matrices. Biometrics 2001, 57, 1173–1184. [Google Scholar] [CrossRef] [Green Version]
 Wong, F.; Carter, C.K.; Kohn, R. Efficient estimation of covariance selection models. Biometrika 2003, 90, 809–830. [Google Scholar] [CrossRef] [Green Version]
 Sun, D.; Berger, J.O. Objective Bayesian analysis for the multivariate normal model. Bayesian Stat. 2007, 8, 525–562. [Google Scholar]
 Daniels, M.J.; Pourahmadi, M. Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika 2002, 89, 553–566. [Google Scholar] [CrossRef]
 Smith, M.; Kohn, R. Parsimonious covariance matrix estimation for longitudinal data. J. Am. Stat. Assoc. 2002, 97, 1141–1153. [Google Scholar] [CrossRef]
 Lewandowski, D.; Kurowicka, D.; Joe, H. Generating random correlation matrices based on vines and extended onion method. J. Multivar. Anal. 2009, 100, 1989–2001. [Google Scholar] [CrossRef] [Green Version]
 Ghosh, S.; Henderson, S.G. Behavior of the NORTA method for correlated random vector generation as the dimension increases. ACM Trans. Model. Comput. Simul. 2003, 13, 276–294. [Google Scholar] [CrossRef]
 Joe, H. Generating random correlation matrices based on partial correlations. J. Multivar. Anal. 2006, 97, 2177–2189. [Google Scholar] [CrossRef] [Green Version]
 Gilks, W.R.; Richardson, S.; Spiegelhalter, D. Markov Chain Monte Carlo in Practice; CRC Press: Boca Raton, FL, USA, 1995. [Google Scholar]
 Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; Van Der Linde, A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B 2002, 64, 583–639. [Google Scholar] [CrossRef] [Green Version]
 Watanabe, S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 2010, 11, 3571–3594. [Google Scholar]
 Gelfand, A.E.; Ghosh, S.K. Model choice: A minimum posterior predictive loss approach. Biometrika 1998, 85, 1–11. [Google Scholar] [CrossRef] [Green Version]
 Akaike, H. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike; Springer: Berlin/Heidelberg, Germany, 1998; pp. 199–213. [Google Scholar]
 Efron, B. How biased is the apparent error rate of a prediction rule? J. Am. Stat. Assoc. 1986, 81, 461–470. [Google Scholar] [CrossRef]
 Burnham, K.P.; Anderson, D.R. Practical use of the information-theoretic approach. In Model Selection and Inference; Springer: Berlin/Heidelberg, Germany, 1998; pp. 75–117. [Google Scholar]
 Banerjee, S.; Carlin, B.P.; Gelfand, A.E. Hierarchical Modeling and Analysis for Spatial Data; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
 Gelman, A.; Hwang, J.; Vehtari, A. Understanding predictive information criteria for Bayesian models. Stat. Comput. 2014, 24, 997–1016. [Google Scholar] [CrossRef]
 Celeux, G.; Forbes, F.; Robert, C.P.; Titterington, D.M. Deviance information criteria for missing data models. Bayesian Anal. 2006, 1, 651–673. [Google Scholar] [CrossRef]
 Robert, C.; Casella, G. Monte Carlo Statistical Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
 Vehtari, A.; Gelman, A. WAIC and Cross-Validation in Stan; Aalto University: Helsinki, Finland, 2014. [Google Scholar]
 Box, G.E. Sampling and Bayes’ inference in scientific modelling and robustness. J. R. Stat. Soc. Ser. A 1980, 143, 383–430. [Google Scholar] [CrossRef]
 Zellner, A. Bayesian and non-Bayesian estimation using balanced loss functions. In Statistical Decision Theory and Related Topics V; Springer: Berlin/Heidelberg, Germany, 1994; pp. 377–390. [Google Scholar]
 Vonesh, E.F. Nonlinear models for the analysis of longitudinal data. Stat. Med. 1992, 11, 1929–1954. [Google Scholar] [CrossRef]
 Müller, P.; Rosner, G.L. A Bayesian population model with hierarchical mixture priors applied to blood count data. J. Am. Stat. Assoc. 1997, 92, 1279–1292. [Google Scholar]
 Müller, P.; Quintana, F.A. Nonparametric Bayesian data analysis. Stat. Sci. 2004, 19, 95–110. [Google Scholar] [CrossRef]
 Hjort, N.L.; Holmes, C.; Müller, P.; Walker, S.G. Bayesian Nonparametrics; Cambridge University Press: Cambridge, UK, 2010; Volume 28. [Google Scholar]
 Walker, S.; Wakefield, J. Population models with a nonparametric random coefficient distribution. Sankhyā Indian J. Stat. Ser. B 1998, 60, 196–214. [Google Scholar]
 MacKay, D.J. Introduction to Gaussian processes. NATO ASI Ser. F Comput. Syst. Sci. 1998, 168, 133–166. [Google Scholar]
 Rasmussen, C.E. Gaussian processes in machine learning. In Summer School on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2003; pp. 63–71. [Google Scholar]
 Ferguson, T.S. Prior distributions on spaces of probability measures. Ann. Stat. 1974, 2, 615–629. [Google Scholar] [CrossRef]
 Escobar, M.D. Estimating normal means with a Dirichlet process prior. J. Am. Stat. Assoc. 1994, 89, 268–277. [Google Scholar] [CrossRef]
 Escobar, M.D.; West, M. Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 1995, 90, 577–588. [Google Scholar] [CrossRef]
 McLachlan, G.J.; Lee, S.X.; Rathnayake, S.I. Finite mixture models. Annu. Rev. Stat. Its Appl. 2019, 6, 355–378. [Google Scholar] [CrossRef]
 Rasmussen, C.E. The infinite Gaussian mixture model. Advances in Neural Information Processing Systems 12. 1999, Volume 12, pp. 554–560. Available online: https://papers.nips.cc/paper/1999/hash/97d98119037c5b8a9663cb21fb8ebf47Abstract.html (accessed on 20 February 2022).
 Antoniak, C.E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 1974, 2, 1152–1174. [Google Scholar] [CrossRef]
 Teh, Y.W.; Jordan, M.I.; Beal, M.J.; Blei, D.M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 2006, 101, 1566–1581. [Google Scholar] [CrossRef]
 Jara, A. Theory and computations for the Dirichlet process and related models: An overview. Int. J. Approx. Reason. 2017, 81, 128–146. [Google Scholar] [CrossRef]
 Rosner, G.L.; Müller, P. Bayesian population pharmacokinetic and pharmacodynamic analyses using mixture models. J. Pharmacokinet. Biopharm. 1997, 25, 209–233. [Google Scholar] [CrossRef]
 Müller, P.; Quintana, F.; Rosner, G. A method for combining inference across related nonparametric Bayesian models. J. R. Stat. Soc. Ser. B 2004, 66, 735–749. [Google Scholar] [CrossRef]
 Brown, H.; Prescott, R. Applied Mixed Models in Medicine; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
 Congdon, P.D. Applied Bayesian Hierarchical Methods; CRC Press: Boca Raton, FL, USA, 2010. [Google Scholar]
 Jordan, M.I. Graphical models. Stat. Sci. 2004, 19, 140–155. [Google Scholar] [CrossRef]
 Lauritzen, S.L.; Dawid, A.P.; Larsen, B.N.; Leimer, H.G. Independence properties of directed Markov fields. Networks 1990, 20, 491–505. [Google Scholar] [CrossRef]
 Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, PAMI-6, 721–741. [Google Scholar] [CrossRef] [PubMed]
 Liu, J.S. The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J. Am. Stat. Assoc. 1994, 89, 958–966. [Google Scholar] [CrossRef]
 Park, T.; Lee, S. Improving the Gibbs sampler. Wiley Interdiscip. Rev. Comput. Stat. 2021, e1546. [Google Scholar] [CrossRef]
 Spiegelhalter, D.J.; Thomas, A.; Best, N.; Lunn, D. WinBUGS Version 1.4 User Manual; MRC Biostatistics Unit: Cambridge, UK, 2003; Available online: http://www.mrc-bsu.cam.ac.uk/bugs (accessed on 20 February 2022).
 Spiegelhalter, D.; Thomas, A.; Best, N.; Lunn, D. OpenBUGS user manual. Version 2007, 3, 2007. [Google Scholar]
 Barthelmé, S.; Chopin, N. Expectation propagation for likelihood-free inference. J. Am. Stat. Assoc. 2014, 109, 315–333. [Google Scholar] [CrossRef] [Green Version]
 Zhu, J.; Chen, J.; Hu, W.; Zhang, B. Big learning with Bayesian methods. Natl. Sci. Rev. 2017, 4, 627–651. [Google Scholar] [CrossRef]
 Jordan, M.I. Message from the president: The era of big data. ISBA Bull. 2011, 18, 1–3. [Google Scholar]
 Johnson, D.; Sinanovic, S. Symmetrizing the kullbackleibler distance. IEEE Trans. Inf. Theory. 2001. Available online: https://scholarship.rice.edu/bitstream/handle/1911/19969/Joh2001Mar1Symmetrizi.PDF?sequence=1 (accessed on 20 February 2022).
 Tan, L.S.; Nott, D.J. Variational inference for generalized linear mixed models using partially noncentered parametrizations. Stat. Sci. 2013, 28, 168–188. [Google Scholar] [CrossRef] [Green Version]
 Ormerod, J.T.; Wand, M.P. Gaussian variational approximate inference for generalized linear mixed models. J. Comput. Graph. Stat. 2012, 21, 2–17. [Google Scholar] [CrossRef] [Green Version]
 Tan, L.S.; Nott, D.J. A stochastic variational framework for fitting and diagnosing generalized linear mixed models. Bayesian Anal. 2014, 9, 963–1004. [Google Scholar] [CrossRef] [Green Version]
 Ngufor, C.; Van Houten, H.; Caffo, B.S.; Shah, N.D.; McCoy, R.G. Mixed Effect Machine Learning: A framework for predicting longitudinal change in hemoglobin A1c. J. Biomed. Inform. 2019, 89, 56–67. [Google Scholar] [CrossRef] [PubMed]
 Capitaine, L.; Genuer, R.; Thiébaut, R. Random forests for highdimensional longitudinal data. Stat. Methods Med. Res. 2021, 30, 166–184. [Google Scholar] [CrossRef] [PubMed]
 Mandel, F.; Ghosh, R.P.; Barnett, I. Neural Networks for Clustered and Longitudinal Data Using Mixed Effects Models. Biometrics 2021. [CrossRef] [PubMed]
 Fu, W.; Simonoff, J.S. Unbiased regression trees for longitudinal and clustered data. Comput. Stat. Data Anal. 2015, 88, 53–74. [Google Scholar] [CrossRef]
 Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer: New York, NY, USA, 2009. [Google Scholar]
 Schulz, E.; Speekenbrink, M.; Krause, A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 2018, 85, 1–16. [Google Scholar] [CrossRef]
 Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
 Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
| Characteristic | Frequentist | Bayesian |
| --- | --- | --- |
| Estimation objective | Maximize a likelihood [14,15,21] | Sample from a posterior [22,28,30] |
| Computation algorithm | First-order approximation [36], Laplace approximation [37], and stochastic approximation of the EM algorithm [38] | Gibbs sampler [39], Metropolis–Hastings algorithm [40], Hamiltonian Monte Carlo [41], and No-U-Turn sampler [42] |
| Software | SAS [46], NONMEM [47], Monolix [48], nlmixr [18] | JAGS [49], BUGS [35], Stan [17], brms [50] |
| Advantages | Relatively fast computation, objectivity of inference results, and widely available software packages for implementing complex models | Inherent uncertainty quantification, better small-sample performance, and use of prior knowledge |
| Disadvantages | Relies on large-sample theory for uncertainty quantification and cannot incorporate prior knowledge | Needs high computing power for big data and requires Bayesian expertise in prior elicitation |
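To make the contrast in the first two rows of the table concrete, consider a toy problem of estimating a normal mean from repeated measurements. The sketch below is illustrative only (all numbers and the $\mathcal{N}(0, 10^2)$ prior are assumed for the example): the frequentist route maximizes the likelihood, while the Bayesian route draws from the posterior with a random-walk Metropolis–Hastings sampler, one of the algorithms listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 repeated measurements with true mean 2 and unit variance.
y = rng.normal(loc=2.0, scale=1.0, size=50)

# Frequentist route: maximize the normal likelihood for the mean
# (in this toy case the maximizer is available in closed form: the sample mean).
mu_mle = y.mean()

# Bayesian route: sample the posterior of mu under an assumed N(0, 10^2) prior
# with a random-walk Metropolis-Hastings sampler.
def log_post(mu, y, prior_sd=10.0):
    """Log posterior of mu up to an additive constant (sigma fixed at 1)."""
    return -0.5 * np.sum((y - mu) ** 2) - 0.5 * (mu / prior_sd) ** 2

draws, mu = [], 0.0
for _ in range(5000):
    prop = mu + rng.normal(scale=0.5)                 # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop, y) - log_post(mu, y):
        mu = prop                                     # accept
    draws.append(mu)

post = np.array(draws[1000:])                         # discard burn-in
print(mu_mle, post.mean(), post.std())                # point estimate vs posterior summary
```

With a nearly flat prior, the posterior mean essentially reproduces the maximum-likelihood estimate, but the posterior standard deviation comes out of the same run, illustrating the "inherent uncertainty quantification" noted in the table.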
| Research Field | Problem | Objective | References |
| --- | --- | --- | --- |
| Pharmaceutical industry | Pharmacokinetics analysis | Estimation of typical values of pharmacokinetics parameters | [79,80,81,82] |
| Oil and gas industry | Decline curve analysis | Prediction of estimated ultimate recovery | [31,83,84,85] |
| Financial industry | Yield curve modeling | Estimation of the interest rate parameters over time | [86,87,88,89] |
| Epidemiology | Epidemic spread prediction | Prediction of final epidemic size and finding risk factors | [30,90,91] |
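As an illustration of the nonlinear function $f$ underlying the pharmacokinetics application in the table, a standard choice is the one-compartment model with first-order absorption and elimination. The sketch below uses illustrative values for the dose and the parameters `ka`, `ke`, and `V` (assumed for the example, not taken from the cited studies).

```python
import numpy as np

def one_compartment(t, dose, ka, ke, V):
    """Drug concentration under first-order absorption (rate ka) and
    elimination (rate ke) for a single oral dose; requires ka != ke."""
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0.0, 24.0, 97)                         # hours after dosing
conc = one_compartment(t, dose=100.0, ka=1.5, ke=0.2, V=30.0)
print(conc.max(), t[conc.argmax()])                    # peak concentration and time to peak
```

In the mixed-effects setting of Section 4, each subject $i$ would carry its own $(k_a^i, k_e^i, V^i)$, and the goal is to estimate the population-typical values of these parameters.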
| Residual Error Type | Individual-Level Model | Mean $\mathbb{E}[y_{ij} \mid \boldsymbol{\theta}^{i}]$ | Variance $\mathbb{V}[y_{ij} \mid \boldsymbol{\theta}^{i}]$ |
| --- | --- | --- | --- |
| Additive | $y_{ij} = f(t_{ij};\boldsymbol{\theta}^{i}) + \epsilon_{ij}$ | $f(t_{ij};\boldsymbol{\theta}^{i})$ | $\sigma^{2}$ |
| Proportional | $y_{ij} = f(t_{ij};\boldsymbol{\theta}^{i}) \cdot (1+\epsilon_{ij})$ | $f(t_{ij};\boldsymbol{\theta}^{i})$ | $\{f(t_{ij};\boldsymbol{\theta}^{i})\}^{2} \cdot \sigma^{2}$ |
| Exponential | $y_{ij} = f(t_{ij};\boldsymbol{\theta}^{i}) \cdot \exp(\epsilon_{ij})$ | $f(t_{ij};\boldsymbol{\theta}^{i}) \cdot \exp(\sigma^{2}/2)$ | $\{f(t_{ij};\boldsymbol{\theta}^{i})\}^{2} \cdot (\exp(\sigma^{2})-1) \cdot \exp(\sigma^{2})$ |
| Additive and proportional | $y_{ij} = f(t_{ij};\boldsymbol{\theta}^{i}) \cdot (1+\epsilon_{ij}) + \varepsilon_{ij}$ | $f(t_{ij};\boldsymbol{\theta}^{i})$ | $\{f(t_{ij};\boldsymbol{\theta}^{i})\}^{2} \cdot \sigma^{2} + \varsigma^{2}$ |
| Additive and exponential | $y_{ij} = f(t_{ij};\boldsymbol{\theta}^{i}) \cdot \exp(\epsilon_{ij}) + \varepsilon_{ij}$ | $f(t_{ij};\boldsymbol{\theta}^{i}) \cdot \exp(\sigma^{2}/2)$ | $\{f(t_{ij};\boldsymbol{\theta}^{i})\}^{2} \cdot (\exp(\sigma^{2})-1) \cdot \exp(\sigma^{2}) + \varsigma^{2}$ |

In each row, $\epsilon_{ij} \sim \mathcal{N}(0,\sigma^{2})$; in the last two rows, the additive residual $\varepsilon_{ij} \sim \mathcal{N}(0,\varsigma^{2})$ is independent of $\epsilon_{ij}$.
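The mean and variance formulas in the table follow from the normality of $\epsilon_{ij}$; for the exponential type, $y_{ij}$ is log-normal, giving the $\exp(\sigma^2/2)$ inflation of the mean. A quick Monte Carlo check, with illustrative values for $f(t_{ij};\boldsymbol{\theta}^{i})$ and $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
f, sigma = 3.0, 0.4                       # illustrative f(t_ij; theta^i) and sigma
eps = rng.normal(0.0, sigma, size=1_000_000)

# Exponential residual: y = f * exp(eps) is log-normal, so
#   E[y]   = f * exp(sigma^2 / 2)
#   Var[y] = f^2 * (exp(sigma^2) - 1) * exp(sigma^2)
y = f * np.exp(eps)
mean_theory = f * np.exp(sigma**2 / 2)
var_theory = f**2 * (np.exp(sigma**2) - 1) * np.exp(sigma**2)
print(y.mean(), mean_theory)              # simulated vs theoretical mean
print(y.var(), var_theory)                # simulated vs theoretical variance

# Proportional residual: y = f * (1 + eps) has mean f and variance f^2 * sigma^2.
y2 = f * (1 + eps)
print(y2.mean(), y2.var())
```

The simulated moments match the table's formulas to within Monte Carlo error; the same check applies to the combined types after adding an independent $\varepsilon_{ij}$ term.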
Lee, S.Y. Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications. Mathematics 2022, 10, 898. https://doi.org/10.3390/math10060898