Abstract
A Bayesian semiparametric model framework is proposed to analyze multivariate longitudinal data. The new framework leads to simple explicit posterior distributions of the model parameters, which makes the MCMC algorithm for parameter estimation easy to implement and fast to converge. The proposed framework and its associated MCMC algorithm are validated under four covariance structures and on a real-life dataset. A simple Monte Carlo study of the model under the four covariance structures and an analysis of the real dataset show that the new model framework and its associated Bayesian posterior inference through the MCMC algorithm perform fairly well in the sense of easy implementation, fast convergence, and smaller root mean square errors compared with the same model without the specified autoregression structure.
Keywords:
Bayesian semiparametric method; covariance structure; Dirichlet process; linear mixed model; longitudinal data; MCMC algorithm
MSC:
62F10; 62F15; 62F40; 65C40
1. Introduction
Longitudinal data arise from repeated observations of the same individual or group of individuals at different time points. The structure of longitudinal data is shown in Table 1. The basic tasks of analyzing longitudinal data can be summarized as (1) studying the trend in the covariance structure of the observed variables with respect to time; (2) discovering the influence of covariates on the observable variables; and (3) determining the within-group correlation structures [1].
Table 1.
Longitudinal data structures.
Longitudinal data are often highly unbalanced, and it is usually difficult to apply traditional multiple regression techniques to such data directly. Statisticians have developed various Bayesian statistical inference models for longitudinal data analysis [2]. A parametric assumption may result in modeling bias, which motivates relaxing the assumption on the parametric structure. Nonparametric methods are robust because they do not require restrictive model assumptions. Semiparametric models integrate the characteristics of parametric and nonparametric models and are both flexible and easy to interpret. Nonparametric and semiparametric statistical methods are not only active topics in current statistical research but are also widely used in practical applications. Xiang et al. [3] summarize some common outcomes of nonparametric regression analysis of longitudinal data. Bayesian methods for parametric linear mixed models have been widely used in different areas. Assuming a normal random effect and using standard Gibbs sampling to realize some simple posterior inference, Quintana et al. [4] extend the general class of mixed models for longitudinal data by generalizing the GP part to the nonparametric case. The GP is a probabilistic approach to learning nonparametric models. Cheng et al. [5] propose the so-called LonGP, a flexible and interpretable nonparametric modeling framework. It provides a versatile software implementation that can solve commonly faced challenges in longitudinal data analysis, and it develops a fully Bayesian, predictive inference for LonGP that can be employed to carry out model selection. Kleinman and Ibrahim [6] relax the normality assumption by assuming a Dirichlet process prior with a Gaussian base measure of zero mean, semiparametrically modeling the random effects distribution [7]. Although a parametric model may have limitations in some applications, it is simpler than a nonparametric model, whose scope may be too wide to draw a concise conclusion. Nonparametric regression has a well-known weakness, the “curse of dimensionality”: when the independent variable X is multivariate, the estimation accuracy of the regression function deteriorates rapidly as the dimension of X increases. To address this problem, the semiparametric model is a good compromise that retains desirable properties of both parametric and nonparametric models [8]. There is a large body of literature on the semiparametric modeling of longitudinal data, and most of it employs random effects to model within-group correlations [9].
Semiparametric linear mixed models generalize traditional linear mixed models by modeling one covariate effect with a nonparametric function and modeling the other covariate effects parametrically. Semiparametric linear mixed models mainly use frequentist methods for statistical inference and assume normal random effects [10]. In this paper, we adopt the Bayesian semiparametric framework for linear mixed models given by Quintana et al. [4] and impose stronger conditions on the random effect to obtain explicit posterior distributions of the model parameters and fast convergence of the MCMC algorithm in the posterior inference. The proposed framework generalizes the default prior of the variance components and adjusts the inference of fixed effects associated with nonparametric random effects. The latter includes the extrapolation of nonparametric mean functions over time [11,12]. The stochastic process approach in [4] is a good choice for characterizing the intragroup correlation. The Gaussian process (GP) covariance has an exponential form and is uniquely determined by two parameters. A GP can specify autoregressive correlation (the AR covariance structure, [13]). A nonparametric Dirichlet process (DP) prior assigned to the covariance parameter results in an Ornstein–Uhlenbeck process (OUP). A partial Dirichlet process mixture (DPM) is performed on the OUP. By imposing stronger conditions on the random effect and decomposing the OUP into single Gaussian variables, we can substantially simplify the MCMC sampling in the posterior inference. The framework of our proposed semiparametric autoregression model (SPAR) is demonstrated in Figure 1.
Figure 1.
Bayesian semiparametric autoregression model (SPAR model).
This paper is organized as follows. Section 2 briefly introduces some basic theoretical background and principles. Section 3 reviews the OU process. Section 4 briefly introduces the Dirichlet process and the Dirichlet process mixture (DPM). The formulation of the semiparametric autoregressive model, based on a partial Dirichlet process mixture in terms of the OUP, is given in Section 5 together with a recommended solution. Section 6 derives the marginal likelihood, prior determination, and posterior inference. Section 7 gives a simple Monte Carlo study and a real dataset application based on the proposed model. Some concluding remarks are given in the last section.
2. Theoretical Basis
2.1. A General Linear Hybrid Model Containing an AR Structure
For an observation object , we denote the observation time points by . Let () be the observation vector from object i. At time point , we consider the possible time-dependent covariate vector . Let . Define the -dimensional fixed-effect design matrix . Assume . Define the corresponding -dimensional random effects design matrix , where . The general linear mixed model containing an AR covariance structure is
where stands for the fixed effect and for the random effect, is a stochastic process characterizing the correlation among the observations from object i, is the error term, . is an structural covariance matrix, and vectors and contain the variance–covariance parameters of the random vector and the stochastic process , respectively. stands for the identity matrix of dimension . The AR structure is usually specified by a GP with zero mean. The GP is uniquely determined by a covariance function containing parameter . The vector is a GP corresponding to the i-th object. is generated sequentially by the GP at the observation time points , i.e.,
To specify the AR structure, it is assumed that the above GP possesses stationarity:
When the model has a structured covariance function, the covariance matrix of is
2.2. The Principle for Bayesian Inference
The Bayesian method assumes a prior distribution on the unknown parameter and a joint distribution between an observable variable X and the parameter . The Bayesian method is based on the posterior distribution of the unknown parameter after observed data are available. The joint distribution, the prior distribution, and the posterior distribution are related to each other as follows:
where stands for the sample marginal density of . We use the posterior distribution to carry out statistical inference for the parameter :
Equation (4) is the well-known Bayesian formula. Bayesian statistical inference assumes that the posterior distribution of the unknown parameter , contains all the available information. As a result, the point estimation, interval estimation, hypothesis testing, and predicting inference of can be implemented as usual. Because
we construct the Bayesian estimate for after observing data by their conditional expectation:
In general, the Bayesian estimate for a function of , can be obtained as follows:
Obviously, the estimator is the usual expectation of with respect to the posterior distribution :
2.3. The MCMC Sampling and Its Convergence
When the posterior distribution in (6) is difficult to compute, estimating multiple parameters given by can be realized by the Markov Chain Monte Carlo (MCMC) method [14] following these steps: (1) establish a Markov chain whose stationary distribution is the posterior distribution of ; (2) use this Markov chain to sample from the posterior distribution and obtain an MCMC sample ; and (3) obtain the MCMC estimator for by
The MCMC sample estimates the parameter function more and more accurately as the sample size n becomes larger. The MCMC chain possesses properties such as stationarity, positive recurrence, aperiodicity, and irreducibility, so the choice of the initial value has little impact on the estimate of . Gibbs sampling is one of the MCMC algorithms; it is used to draw samples from a multivariate probability distribution. A drawback of standard Gibbs sampling is that it cannot handle nonconjugate distributions. Because the prior distribution of parameter in the model in this paper is nonconjugate, an improved version of the Gibbs algorithm is adopted [15].
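As an illustration of the MCMC estimator above, the following Python sketch averages a function of the draws over a hypothetical chain; the draws, the burn-in size, and the function g are placeholders, not quantities from the model in this paper.

```python
import numpy as np

# Hypothetical MCMC output: 5000 draws of a scalar parameter theta (placeholder for a real chain)
rng = np.random.default_rng(0)
theta_draws = rng.normal(loc=1.5, scale=0.3, size=5000)

def mcmc_estimate(draws, g=lambda t: t, burn_in=1000):
    """Estimate E[g(theta) | data] by averaging g over post-burn-in MCMC draws."""
    kept = draws[burn_in:]
    return np.mean(g(kept))

post_mean = mcmc_estimate(theta_draws)                               # posterior mean of theta
post_second_moment = mcmc_estimate(theta_draws, g=lambda t: t**2)    # E[theta^2 | data]
print(post_mean, post_second_moment)
```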
With regard to the convergence of the MCMC sampling, we consider three convergence criteria: (1) Combine the posterior sampling trend graph with the energy graph of the MCMC sampling process for convergence assessment. By observing the posterior sampling trend graph of a random variable, we can determine whether the sampled information tends to be stable as the number of iterations increases. (2) Compare the marginal distribution of energy levels with the distribution of energy changes between successive samples. If the two distributions are similar to each other, we can conclude that the algorithm converges. (3) Draw the trajectory of the negative evidence lower bound (ELBO, [16]) obtained in the optimization process to verify the convergence of the algorithm. Minimizing the KL (Kullback–Leibler) divergence is equivalent to minimizing the negative ELBO [17,18].
3. The OU Process
The random process in the general linear mixed model (1) is a Gaussian process (GP). A GP can be regarded as an infinite-dimensional extension of the multivariate Gaussian distribution, and its probability characteristics are uniquely determined by its mean function and covariance function. In particular, a GP with zero mean is completely determined by its covariance function [18]. To give a brief review of the GP while keeping the notation of model (1), we use a generic notation for the GP in this section to avoid a mix-up with the model parameters in model (1); the process in this section can be considered a copy of the one in model (1). If a random process is a GP, any finite observation vector has a multivariate Gaussian distribution. Let . The joint density function of the multivariate Gaussian random vector is
where , mean vector , and stands for the covariance matrix. Let . When the Gaussian process has zero mean and is stationary, we have and
where and stand for the GP autocorrelation function and the autocovariance function, respectively, which only depend on the time interval . Thus, , and . The correlation coefficient between and is denoted by . It is assumed that in this paper. A zero-mean stationary GP is an Ornstein–Uhlenbeck (OU) process [19]. An OU process can be regarded as a continuous-time analogue of the discrete-time first-order autoregressive AR(1) process.
To understand some properties of the AR(1) process, we express the AR(1) process as
where is a weight parameter to ensure stability, and are uncorrelated random variables satisfying
It is assumed that the random error satisfies . Therefore,
The autocorrelation function is assumed to follow the first-order difference equation
so that we have the following solution:
As shown in Figure 2 below, when , the absolute correlation between observations of at two different time points and approaches 0 as the time lag increases.
Figure 2.
Example of an AR(1) process.
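The geometric decay of the AR(1) autocorrelation can be checked numerically. The following Python sketch simulates a stationary AR(1) path and compares the empirical lag-k correlations with the theoretical values; the parameter values are illustrative only.

```python
import numpy as np

def simulate_ar1(phi, sigma_eps, T, rng):
    """Simulate a zero-mean AR(1) process x_t = phi * x_{t-1} + eps_t."""
    x = np.zeros(T)
    # start from the stationary distribution so the whole path is stationary
    x[0] = rng.normal(0.0, sigma_eps / np.sqrt(1.0 - phi**2))
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma_eps)
    return x

rng = np.random.default_rng(1)
x = simulate_ar1(phi=0.8, sigma_eps=1.0, T=50_000, rng=rng)

# the empirical autocorrelation at lag k should be close to phi**k
for k in (1, 2, 5, 10):
    rho_k = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(f"lag {k}: empirical {rho_k:.3f}  theoretical {0.8**k:.3f}")
```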
The above AR(1) process (7) is a special case of an OU process by taking and , that is,
Therefore, the exponential covariance matrix constructed from the OU process can be used to describe the correlation in longitudinal data. An OU process is shown in Figure 3. It is obvious that, as moves toward its mean position with increasing time t, the correlation between and at two different time points becomes weaker and weaker.
Figure 3.
Example of an OU process.
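To make the exponential covariance concrete, the following Python sketch builds an OU covariance matrix at irregularly spaced time points under one common parameterization consistent with the description above, Cov(w(s), w(t)) = sigma2 * rho ** |s - t|; the time points and parameter values are hypothetical.

```python
import numpy as np

def ou_covariance(times, sigma2, rho):
    """Covariance matrix of a stationary zero-mean OU process observed at `times`:
    Cov(w(s), w(t)) = sigma2 * rho ** |s - t|, with 0 < rho < 1."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])
    return sigma2 * rho ** lags

# irregularly spaced observation times for one subject (illustrative values)
t = [0.0, 0.5, 1.5, 3.0, 6.0]
Sigma = ou_covariance(t, sigma2=2.0, rho=0.6)
print(np.round(Sigma, 3))
```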
4. Dirichlet Process and Dirichlet Process Mixture
4.1. Dirichlet Process
A Dirichlet process (DP) is a class of stochastic processes whose realizations are probability distributions. The DP is often used in Bayesian inference to describe the prior knowledge of random variables.
Definition 1.
Given a measurable set , a base distribution , and a positive real number α, a Dirichlet process is a random process whose realization is a probability distribution on . For any measurable finite partition of , , if , then
where stands for the Dirichlet distribution. A DP is specified by the base distribution and the concentration parameter α.
A DP can also be regarded as an infinite-dimensional generalization of the n-dimensional Dirichlet distribution. A DP is the conjugate prior of an infinite, nonparametric discrete distribution. An important application of the DP is to use it as a prior distribution for infinite mixture models. A statistically equivalent description of the DP is based on the stick-breaking process, described as follows. Given a discrete distribution
where is an indicator function, namely
, the probability distribution G defined in this way is said to obey the Dirichlet process, denoted as . A DP can be constructed by a stochastic process. It possesses some advantages in its application in Bayesian model evaluation, density estimation, and mixed model clustering [20].
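A minimal sketch of the stick-breaking construction described above, truncated to a finite number of atoms, is given below; the base distribution and the concentration parameter are illustrative choices.

```python
import numpy as np

def stick_breaking_dp(alpha, base_sampler, n_atoms, rng):
    """Draw a (truncated) realization G = sum_k w_k * delta_{theta_k} of DP(alpha, G0).
    `base_sampler(size)` draws atoms theta_k from the base distribution G0."""
    v = rng.beta(1.0, alpha, size=n_atoms)                        # stick-breaking proportions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))     # weights w_k
    atoms = base_sampler(n_atoms)                                 # atom locations from G0
    return w, atoms

rng = np.random.default_rng(2)
weights, atoms = stick_breaking_dp(alpha=2.0,
                                   base_sampler=lambda n: rng.normal(0, 1, n),
                                   n_atoms=200, rng=rng)
print(weights[:5].round(3), weights.sum().round(3))   # weights sum to (nearly) 1
```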
4.2. Dirichlet Process Mixture
We consider a set of i.i.d. sample data as part of an infinitely exchangeable sequence. follows the probability distribution with parameter , that is, . It may be assumed that the prior distribution of is an unknown random probability measure G, and G can be constructed by a DP, that is, , . Thus, the Dirichlet process mixture (DPM) model is defined as:
where and are the base distribution and model parameter, respectively. In general, the distributions F and will depend on additional hyperparameters not mentioned above. Since a realization of the DP is discrete with probability 1, the above model can be regarded as a countably infinite mixture. We can integrate out the distribution G in the above DPM to obtain the prior distribution of :
where (, ) represents the point mass at . This is a mixture of two distributions with weights and .
Let be a family of density functions with parameter . We assume that the density function of the observed data follows the probability distribution family defined by
which is called the DPM density of , where is the distribution of . is the nonparametric family of mixed distributions. The distribution G can be made random by assuming that G comes from a DP. In practice, to obtain the DPM density functions of , it is necessary to introduce the latent variable related to . That is, after introducing , the joint density function of is , where is a sample from the distribution G. The joint density function of can be expressed as . We assign the prior distribution to the distribution G with . The Bayesian model is set up as follows:
where is a set of observations and is a set of latent variables. Based on the discretization of DP, we can obtain the DPM density function of by
A semiparametric model can be set up by replacing the DP mixture on with the DP mixture on any subset of (given by (9)), depending on . Let with a prior distribution . The above DP mixture is only performed on the parameter . The joint density function of can be obtained. The prior distribution of and G are prespecified. As a result, the semiparametric model can be determined [21].
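The following sketch evaluates a truncated DPM density of the form f(y) = sum_k w_k f(y | theta_k), with a normal kernel and base distribution chosen purely for illustration; they are assumptions, not the ones used in the model of this paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha, K = 2.0, 200
v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))    # stick-breaking weights
theta = rng.normal(0.0, 2.0, size=K)                          # atoms from an illustrative G0

def dpm_density(y, weights, atoms, kernel_sd=1.0):
    """Truncated DPM density f(y) ~= sum_k w_k * N(y | theta_k, kernel_sd^2)."""
    y = np.atleast_1d(y)[:, None]
    return np.sum(weights[None, :] * norm.pdf(y, loc=atoms[None, :], scale=kernel_sd), axis=1)

print(dpm_density(np.linspace(-4, 4, 9), w, theta).round(4))
```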
5. Formulation of the Semiparametric Autoregressive Model
5.1. The Partial Dirichlet Process Mixture of Stochastic Process
The stochastic process in (1) is semiparameterized to generalize the general linear mixed model for longitudinal data. A nonparametric DP prior is assigned to the covariance parameter of . To reduce the number of unknown parameters, a partial Dirichlet process mixture is performed on , i.e., only a nonparametric DP prior is assigned to the parameter . The semiparametric process is created as follows. First, we consider the OU process associated with object i to have a covariance matrix , where . The -element of the matrix is given by . The matrix associated with the random process has the following form:
Second, we give the parameter a nonparametric DP prior:
where is a known probability distribution, is a constant, and the probability distribution G is generated by DP satisfying
where is the point mass at , and G is a random probability distribution.
Using the discretization of DP, for any parameter , we assume that is a density function depending on the parameter . Let , . The DPM density function can be obtained:
where . After embedding the above conditional prior into the distribution of the random process , we formulate the DPM model for as
Using the discretization of the distribution function G, we obtain the DPM density of the random process as follows.
This is a countably infinite mixture of multivariate Gaussian density functions, where , represents a multivariate Gaussian density function with a zero-mean vector and a covariance matrix of . It can be seen that after the semiparametric treatment of the stochastic process , its distribution is a mixture of stochastic processes, which is a more general mixture of OU processes.
Since G is discrete with probability 1, it provides an automatic clustering effect on the autocorrelation structure of the objects. After the OU process is semiparameterized, the covariance between any two time points , can be obtained by
After semiparameterization of the OU process , it not only contains an AR structure but also has an automatic clustering effect. If the observations from model (1) are obtained at equal interval time points, the covariance matrix structure of becomes AR(1).
5.2. The Framework of a Hierarchical Model
We introduce potential parameters (corresponding to n objects) and semiparameterize the general linear mixed model (1) with observations satisfying the following model:
This is converted to a hierarchical model as follows:
where “ind” stands for “independently distributed”, and are independent of each other, and , , , and are the prior distributions of parameters , , , and , respectively. If is a scalar quantity corresponding to the random intercept of the model, we have . This model generalizes the exponential covariance function of the OU process. It realizes the automatic clustering effect between objects through parameter . We call model (18) the Bayesian semiparametric autoregressive model, or simply the SPAR (semiparametric autoregressive) model.
Note that the SPAR model (18) is a parallel version of Quintana et al.’s [4] with additional conditions that the random effects have a common variance component , and the autocorrelation parameters are assumed to be conditionally i.i.d. with . This is different from model (3) in Quintana et al. [4], which assumes that the random-effect parameters are conditionally i.i.d. with , where with in our notation. The simpler assumptions help simplify the posterior inference. Quintana et al.’s [4] model assumptions do not lead to explicit posterior distributions of model parameters, so their posterior inference on model parameters may have to be performed by nonparametric MCMC algorithms. Because no explicit expressions related to posterior inference on model parameters are given in [4], we are not able to conclude whether Quintana et al.’s [4] Bayesian posterior inference is a complete semiparametric MCMC method or a combination of semiparametric and nonparametric MCMC methods. By imposing simpler assumptions on the parameters in model (18), we are able to obtain explicit posterior distributions of all model parameters in the subsequent context and conclude that our approach to handling the SPAR model (18) is a complete semiparametric MCMC method.
In the SPAR model (18), the random process part is an OU process mixture. The correlation matrix possesses the following form:
Using the property of the OU process structure, can be analyzed backwards. The inverse matrix of is tridiagonal. Denote by (). We have the -th element of given by
So, can be rewritten as
Using the correlation theory of the anticorrelation random variables [22], we can compute the elements of the inverse matrix of as follows:
turns out to be a tridiagonal matrix.
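The tridiagonal structure of the inverse can be verified numerically. The following sketch builds the OU correlation matrix at irregular time points and checks that the entries of its inverse lying more than one position off the diagonal are numerically zero; the time points and the value of the correlation parameter are illustrative.

```python
import numpy as np

def ou_correlation(times, rho):
    """Correlation matrix R with R[j, k] = rho ** |t_j - t_k| for a stationary OU process."""
    times = np.asarray(times, dtype=float)
    return rho ** np.abs(times[:, None] - times[None, :])

t = [0.0, 1.0, 2.5, 4.0, 7.0]          # irregular observation times (illustrative)
R = ou_correlation(t, rho=0.7)
R_inv = np.linalg.inv(R)

# entries more than one position off the diagonal should be numerically zero
mask = np.abs(np.subtract.outer(np.arange(len(t)), np.arange(len(t)))) > 1
print(np.max(np.abs(R_inv[mask])))     # ~1e-16: the inverse is tridiagonal
```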
The random process corresponding to the i-th object satisfies the conditional distribution:
So, the conditional density function is
When the inverse matrix of the covariance matrix of the random process has the above tridiagonal form, based on the property of the multivariate normal distribution, the conditional density function of can be decomposed into the product of univariate Gaussian density functions as follows:
It is known that . Hence,
As a result, the conditional density function of in the random process can be decomposed as follows:
The above decomposition greatly simplifies the computation of the distribution of the random process , which can be obtained by direct computation of the distribution of a single Gaussian variable.
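The decomposition into univariate Gaussian conditionals also gives a simple way to simulate the process one variable at a time. The following sketch draws OU paths sequentially from the Markov conditionals and checks the resulting sample covariance against sigma2 * rho ** |t_j - t_k|; all parameter values are illustrative.

```python
import numpy as np

def simulate_ou_path(times, sigma2, rho, rng):
    """Draw w(t_1), ..., w(t_m) of a zero-mean stationary OU process one variable at a time,
    using the Markov decomposition w_j | w_{j-1} ~ N(r_j * w_{j-1}, sigma2 * (1 - r_j**2)),
    where r_j = rho ** (t_j - t_{j-1})."""
    times = np.asarray(times, dtype=float)
    w = np.empty(len(times))
    w[0] = rng.normal(0.0, np.sqrt(sigma2))
    for j in range(1, len(times)):
        r = rho ** (times[j] - times[j - 1])
        w[j] = rng.normal(r * w[j - 1], np.sqrt(sigma2 * (1.0 - r**2)))
    return w

rng = np.random.default_rng(4)
paths = np.array([simulate_ou_path([0, 0.5, 1.5, 3.0, 6.0], 2.0, 0.6, rng) for _ in range(20000)])
print(np.round(np.cov(paths, rowvar=False), 2))   # close to 2.0 * 0.6 ** |t_j - t_k|
```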
6. The Marginal Likelihood, Prior Determination, and Posteriori Inference
Let , , , and . Denote by , which represents the parameter vector in model (18). A random process that specifies the AR structure in the SPAR model (18) is
The marginal distribution of the covariance parameter can be obtained from the joint distribution of all terms in (26) as follows:
for . We assume that is a sample from the multivariate Gaussian distribution. It can be regarded as part of an exchangeable sequence. By the exchangeability property, observation i can be treated as the last one among all n observations from the n objects. Then, is the corresponding vector . Alternatively, we can consider it as the last one that gives us . The conditional prior distribution of is
where represents all other ’s except . According to the product formula, we have
Based on the above conditional prior distributions, we can obtain the prior distribution of parameter . Compute the likelihood function of the SPAR model (18) as follows:
Let
This implies that the prior distributions of each parameter are independent of each other. The joint posterior of the SPAR model is computed as follows:
That is, the joint posterior distribution is the product of the likelihood function and the prior distribution. The first term of the likelihood function is the density function of the estimated n-dimensional Gaussian distribution at . The second term is in , the density function of the estimated q-dimensional Gaussian distribution . The third term is the product of the univariate Gaussian distribution density functions.
To estimate the joint posterior distribution of the SPAR model (18), it is necessary to use Bayes' theorem to obtain the conditional distribution of the parameters in the model:
The MCMC algorithm is employed to estimate these conditional distributions. The conditional distribution of a parameter is denoted by in the subsequent context.
7. A Monte Carlo Study
7.1. Simulation Design
To verify that the SPAR model (18) can effectively capture the correlation structure in longitudinal data, the empirical sample data were generated under four situations with zero mean and a covariance structure that is compound symmetric (CS), autoregressive (AR), mixed CS and AR, and unstructured, respectively. The MCMC method was employed to estimate the covariance matrix and the correlation matrix in the four different cases, respectively, and the results were compared with the traditional Bayesian inverse-Wishart estimation method.
Consider a brief form of the SPAR model (18):
where represents a fixed intercept, e is an vector of ones, is a random intercept, and is an OU process corresponding to . Convert the above model into a hierarchical model:
For the special case of balanced sample design with () and for , the joint posterior distribution of model (18) can be easily computed by
To realize the Bayesian inference of the model, it is necessary to use Bayes' theorem to obtain the conditional distribution of each parameter. The lengthy derivations of all conditional probability distributions are given in Appendix A at the end of the paper.
Performing posterior inference on the covariance matrix () is equivalent to estimating the posterior mean of :
where . The posterior estimate for the correlation matrix can be obtained by using to compute the estimate of the correlation matrix :
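A sketch of how such a posterior-mean covariance matrix and the induced correlation matrix can be computed from MCMC draws is given below; the draws here are generated artificially for illustration and do not come from the SPAR posterior.

```python
import numpy as np

def posterior_covariance_and_correlation(sigma_draws):
    """Average covariance-matrix draws from MCMC output and convert the posterior-mean
    covariance into a correlation matrix. `sigma_draws` has shape (n_draws, m, m)."""
    sigma_hat = sigma_draws.mean(axis=0)                 # posterior mean of the covariance
    d = np.sqrt(np.diag(sigma_hat))
    corr_hat = sigma_hat / np.outer(d, d)                # standardize to correlations
    return sigma_hat, corr_hat

# hypothetical draws: covariance matrices built from sampled (sigma2, rho) pairs
rng = np.random.default_rng(5)
t = np.arange(6.0)
draws = np.array([rng.gamma(2.0, 1.0) * rng.uniform(0.4, 0.8) ** np.abs(t[:, None] - t[None, :])
                  for _ in range(1000)])
sigma_hat, corr_hat = posterior_covariance_and_correlation(draws)
print(np.round(corr_hat, 2))
```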
The mean square error loss function is used to evaluate the performance of the posterior mean:
Besides the mean square error, Bayesian estimates of the covariance matrix and correlation matrix can be obtained under other loss functions [23]; the most common one is the entropy loss function defined by
The quadratic loss function is given by
The Bayesian estimates of the covariance matrices based on the loss functions and are given by
respectively, where vec stands for the column vectorization of a matrix, and “⊗” denotes the Kronecker product of matrices. Similarly, the Bayesian estimate of correlation moment under various loss functions can be obtained.
7.2. Simulation Specification and Display of Empirical Results
In the simulation, we first set up the four different covariance structures as in Equations (41) and the priors as given in Equations (43)–(46). Then, we ran the MCMC training trial 2000 times. After seeing the convergence trend reach relative stability after 2000 training trials, we generated 20 datasets consisting of 100 sample points of length 6 for each object. This is equivalent to the longitudinal data structure in Table 1, with , , and for each generated longitudinal dataset.
The four covariance matrices are designed as follows:
The root mean square error (RMSE) for estimating is computed by
The RMSE for estimating the correlation matrix is computed similarly.
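The following sketch computes the three error criteria reported in Tables 2 and 3 for a pair of covariance matrices. The RMSE is taken element-wise, and the entropy and quadratic losses are written in their standard forms (as in [23]); whether these match Equations (42) and the loss definitions above exactly is an assumption, and the matrices used here are illustrative.

```python
import numpy as np

def rmse(est, true):
    """Element-wise root mean square error between an estimated and a true matrix."""
    return np.sqrt(np.mean((est - true) ** 2))

def entropy_loss(est, true):
    """Stein-type entropy loss: tr(est @ inv(true)) - log det(est @ inv(true)) - m."""
    m = true.shape[0]
    a = est @ np.linalg.inv(true)
    return np.trace(a) - np.log(np.linalg.det(a)) - m

def quadratic_loss(est, true):
    """Quadratic loss: tr((est @ inv(true) - I)^2)."""
    a = est @ np.linalg.inv(true) - np.eye(true.shape[0])
    return np.trace(a @ a)

# illustrative comparison with an AR(1)-type "true" covariance and a slightly biased estimate
t = np.arange(6.0)
true_cov = 2.0 * 0.6 ** np.abs(t[:, None] - t[None, :])
est_cov = true_cov + 0.05 * np.eye(6)
print(rmse(est_cov, true_cov), entropy_loss(est_cov, true_cov), quadratic_loss(est_cov, true_cov))
```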
For each of the four different covariance structures, we used the R-package (called Pandas, available upon request) to perform a preliminary analysis on the simulated data with a specified prior distribution, respectively. Then, we employed the MCMC algorithm to perform the sampling estimation on each parameter in the model. We used the three methods mentioned before to perform the convergence assessment on the sampling results. Finally, we obtained the estimates for the four types of covariance and correlation matrices. The estimation error was computed based on three types of loss functions. The inverse-Wishart estimation error was also obtained for comparison.
For the case of the CS covariance structure, the prior distribution of the parameter in the model is specified as follows:
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 4.
Figure 4.
Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The orange line stands for the sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Each graph in the right column shows each estimated parameter (the ordinate) versus the number of random samples (the abscissa), ranging from 1 to 1000. Each graph in the left column presents the kernel-estimated density function of the parameter from the last 1000 samples.
The negative ELBO loss histogram and energy graph estimated by the model with the CS structure are shown in Figure 5 and Figure 6, respectively, where the energy graph in Figure 6 was generated by the Python package PyMC3 (https://www.pymc.io/projects/docs/en/v3/pymc-examples/examples/getting_started.html) (accessed on 26 April 2023), which displays two simulated density curves: the blue one stands for the energy value at each MCMC sample point minus the average energy (analogous to centering the data); the green one stands for the difference of the energy function between successive samples (analogous to taking a derivative). A normal-looking energy distribution from an MCMC sampling indicates that the sampling process tends to a stable point, which implies convergence of the MCMC sampling. More details on energy computation in MCMC sampling by PyMC3 can be found on this website (https://www.pymc.io/projects/docs/en/v3/api/inference.html#module-pymc3.sampling). Figure 5 and Figure 6 incorporate both popular methods for evaluating the convergence of the MCMC sampling in our Monte Carlo study. All energy graphs in the subsequent context have the same interpretation as they do here; we skip the repeated interpretations for the other energy graphs to save space.
Figure 5.
Negative ELBO loss histogram: in Figure 5, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 6.
Energy graph: in Figure 6, the estimated distribution of energy is based on 1000 samples with size .
As can be seen from Figure 5 and Figure 6, after several iterations of the MCMC algorithm, the negative ELBO loss is stable between 0 and 25, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the sampling distribution and the trend graph, we can conclude that the MCMC algorithm is convergent.
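For readers who wish to reproduce diagnostics of the kind shown in Figures 4–6, the following PyMC3/ArviZ sketch fits a highly simplified random-intercept model (without the OU or DP components of the SPAR model) to artificial balanced data and produces trace and energy plots; all data, priors, and variable names are illustrative assumptions, not those of this paper.

```python
import numpy as np
import pymc3 as pm
import arviz as az

# hypothetical balanced data: n subjects, m time points each (NOT the paper's simulated data)
rng = np.random.default_rng(6)
n, m = 100, 6
b = rng.normal(0.0, 1.0, size=n)                      # true random intercepts
y = 1.0 + b[:, None] + rng.normal(0.0, 0.5, size=(n, m))
subject = np.repeat(np.arange(n), m)                  # subject index for each observation
y_long = y.reshape(-1)

with pm.Model() as model:
    beta0 = pm.Normal("beta0", mu=0.0, sigma=10.0)     # fixed intercept
    sigma_b = pm.HalfNormal("sigma_b", sigma=5.0)      # random-intercept scale
    sigma_e = pm.HalfNormal("sigma_e", sigma=5.0)      # error scale
    b_i = pm.Normal("b_i", mu=0.0, sigma=sigma_b, shape=n)
    mu = beta0 + b_i[subject]                          # subject-specific mean
    pm.Normal("y_obs", mu=mu, sigma=sigma_e, observed=y_long)
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

az.plot_trace(trace, var_names=["beta0", "sigma_b", "sigma_e"])  # trend graphs as in Figure 4
az.plot_energy(trace)                                            # energy graph as in Figure 6
```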
For the case of the AR covariance structure, the prior distribution of each parameter in the model is specified as follows:
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 7. Note that the DP-estimated density curves show different central locations from those in Figure 4, because they are generated from different prior distributions with different covariance structures (see the difference between Equations (43) and (44)).
Figure 7.
Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The orange line stands for the sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 7 have the same structure in both axes for the two columns of graphs.
The histogram of the negative ELBO loss and the energy graph estimated under the AR structure are shown in Figure 8 and Figure 9, respectively. They show that the negative ELBO loss tends to be stable after several iterations, staying basically below 50, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the posterior sampling and the trend graph, it can be concluded that the algorithm converges quickly.
Figure 8.
Negative ELBO loss histogram: in Figure 8, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 9.
Energy graph: In Figure 9, the estimated distribution of energy is based on 1000 samples with size .
For the covariance of the mixed structure of CS and AR, the prior distribution of each parameter in the model is specified as follows:
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend graph, as shown in Figure 10. The negative ELBO loss histogram and the model energy graph are shown in Figure 11 and Figure 12, respectively.
Figure 10.
Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The orange line stands for the sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 10 have the same structure in both axes for the two columns of graphs.
Figure 11.
Negative ELBO loss histogram: in Figure 11, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 12.
Energy graph: in Figure 12, the estimated distribution of energy is based on 1000 samples with size .
As can be seen from the histogram of the negative ELBO loss in Figure 11, the negative ELBO loss is basically kept between 0 and 30 after several iterations of the algorithm, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the posterior sampling and the trend graph, we can conclude that the algorithm is convergent.
For the case of independent structure covariance, the prior distributions of the parameters in the model are specified as follows:
The MCMC method was used to sample and estimate each parameter in the model. The last 1000 iterations were taken to draw the posterior sampling trend graph shown in Figure 13.

Figure 13.
Sampling results under the independent structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The orange line stands for the sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 13 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the model energy graph are shown in Figure 14 and Figure 15, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent.
Figure 14.
Negative ELBO loss histogram: in Figure 14, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 15.
Energy graph: in Figure 15, the estimated distribution of energy is based on 1000 samples with size .
The above model is applied to the estimation of covariance matrices, correlation matrices, and model errors. We compare it with the inverse-Wishart method. The outcomes of the estimation errors are shown in Table 2:
Table 2.
Estimated RMSE, , and based on covariance matrix .
Similarly, the estimation results of the four corresponding correlation matrices are shown in Table 3:
Table 3.
Estimated RMSE, , and based on correlation matrix .
In Table 2 and Table 3, the covariance models are compared with each other based on three types of loss functions. In estimating the four covariance structures, the SPAR model performs better than the traditional inverse-Wishart method for each covariance structure. The estimation error of the SPAR model is much smaller than that of the inverse-Wishart method. When estimating the correlation matrix, except for the relatively poor SPAR performance under the strict AR structure, all other models perform roughly the same based on the quadratic loss function . Based on the comparison of the estimation errors in Table 2 and Table 3, the SPAR model shows fairly good performance in estimating the covariance matrix and correlation matrix.
7.3. Analysis of a Real Wind Speed Dataset
To verify the effectiveness of the SPAR model built in this paper in practical application, we employ the Hongming data, which contain the ground meteorological data of Dingxin Station, Jinta County, Jiuquan City, Gansu Province, China. We use the SPAR model and the MCMC method to estimate the covariance matrix and correlation matrix under four different covariance structures (CS, AR, the mixture of CS and AR, and unstructured) and compare the estimates with those from the traditional Bayesian estimation by the inverse-Wishart method. The covariance structure of the four cases is shown in Equation (41), and the mean square error is shown in Equation (42). The real data are arranged into a longitudinal data structure, as shown in Table 1 with , , and . The following graphs are plotted in the same way as in the corresponding graphs in the Monte Carlo study in Section 7.2 but with real data input in the same Python program.
For the case of CS covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 16.
Figure 16.
Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 16 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the model energy graph are shown in Figure 17 and Figure 18, respectively. It can be seen that the negative ELBO loss approaches 0 quickly, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent.
Figure 17.
Negative ELBO loss histogram: in Figure 17, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 18.
Energy graph: In Figure 18, the estimated distribution of energy is based on 1000 samples with size .
For the case of the AR covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 19:

Figure 19.
Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 19 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the energy graph estimated by the model under the AR structure are shown in Figure 20 and Figure 21, respectively. It shows that the histogram of the negative ELBO loss approaches 0 quickly. Based on the posteriori sampling and the trend graph, it can be concluded that the algorithm converges quickly.
Figure 20.
Negative ELBO loss histogram: in Figure 20, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 21.
Energy graph: in Figure 21, the estimated distribution of energy is based on 1000 samples with size .
For the covariance of the mixed structure of CS and AR, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend graph, as shown in Figure 22. The negative ELBO loss histogram and the model energy graph are shown in Figure 23 and Figure 24, respectively.
Figure 22.
Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 22 have the same structure in both axes for the two columns of graphs.
Figure 23.
Negative ELBO loss histogram: in Figure 23, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 24.
Energy graph: in Figure 24, the estimated distribution of energy is based on 1000 samples with size .
As can be seen from the histogram of the negative ELBO loss in Figure 23, the negative ELBO loss approaches 0 quickly after several iterations of the algorithm. Based on the posteriori sampling and the trend graph, we can conclude that the algorithm is convergent.
For the case of independent structure covariance, the MCMC method is used to sample and estimate each parameter in the model. The last 1000 iterations are taken to draw the posterior sampling trend graph shown in Figure 25.

Figure 25.
Sampling results under an independent structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 25 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the model energy graph are shown in Figure 26 and Figure 27, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and trend diagram, we can conclude that the algorithm is convergent.
Figure 26.
Negative ELBO loss histogram: in Figure 26, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 27.
Energy graph: in Figure 27, the estimated distribution of energy is based on 1000 samples with size .
In Table 4 and Table 5, the covariance models are compared with each other based on three types of loss functions, RMSE, , and . When using the four covariance structures based on either the covariance matrix or the correlation matrix, the SPAR model always performs better than the traditional inverse-Wishart method—it always has a smaller value of RMSE, , or when comparing the SPAR model with the Inv-W model under the same covariance structure C1, C2, C3, or C4 in Table 4 and Table 5.
Table 4.
Estimated RMSE, , and based on covariance matrix .
Table 5.
Estimated RMSE, , and based on correlation matrix .
8. Concluding Remarks
The SPAR model (18) provides an explicitly complete semiparametric solution to the estimation of model parameters through the MCMC algorithm. Compared with the model formulation in Quintana et al. [4], the MCMC algorithm for posterior inference on model (18) is easier to implement and may converge faster because of the explicit simple posterior distributions of the model parameters. An effective and fast-converging MCMC algorithm plays an important role in Bayesian statistical inference. The SPAR model (18) gains easy implementation and fast convergence of the MCMC algorithm at the price of imposing simpler assumptions on the model parameters.
With regard to choosing initial values for estimating model parameters by the MCMC algorithm, we recommend using a numerical optimization method such as maximum a posteriori (MAP) estimation to obtain an estimator as the initial value of a parameter; this is likely to speed up the convergence of the sampled parameters. We employ the Gibbs sampling algorithm when estimating the parameters in the model. The convergence of the Markov chains generated by the MCMC algorithm is assessed by the posterior sampling trend plot, the negative ELBO histogram, and the energy graph, which indicate fast convergence of the MCMC sampling process. By applying the SPAR model to four different covariance structures through both the Monte Carlo study and a real dataset, we illustrate its effectiveness in handling nonstationary forms of covariance structures and its advantage over the traditional inverse-Wishart method.
It should be pointed out that the effectiveness and fast convergence of the MCMC algorithm depend on both model assumption and the priors of model parameters. Our Monte Carlo study was carried out by choosing normal priors for the model parameters and the inverse Gamma distribution for the variance components. This choice led to the easy implementation of the MCMC algorithm. It will be an interesting future research direction to develop some meaningful criteria for model and algorithm comparison in the area of Bayesian nonparametric longitudinal data analysis. The main purpose of our paper is to give an easily implementable approach to this area with a practical illustration. We can conclude that the complete semiparametric approach to Bayesian longitudinal data analysis in this paper is a significant complement to the area studied by some influential peers, such as Mukhopadhyay and Gelfand (1997, [21]), Quintana et al. (2016, [4]), and others.
Author Contributions
G.J. developed the Bayesian semiparametric method for longitudinal data. J.L. helped validate the methodology and finalize the manuscript. F.W. conducted major data analysis. X.C. conducted initial research in the Bayesian semiparametric modeling study on longitudinal data analysis under the guidance of G.J. in her master’s thesis [24]. H.L. helped with data analysis. J.C. found data and reference resources. S.C. and F.Z. finished the simulation study and also helped with data analysis and edited all figures. G.J. and J.J. wrote the initial draft. G.J. and F.W. completed the final writing—review and editing. All authors have read and agreed to the final version of the manuscript.
Funding
The research was supported by the National Natural Science Foundation of China under Grant No. 41271038. Jiajuan Liang’s research is supported by a UIC New Faculty Start-up Research Fund R72021106, and in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (UIC), project code 2022B1212010006.
Data Availability Statement
The real data and Python code presented in the present study are available on request from the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Appendix A. Derivations of Conditional Probability Distributions in Section 7.1
The posterior distribution of is actually an inverse gamma distribution:
The posterior distribution of is obtained as follows:
which can be expressed as follows:
where is the probability density function of the base distribution , b is the constant satisfying the equation , and is the marginal distribution of the parameter based on the prior and the variable .
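As a generic illustration of the conjugate update behind the inverse gamma posterior mentioned above (not the exact Appendix expression, whose symbols are omitted here), the following sketch updates an InverseGamma(a0, b0) prior for a normal variance given a set of residuals; all numerical values are illustrative.

```python
import numpy as np

def inverse_gamma_posterior(a0, b0, residuals):
    """Conjugate update for a normal variance sigma^2 with an InverseGamma(a0, b0) prior:
    given residuals r_1, ..., r_N ~ N(0, sigma^2),
    sigma^2 | data ~ InverseGamma(a0 + N / 2, b0 + sum(r_i^2) / 2)."""
    r = np.asarray(residuals)
    return a0 + r.size / 2.0, b0 + np.sum(r ** 2) / 2.0

rng = np.random.default_rng(7)
r = rng.normal(0.0, np.sqrt(2.0), size=500)             # residuals with true variance 2
a_post, b_post = inverse_gamma_posterior(a0=2.0, b0=1.0, residuals=r)
print(a_post, b_post, b_post / (a_post - 1.0))           # posterior mean of sigma^2, near 2
```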
References
- Pullenayegum, E.M.; Lim, L.S. Longitudinal data subject to irregular observation: A review of methods with a focus on visit processes, assumptions, and study design. Statist. Methods Med. Res. 2016, 25, 2992–3014. [Google Scholar] [CrossRef] [PubMed]
- Chakraborty, R.; Banerjee, M.; Vemuri, B.C. Statistics on the space of trajectories for longitudinal data analysis. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia, 18–21 April 2017; pp. 999–1002. [Google Scholar]
- Xiang, D.; Qiu, P.; Pu, X. Nonparametric regression analysis of multivariate longitudinal data. Stat. Sin. 2013, 23, 769–789. [Google Scholar]
- Quintana, F.A.; Johnson, W.O.; Waetjen, L.E.; Gold, E.B. Bayesian nonparametric longitudinal data analysis. J. Am. Statist. Assoc. 2016, 111, 1168–1181. [Google Scholar] [CrossRef]
- Cheng, L.; Ramchandran, S.; Vatanen, T.; Lietzén, N.; Lahesmaa, R.; Vehtari, A.; Lähdesmäki, H. An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 2019, 10, 1798. [Google Scholar] [CrossRef]
- Kleinman, K.P.; Ibrahim, J.G. A semiparametric Bayesian approach to the random effects model. Biometrics 1998, 54, 921–938. [Google Scholar] [CrossRef] [PubMed]
- Gao, F.; Zeng, D.; Couper, D.; Lin, D.Y. Semiparametric regression analysis of multiple right- and interval-censored events. J. Am. Statist. Assoc. 2019, 114, 1232–1240. [Google Scholar] [CrossRef] [PubMed]
- Lee, Y. Semiparametric regression. J. Am. Statist. Assoc. 2006, 101, 1722–1723. [Google Scholar] [CrossRef]
- Sun, Y.; Sun, L.; Zhou, J. Profile local linear estimation of generalized semiparametric regression model for longitudinal data. Lifetime Data Anal. 2013, 19, 317–349. [Google Scholar] [CrossRef] [PubMed]
- Zeger, S.L.; Diggle, P.J. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 1994, 50, 689–699. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Lin, X.; Müller, P. Bayesian inference in semiparametric mixed models for longitudinal data. Biometrics 2010, 66, 70–78. [Google Scholar] [CrossRef] [PubMed]
- Huang, Y.X. Quantile regression-based Bayesian semiparametric mixed-effects models for longitudinal data with non-normal, missing and mismeasured covariate. J. Statist. Comput. Simul. 2016, 86, 1183–1202. [Google Scholar] [CrossRef]
- Li, J.; Zhou, J.; Zhang, B.; Li, X.R. Estimation of high dimensional covariance matrices by shrinkage algorithms. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi’an, China, 10–13 July 2017; pp. 955–962. [Google Scholar]
- Doss, H.; Park, Y. An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes. Ann. Statist. 2018, 46, 1630–1663. [Google Scholar] [CrossRef]
- Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 2000, 9, 249–265. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Csiszar, I. I-Divergence geometry of probability distributions and minimization problems. Ann. Probab. 1975, 3, 146–158. [Google Scholar] [CrossRef]
- Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; pp. 63–71. [Google Scholar]
- Barndorff-Nielsen, O.E.; Shephard, N. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J. Roy. Statist. Soc. (Ser. B) 2001, 63, 167–241. [Google Scholar] [CrossRef]
- Griffin, J.E.; Steel, M.F.J. Order-based dependent Dirichlet processes. J. Am. Statist. Assoc. 2006, 101, 179–194. [Google Scholar] [CrossRef]
- Mukhopadhyay, S.; Gelfand, A.E. Dirichlet process mixed generalized linear models. J. Am. Statist. Assoc. 1997, 92, 633–639. [Google Scholar] [CrossRef]
- Zimmerman, D.L.; Núñez-Antón, V.A. Antedependence Models for Longitudinal Data; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar]
- Hsu, C.W.; Sinay, M.S.; Hsu, J.S. Bayesian estimation of a covariance matrix with flexible prior specification. Ann. Inst. Statist. Math. 2012, 64, 319–342. [Google Scholar] [CrossRef]
- Chen, X. Longitudinal Data Analysis Based on Bayesian Semiparametric Method. Master’s Thesis, Lanzhou University, Lanzhou, China, 2019. (In Chinese). [Google Scholar]