Abstract
A Bayesian semiparametric model framework is proposed to analyze multivariate longitudinal data. The new framework leads to simple explicit posterior distributions of the model parameters, which makes the MCMC algorithm for parameter estimation easy to implement and fast to converge. The proposed framework and its associated MCMC algorithm are validated under four covariance structures and on a real-life dataset. A simple Monte Carlo study of the model under the four covariance structures and an analysis of the real dataset show that the new model framework and its associated Bayesian posterior inference through the MCMC algorithm perform fairly well in the sense of easy implementation, fast convergence, and smaller root mean square errors compared with the same model without the specified autoregression structure.
Keywords:
Bayesian semiparametric method; covariance structure; Dirichlet process; linear mixed model; longitudinal data; MCMC algorithm
MSC:
62F10; 62F15; 62F40; 65C40
1. Introduction
Longitudinal data arise from repeated observations of the same individual or group of individuals at different time points. The structure of longitudinal data is shown in Table 1. The basic tasks of analyzing longitudinal data can be summarized as (1) studying the trend in the covariance structure of the observed variables with respect to time; (2) discovering the influence of covariates on the observable variables; and (3) determining the within-group correlation structures [1].
Table 1.
Longitudinal data structures.
Longitudinal data are often highly unbalanced, and it is usually difficult to apply traditional multiple regression techniques to such data directly. Statisticians have developed various Bayesian statistical inference models for longitudinal data analysis [2]. A parametric assumption may result in modeling bias, which motivates relaxing the assumption on the parametric structure. Nonparametric methods are robust because they do not require restrictive model assumptions. Semiparametric models integrate the characteristics of parametric and nonparametric models and are both flexible and easy to interpret. Nonparametric and semiparametric statistical methods are not only active topics in current statistical research but are also widely used in practical applications. Xiang et al. [3] summarize some common outcomes of nonparametric regression analysis of longitudinal data. Bayesian methods for parametric linear mixed models have been widely used in different areas. Assuming a normal random effect and using standard Gibbs sampling to realize some simple posterior inference, Quintana et al. [4] extend the general class of mixed models for longitudinal data by generalizing the GP part to the nonparametric case. The GP is a probabilistic approach to learning nonparametric models. Cheng et al. [5] propose the so-called LonGP, a flexible and interpretable nonparametric modeling framework. It provides a versatile software implementation that can solve commonly faced challenges in longitudinal data analysis, and it develops a fully Bayesian, predictive inference for LonGP that can be employed to carry out model selection. Kleinman and Ibrahim [6] relax the normality assumption by assuming a Dirichlet process prior with a Gaussian base measure of zero mean, semiparametrically modeling the random effects distribution [7]. Although a parametric model may have limitations in some applications, it is simpler than a nonparametric model, whose scope may be too wide to draw a concise conclusion. Nonparametric regression has a well-known weakness, the “curse of dimensionality”: when the independent variable X is multivariate, the estimation accuracy of the regression function deteriorates rapidly as the dimension of X increases. To address this problem, the semiparametric model is a good compromise that retains desirable properties of both parametric and nonparametric models [8]. There is a large body of literature on the semiparametric modeling of longitudinal data, and most of it employs random effects to model within-group correlations [9].
Semiparametric linear mixed models generalize traditional linear mixed models by modeling one covariate effect with a nonparametric function and modeling the other covariate effects parametrically. Semiparametric linear mixed models mainly use frequentist methods for statistical inference and assume normal random effects [10]. In this paper, we adopt the Bayesian semiparametric framework for linear mixed models given by Quintana et al. [4] and impose stronger conditions on the random effect to obtain explicit posterior distributions of the model parameters and fast convergence of the MCMC algorithm in the posterior inference. The proposed framework generalizes the default prior of the variance components and adjusts the inference of fixed effects associated with nonparametric random effects. The latter includes the extrapolation of nonparametric mean functions over time [11,12]. The stochastic process approach in [4] is a good choice for characterizing the intragroup correlation. The Gaussian process (GP) covariance has an exponential form and is uniquely determined by two parameters. A GP can specify autoregressive correlation (the AR covariance structure, [13]). A nonparametric Dirichlet process (DP) prior assigned to the covariance parameter results in an Ornstein–Uhlenbeck process (OUP). A partial Dirichlet process mixture (DPM) is performed on the OUP. By imposing stronger conditions on the random effect and decomposing the OUP into single Gaussian variables, we can substantially simplify the MCMC sampling in the posterior inference. The framework of our proposed semiparametric autoregression model (SPAR) is demonstrated in Figure 1.
Figure 1.
Bayesian semiparametric autoregression model (SPAR model).
This paper is organized as follows. Section 2 briefly introduces some basic theoretical background and principles. Section 3 reviews the OU process. Section 4 briefly introduces the Dirichlet process and the Dirichlet process mixture (DPM). The formulation of the semiparametric autoregressive model, based on a partial Dirichlet process mixture in terms of the OUP, is given in Section 5 together with a recommended solution. Section 6 derives the marginal likelihood, prior determination, and posterior inference. Section 7 gives a simple Monte Carlo study and a real dataset application based on the proposed model. Some concluding remarks are given in the last section.
2. Theoretical Basis
2.1. A General Linear Hybrid Model Containing an AR Structure
For an observation object , we denote the observation time points by . Let () be the observation vector from object i. At time point , we consider the possible time-dependent covariate vector . Let . Define the -dimensional fixed-effect design matrix . Assume . Define the corresponding -dimensional random effects design matrix , where . The general linear mixed model containing an AR covariance structure is
where stands for the fixed effect and for the random effect, is a stochastic process characterizing the correlation among the observations from object i, is the error term, . is an structural covariance matrix, and vectors and contain the variance–covariance parameters of the random vector and the stochastic process , respectively. stands for the identity matrix of dimension . The AR structure is usually specified by a GP with zero mean. The GP is uniquely determined by a covariance function containing parameter . The vector is a GP corresponding to the i-th object. is generated sequentially by the GP at the observation time points , i.e.,
To specify the AR structure, it is assumed that the above GP possesses stationarity:
When the model has a structured covariance function, the covariance matrix of is
2.2. The Principle for Bayesian Inference
The Bayesian method assumes a prior distribution on the unknown parameter and a joint distribution between an observable variable X and the parameter . The Bayesian method is based on the posterior distribution of the unknown parameter after observed data are available. The joint distribution, the prior distribution, and the posterior distribution are related to each other as follows:
where stands for the sample marginal density of . We use the posterior distribution to carry out statistical inference for the parameter :
Equation (4) is the well-known Bayesian formula. Bayesian statistical inference assumes that the posterior distribution of the unknown parameter , contains all the available information. As a result, the point estimation, interval estimation, hypothesis testing, and predicting inference of can be implemented as usual. Because
we construct the Bayesian estimate for after observing data by their conditional expectation:
In general, the Bayesian estimate for a function of , can be obtained as follows:
Obviously, the estimator is the usual expectation of with respect to the posterior distribution :
2.3. The MCMC Sampling and Its Convergence
When the posterior distribution in (6) is difficult to compute, estimating multiple parameters given by can be realized by the Markov Chain Monte Carlo (MCMC) method [14] following these steps: (1) establish a Markov chain whose stationary distribution is the posterior distribution of ; (2) use this Markov chain to sample from the posterior distribution and obtain an MCMC sample ; and (3) obtain the MCMC estimator for by
The MCMC sample estimates the parameter function more and more accurately as the sample size n becomes larger. The MCMC chain possesses properties such as stationarity, positive recurrence, aperiodicity, and irreducibility, so the choice of the initial value has little impact on the estimate of . Gibbs sampling is one of the MCMC algorithms; it is used to draw samples from a multivariate probability distribution. A drawback of standard Gibbs sampling is that it cannot handle nonconjugate distributions. Because the prior distribution of parameter in the model in this paper is nonconjugate, an improved version of the Gibbs algorithm is adopted [15].
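As an illustration of the MCMC estimator above, the following Python sketch averages a function of the draws over a hypothetical chain; the draws, the burn-in size, and the function g are placeholders, not quantities from the model in this paper.

```python
import numpy as np

# Hypothetical MCMC output: 5000 draws of a scalar parameter theta (placeholder for a real chain)
rng = np.random.default_rng(0)
theta_draws = rng.normal(loc=1.5, scale=0.3, size=5000)

def mcmc_estimate(draws, g=lambda t: t, burn_in=1000):
    """Estimate E[g(theta) | data] by averaging g over post-burn-in MCMC draws."""
    kept = draws[burn_in:]
    return np.mean(g(kept))

post_mean = mcmc_estimate(theta_draws)                               # posterior mean of theta
post_second_moment = mcmc_estimate(theta_draws, g=lambda t: t**2)    # E[theta^2 | data]
print(post_mean, post_second_moment)
```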
With regard to the convergence of the MCMC sampling, we consider three convergence criteria: (1) Combine the posterior sampling trend graph with the energy graph of the MCMC sampling process for convergence assessment. By observing the posterior sampling trend graph of a random variable, we can determine whether the sampled information tends to be stable as the number of iterations increases. (2) Compare the marginal distribution of energy levels with the distribution of energy changes between successive samples. If the two distributions are similar to each other, we can conclude that the algorithm converges. (3) Draw the trajectory of the negative evidence lower bound (ELBO, [16]) obtained in the optimization process to verify the convergence of the algorithm. Minimizing the KL (Kullback–Leibler) divergence is equivalent to minimizing the negative ELBO [17,18].
3. The OU Process
The random process in the general linear mixed model (1) is a Gaussian process (GP). A GP can be regarded as an infinite-dimensional extension of the multivariate Gaussian distribution, and its probability characteristics are uniquely determined by its mean function and covariance function. In particular, a GP with zero mean is completely determined by its covariance function [18]. To give a brief review of the GP while keeping the notation of model (1), we use a generic notation for the GP in this section to avoid a mix-up with the model parameters in model (1); the process in this section can be considered a copy of the one in model (1). If a random process is a GP, any finite observation vector has a multivariate Gaussian distribution. Let . The joint density function of the multivariate Gaussian random vector is
where , mean vector , and stands for the covariance matrix. Let . When the Gaussian process has zero mean and is stationary, we have and
where and stand for the GP autocorrelation function and the autocovariance function, respectively, which only depend on the time interval . Thus, , and . The correlation coefficient between and is denoted by . It is assumed that in this paper. A zero-mean stationary GP is an Ornstein–Uhlenbeck (OU) process [19]. An OU process can be regarded as a continuous-time analogue of the discrete-time first-order autoregressive AR(1) process.
To understand some properties of the AR(1) process, we express the AR(1) process as
where is a weight parameter to ensure stability, and are uncorrelated random variables satisfying
It is assumed that the random error satisfies . Therefore,
The autocorrelation function is assumed to follow the first-order difference equation
so that we have the following solution:
As shown in Figure 2 below, when , the absolute correlation between observations of at two different time points and approaches 0 as the time lag increases.
Figure 2.
Example of an AR(1) process.
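The geometric decay of the AR(1) autocorrelation can be checked numerically. The following Python sketch simulates a stationary AR(1) path and compares the empirical lag-k correlations with the theoretical values; the parameter values are illustrative only.

```python
import numpy as np

def simulate_ar1(phi, sigma_eps, T, rng):
    """Simulate a zero-mean AR(1) process x_t = phi * x_{t-1} + eps_t."""
    x = np.zeros(T)
    # start from the stationary distribution so the whole path is stationary
    x[0] = rng.normal(0.0, sigma_eps / np.sqrt(1.0 - phi**2))
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma_eps)
    return x

rng = np.random.default_rng(1)
x = simulate_ar1(phi=0.8, sigma_eps=1.0, T=50_000, rng=rng)

# the empirical autocorrelation at lag k should be close to phi**k
for k in (1, 2, 5, 10):
    rho_k = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(f"lag {k}: empirical {rho_k:.3f}  theoretical {0.8**k:.3f}")
```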
The above AR(1) process (7) is a special case of an OU process by taking and , that is,
Therefore, the exponential covariance matrix constructed from the OU process can be used to describe the correlation in longitudinal data. An OU process is shown in Figure 3. It is obvious that, as moves toward its mean position with increasing time t, the correlation between and at two different time points becomes weaker and weaker.
Figure 3.
Example of an OU process.
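To make the exponential covariance concrete, the following Python sketch builds an OU covariance matrix at irregularly spaced time points under one common parameterization consistent with the description above, Cov(w(s), w(t)) = sigma2 * rho ** |s - t|; the time points and parameter values are hypothetical.

```python
import numpy as np

def ou_covariance(times, sigma2, rho):
    """Covariance matrix of a stationary zero-mean OU process observed at `times`:
    Cov(w(s), w(t)) = sigma2 * rho ** |s - t|, with 0 < rho < 1."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])
    return sigma2 * rho ** lags

# irregularly spaced observation times for one subject (illustrative values)
t = [0.0, 0.5, 1.5, 3.0, 6.0]
Sigma = ou_covariance(t, sigma2=2.0, rho=0.6)
print(np.round(Sigma, 3))
```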
4. Dirichlet Process and Dirichlet Process Mixture
4.1. Dirichlet Process
A Dirichlet process (DP) is a class of stochastic processes whose realizations are probability distributions. The DP is often used in Bayesian inference to describe the prior knowledge of random variables.
Definition 1.
Given a measurable set , a base distribution , and a positive real number α, a Dirichlet process is a random process whose realization is a probability distribution on . For any measurable finite partition of , , if , then
where stands for the Dirichlet distribution. A DP is specified by the base distribution and the concentration parameter α.
A DP can also be regarded as an infinite-dimensional generalization of the n-dimensional Dirichlet distribution. A DP is the conjugate prior of an infinite, nonparametric discrete distribution. An important application of the DP is to use it as a prior distribution for infinite mixture models. A statistically equivalent description of the DP is based on the stick-breaking process, described as follows. Given a discrete distribution
where is an indicator function, namely
, the probability distribution G defined in this way is said to obey the Dirichlet process, denoted as . A DP can be constructed by a stochastic process. It possesses some advantages in its application in Bayesian model evaluation, density estimation, and mixed model clustering [20].
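A minimal sketch of the stick-breaking construction described above, truncated to a finite number of atoms, is given below; the base distribution and the concentration parameter are illustrative choices.

```python
import numpy as np

def stick_breaking_dp(alpha, base_sampler, n_atoms, rng):
    """Draw a (truncated) realization G = sum_k w_k * delta_{theta_k} of DP(alpha, G0).
    `base_sampler(size)` draws atoms theta_k from the base distribution G0."""
    v = rng.beta(1.0, alpha, size=n_atoms)                        # stick-breaking proportions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))     # weights w_k
    atoms = base_sampler(n_atoms)                                 # atom locations from G0
    return w, atoms

rng = np.random.default_rng(2)
weights, atoms = stick_breaking_dp(alpha=2.0,
                                   base_sampler=lambda n: rng.normal(0, 1, n),
                                   n_atoms=200, rng=rng)
print(weights[:5].round(3), weights.sum().round(3))   # weights sum to (nearly) 1
```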
4.2. Dirichlet Process Mixture
We consider a set of i.i.d. sample data as part of an infinitely exchangeable sequence. follows the probability distribution with parameter , that is, . It may be assumed that the prior distribution of is an unknown random probability measure G, and G can be constructed by a DP, that is, , . Thus, the Dirichlet process mixture (DPM) model is defined as:
where and are the base distribution and model parameter, respectively. In general, the distributions F and will depend on additional hyperparameters not mentioned above. Since a realization of the DP is discrete with probability 1, the above model can be regarded as a countably infinite mixture. We can integrate out the distribution G in the above DPM to obtain the prior distribution of :
where (, ) represents the point mass at . This is a mixture of two distributions with weights and .
Let be a family of density functions with parameter . We assume that the density function of the observed data follows the probability distribution family defined by
which is called the DPM density of , where is the distribution of . is the nonparametric family of mixed distributions. The distribution G can be made random by assuming that G comes from a DP. In practice, to obtain the DPM density functions of , it is necessary to introduce the latent variable related to . That is, after introducing , the joint density function of is , where is a sample from the distribution G. The joint density function of can be expressed as . We assign the prior distribution to the distribution G with . The Bayesian model is set up as follows:
where is a set of observations and is a set of latent variables. Based on the discretization of DP, we can obtain the DPM density function of by
A semiparametric model can be set up by replacing the DP mixture on with the DP mixture on any subset of (given by (9)), depending on . Let with a prior distribution . The above DP mixture is only performed on the parameter . The joint density function of can be obtained. The prior distribution of and G are prespecified. As a result, the semiparametric model can be determined [21].
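The following sketch evaluates a truncated DPM density of the form f(y) = sum_k w_k f(y | theta_k), with a normal kernel and base distribution chosen purely for illustration; they are assumptions, not the ones used in the model of this paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha, K = 2.0, 200
v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))    # stick-breaking weights
theta = rng.normal(0.0, 2.0, size=K)                          # atoms from an illustrative G0

def dpm_density(y, weights, atoms, kernel_sd=1.0):
    """Truncated DPM density f(y) ~= sum_k w_k * N(y | theta_k, kernel_sd^2)."""
    y = np.atleast_1d(y)[:, None]
    return np.sum(weights[None, :] * norm.pdf(y, loc=atoms[None, :], scale=kernel_sd), axis=1)

print(dpm_density(np.linspace(-4, 4, 9), w, theta).round(4))
```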
5. Formulation of the Semiparametric Autoregressive Model
5.1. The Partial Dirichlet Process Mixture of Stochastic Process
The stochastic process in (1) is semiparameterized to generalize the general linear mixed model for longitudinal data. A nonparametric DP prior is assigned to the covariance parameter of . To reduce the number of unknown parameters, a partial Dirichlet process mixture is performed on , i.e., only a nonparametric DP prior is assigned to the parameter . The semiparametric process is created as follows. First, we consider the OU process associated with object i to have a covariance matrix , where . The -element of the matrix is given by . The matrix associated with the random process has the following form:
Second, we give the parameter a nonparametric DP prior:
where is a known probability distribution, is a constant, and the probability distribution G is generated by DP satisfying
where is the point mass at , and G is a random probability distribution.
Using the discretization of DP, for any parameter , we assume that is a density function depending on the parameter . Let , . The DPM density function can be obtained:
where . After embedding the above conditional prior into the distribution of the random process , we formulate the DPM model for as
Using the discretization of the distribution function G, we obtain the DPM density of the random process as follows.
This is a countably infinite mixture of multivariate Gaussian density functions, where , represents a multivariate Gaussian density function with a zero-mean vector and a covariance matrix of . It can be seen that after the semiparametric treatment of the stochastic process , its distribution is a mixture of stochastic processes, which is a more general mixture of OU processes.
Since G is discrete with probability 1, it provides an automatic clustering effect on the autocorrelation structure of the objects. After the OU process is semiparameterized, the covariance between any two time points , can be obtained by
After semiparameterization of the OU process , it not only contains an AR structure but also has an automatic clustering effect. If the observations from model (1) are obtained at equal interval time points, the covariance matrix structure of becomes AR(1).
5.2. The Framework of a Hierarchical Model
We introduce potential parameters (corresponding to n objects) and semiparameterize the general linear mixed model (1) with observations satisfying the following model:
This is converted to a hierarchical model as follows:
where “ind” stands for “independently distributed”, and are independent of each other, and , , , and are the prior distributions of parameters , , , and , respectively. If is a scalar quantity corresponding to the random intercept of the model, we have . This model generalizes the exponential covariance function of the OU process. It realizes the automatic clustering effect between objects through parameter . We call model (18) the Bayesian semiparametric autoregressive model, or simply the SPAR (semiparametric autoregressive) model.
Note that the SPAR model (18) is a parallel version of Quintana et al.’s [4] with additional conditions that the random effects have a common variance component , and the autocorrelation parameters are assumed to be conditionally i.i.d. with . This is different from model (3) in Quintana et al. [4], which assumes that the random-effect parameters are conditionally i.i.d. with , where with in our notation. The simpler assumptions help simplify the posterior inference. Quintana et al.’s [4] model assumptions do not lead to explicit posterior distributions of model parameters, so their posterior inference on model parameters may have to be performed by nonparametric MCMC algorithms. Because no explicit expressions related to posterior inference on model parameters are given in [4], we are not able to conclude whether Quintana et al.’s [4] Bayesian posterior inference is a complete semiparametric MCMC method or a combination of semiparametric and nonparametric MCMC methods. By imposing simpler assumptions on the parameters in model (18), we are able to obtain explicit posterior distributions of all model parameters in the subsequent context and conclude that our approach to handling the SPAR model (18) is a complete semiparametric MCMC method.
In the SPAR model (18), the random process part is an OU process mixture. The correlation matrix possesses the following form:
Using the property of the OU process structure, can be analyzed backwards. The inverse matrix of is tridiagonal. Denote by (). We have the -th element of given by
So, can be rewritten as
Using the correlation theory of the anticorrelation random variables [22], we can compute the elements of the inverse matrix of as follows:
turns out to be a tridiagonal matrix.
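The tridiagonal structure of the inverse can be verified numerically. The following sketch builds the OU correlation matrix at irregular time points and checks that the entries of its inverse lying more than one position off the diagonal are numerically zero; the time points and the value of the correlation parameter are illustrative.

```python
import numpy as np

def ou_correlation(times, rho):
    """Correlation matrix R with R[j, k] = rho ** |t_j - t_k| for a stationary OU process."""
    times = np.asarray(times, dtype=float)
    return rho ** np.abs(times[:, None] - times[None, :])

t = [0.0, 1.0, 2.5, 4.0, 7.0]          # irregular observation times (illustrative)
R = ou_correlation(t, rho=0.7)
R_inv = np.linalg.inv(R)

# entries more than one position off the diagonal should be numerically zero
mask = np.abs(np.subtract.outer(np.arange(len(t)), np.arange(len(t)))) > 1
print(np.max(np.abs(R_inv[mask])))     # ~1e-16: the inverse is tridiagonal
```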
The random process corresponding to the i-th object satisfies the conditional distribution:
So, the conditional density function is
When the inverse matrix of the covariance matrix of the random process has the above tridiagonal form, based on the property of the multivariate normal distribution, the conditional density function of can be decomposed into the product of univariate Gaussian density functions as follows:
It is known that . Hence,
As a result, the conditional density function of in the random process can be decomposed as follows:
The above decomposition greatly simplifies the computation of the distribution of the random process , which can be obtained by direct computation of the distribution of a single Gaussian variable.
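The decomposition into univariate Gaussian conditionals also gives a simple way to simulate the process one variable at a time. The following sketch draws OU paths sequentially from the Markov conditionals and checks the resulting sample covariance against sigma2 * rho ** |t_j - t_k|; all parameter values are illustrative.

```python
import numpy as np

def simulate_ou_path(times, sigma2, rho, rng):
    """Draw w(t_1), ..., w(t_m) of a zero-mean stationary OU process one variable at a time,
    using the Markov decomposition w_j | w_{j-1} ~ N(r_j * w_{j-1}, sigma2 * (1 - r_j**2)),
    where r_j = rho ** (t_j - t_{j-1})."""
    times = np.asarray(times, dtype=float)
    w = np.empty(len(times))
    w[0] = rng.normal(0.0, np.sqrt(sigma2))
    for j in range(1, len(times)):
        r = rho ** (times[j] - times[j - 1])
        w[j] = rng.normal(r * w[j - 1], np.sqrt(sigma2 * (1.0 - r**2)))
    return w

rng = np.random.default_rng(4)
paths = np.array([simulate_ou_path([0, 0.5, 1.5, 3.0, 6.0], 2.0, 0.6, rng) for _ in range(20000)])
print(np.round(np.cov(paths, rowvar=False), 2))   # close to 2.0 * 0.6 ** |t_j - t_k|
```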
6. The Marginal Likelihood, Prior Determination, and Posteriori Inference
Let , , , and . Denote by , which represents the parameter vector in model (18). A random process that specifies the AR structure in the SPAR model (18) is
The marginal distribution of the covariance parameter can be obtained from the joint distribution of all terms in (26) as follows:
for . We assume that is a sample from the multivariate Gaussian distribution. It can be regarded as part of an exchangeable sequence. By the exchangeability property, observation i can be treated as the last one among all n observations from the n objects. Then, is the corresponding vector . Alternatively, we can consider it as the last one that gives us . The conditional prior distribution of is
where represents all other ’s except . According to the product formula, we have
Based on the above conditional prior distributions, we can obtain the prior distribution of parameter . Compute the likelihood function of the SPAR model (18) as follows:
Let
This implies that the prior distributions of each parameter are independent of each other. The joint posterior of the SPAR model is computed as follows:
That is, the joint posterior distribution is the product of the likelihood function and the prior distribution. The first term of the likelihood function is the density function of the estimated n-dimensional Gaussian distribution at . The second term is in , the density function of the estimated q-dimensional Gaussian distribution . The third term is the product of the univariate Gaussian distribution density functions.
To estimate the joint posterior distribution of the SPAR model (18), it is necessary to use Bayes' theorem to obtain the conditional distribution of the parameters in the model:
The MCMC algorithm is employed to estimate these conditional distributions. The conditional distribution of a parameter is denoted by in the subsequent context.
7. A Monte Carlo Study
7.1. Simulation Design
To verify that the SPAR model (18) can effectively capture the correlation structure in longitudinal data, the empirical sample data were generated under four situations with zero mean and a covariance structure that is compound symmetric (CS), autoregressive (AR), mixed CS and AR, and unstructured, respectively. The MCMC method was employed to estimate the covariance matrix and the correlation matrix in the four different cases, respectively, and the results were compared with the traditional Bayesian inverse-Wishart estimation method.
Consider a brief form of the SPAR model (18):
where represents a fixed intercept, e is an vector of ones, is a random intercept, and is an OU process corresponding to . Convert the above model into a hierarchical model:
For the special case of balanced sample design with () and for , the joint posterior distribution of model (18) can be easily computed by
To realize the Bayesian inference of the model, it is necessary to use Bayes' theorem to obtain the conditional distribution of each parameter. The lengthy derivations of all conditional probability distributions are given in Appendix A at the end of the paper.
Performing posterior inference on the covariance matrix () is equivalent to estimating the posterior mean of :
where . The posterior estimate for the correlation matrix can be obtained by using to compute the estimate of the correlation matrix :
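A sketch of how such a posterior-mean covariance matrix and the induced correlation matrix can be computed from MCMC draws is given below; the draws here are generated artificially for illustration and do not come from the SPAR posterior.

```python
import numpy as np

def posterior_covariance_and_correlation(sigma_draws):
    """Average covariance-matrix draws from MCMC output and convert the posterior-mean
    covariance into a correlation matrix. `sigma_draws` has shape (n_draws, m, m)."""
    sigma_hat = sigma_draws.mean(axis=0)                 # posterior mean of the covariance
    d = np.sqrt(np.diag(sigma_hat))
    corr_hat = sigma_hat / np.outer(d, d)                # standardize to correlations
    return sigma_hat, corr_hat

# hypothetical draws: covariance matrices built from sampled (sigma2, rho) pairs
rng = np.random.default_rng(5)
t = np.arange(6.0)
draws = np.array([rng.gamma(2.0, 1.0) * rng.uniform(0.4, 0.8) ** np.abs(t[:, None] - t[None, :])
                  for _ in range(1000)])
sigma_hat, corr_hat = posterior_covariance_and_correlation(draws)
print(np.round(corr_hat, 2))
```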
The mean square error loss function is used to evaluate the performance of the posterior mean:
Besides the mean square error, Bayesian estimates of the covariance matrix and correlation matrix can be obtained under other loss functions [23]; the most common one is the entropy loss function defined by
The quadratic loss function is given by
The Bayesian estimates of the covariance matrices based on the loss functions and are given by
respectively, where vec stands for the column vectorization of a matrix, and “⊗” denotes the Kronecker product of matrices. Similarly, the Bayesian estimate of correlation moment under various loss functions can be obtained.
7.2. Simulation Specification and Display of Empirical Results
In the simulation, we first set up the four different covariance structures as in Equations (41) and the priors as given in Equations (43)–(46). Then, we ran the MCMC training trial 2000 times. After seeing the convergence trend reach relative stability after 2000 training trials, we generated 20 datasets consisting of 100 sample points of length 6 for each object. This is equivalent to the longitudinal data structure in Table 1, with , , and for each generated longitudinal dataset.
The four covariance matrices are designed as follows:
The root mean square error (RMSE) for estimating is computed by
The RMSE for estimating the correlation matrix is computed similarly.
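The following sketch computes the three error criteria reported in Tables 2 and 3 for a pair of covariance matrices. The RMSE is taken element-wise, and the entropy and quadratic losses are written in their standard forms (as in [23]); whether these match Equations (42) and the loss definitions above exactly is an assumption, and the matrices used here are illustrative.

```python
import numpy as np

def rmse(est, true):
    """Element-wise root mean square error between an estimated and a true matrix."""
    return np.sqrt(np.mean((est - true) ** 2))

def entropy_loss(est, true):
    """Stein-type entropy loss: tr(est @ inv(true)) - log det(est @ inv(true)) - m."""
    m = true.shape[0]
    a = est @ np.linalg.inv(true)
    return np.trace(a) - np.log(np.linalg.det(a)) - m

def quadratic_loss(est, true):
    """Quadratic loss: tr((est @ inv(true) - I)^2)."""
    a = est @ np.linalg.inv(true) - np.eye(true.shape[0])
    return np.trace(a @ a)

# illustrative comparison with an AR(1)-type "true" covariance and a slightly biased estimate
t = np.arange(6.0)
true_cov = 2.0 * 0.6 ** np.abs(t[:, None] - t[None, :])
est_cov = true_cov + 0.05 * np.eye(6)
print(rmse(est_cov, true_cov), entropy_loss(est_cov, true_cov), quadratic_loss(est_cov, true_cov))
```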
For each of the four different covariance structures, we used the R-package (called Pandas, available upon request) to perform a preliminary analysis on the simulated data with a specified prior distribution, respectively. Then, we employed the MCMC algorithm to perform the sampling estimation on each parameter in the model. We used the three methods mentioned before to perform the convergence assessment on the sampling results. Finally, we obtained the estimates for the four types of covariance and correlation matrices. The estimation error was computed based on three types of loss functions. The inverse-Wishart estimation error was also obtained for comparison.
For the case of the CS covariance structure, the prior distribution of the parameter in the model is specified as follows:
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 4.
Figure 4.
Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The orange line stands for the sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Each graph in the right column shows each estimated parameter (the ordinate) versus the number of random samples (the abscissa), ranging from 1 to 1000. Each graph in the left column presents the kernel-estimated density function of the parameter from the last 1000 samples.
The negative ELBO loss histogram and energy graph estimated by the model with the CS structure are shown in Figure 5 and Figure 6, respectively, where the energy graph in Figure 6 was generated by the Python package PyMC3 (https://www.pymc.io/projects/docs/en/v3/pymc-examples/examples/getting_started.html) (accessed on 26 April 2023), which displays two simulated density curves: the blue one stands for the energy value at each MCMC sample point minus the average energy (analogous to centering the data); the green one stands for the difference of the energy function between successive samples (analogous to taking a derivative). A normal-looking energy distribution from an MCMC sampling indicates that the sampling process tends to a stable point, which implies convergence of the MCMC sampling. More details on energy computation in MCMC sampling by PyMC3 can be found on this website (https://www.pymc.io/projects/docs/en/v3/api/inference.html#module-pymc3.sampling). Figure 5 and Figure 6 incorporate both popular methods for evaluating the convergence of the MCMC sampling in our Monte Carlo study. All energy graphs in the subsequent context have the same interpretation as they do here; we skip the repeated interpretations for the other energy graphs to save space.
Figure 5.
Negative ELBO loss histogram: in Figure 5, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 6.
Energy graph: in Figure 6, the estimated distribution of energy is based on 1000 samples with size .
As can be seen from Figure 5 and Figure 6, after several iterations of the MCMC algorithm, the negative ELBO loss is stable between 0 and 25, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the sampling distribution and the trend graph, we can conclude that the MCMC algorithm is convergent.
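For readers who wish to reproduce diagnostics of the kind shown in Figures 4–6, the following PyMC3/ArviZ sketch fits a highly simplified random-intercept model (without the OU or DP components of the SPAR model) to artificial balanced data and produces trace and energy plots; all data, priors, and variable names are illustrative assumptions, not those of this paper.

```python
import numpy as np
import pymc3 as pm
import arviz as az

# hypothetical balanced data: n subjects, m time points each (NOT the paper's simulated data)
rng = np.random.default_rng(6)
n, m = 100, 6
b = rng.normal(0.0, 1.0, size=n)                      # true random intercepts
y = 1.0 + b[:, None] + rng.normal(0.0, 0.5, size=(n, m))
subject = np.repeat(np.arange(n), m)                  # subject index for each observation
y_long = y.reshape(-1)

with pm.Model() as model:
    beta0 = pm.Normal("beta0", mu=0.0, sigma=10.0)     # fixed intercept
    sigma_b = pm.HalfNormal("sigma_b", sigma=5.0)      # random-intercept scale
    sigma_e = pm.HalfNormal("sigma_e", sigma=5.0)      # error scale
    b_i = pm.Normal("b_i", mu=0.0, sigma=sigma_b, shape=n)
    mu = beta0 + b_i[subject]                          # subject-specific mean
    pm.Normal("y_obs", mu=mu, sigma=sigma_e, observed=y_long)
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)

az.plot_trace(trace, var_names=["beta0", "sigma_b", "sigma_e"])  # trend graphs as in Figure 4
az.plot_energy(trace)                                            # energy graph as in Figure 6
```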
For the case of the AR covariance structure, the prior distribution of each parameter in the model is specified as follows:
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 7. Note that the DP-estimated density curves show different central locations from those in Figure 4, because they are generated from different prior distributions with different covariance structures (see the difference between Equations (43) and (44)).
Figure 7.
Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The orange line stands for the sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 7 have the same structure in both axes for the two columns of graphs.
The histogram of the negative ELBO loss and the energy graph estimated under the AR structure are shown in Figure 8 and Figure 9, respectively. They show that the negative ELBO loss tends to be stable after several iterations, staying basically below 50, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the posterior sampling and the trend graph, it can be concluded that the algorithm converges quickly.
Figure 8.
Negative ELBO loss histogram: in Figure 8, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 9.
Energy graph: In Figure 9, the estimated distribution of energy is based on 1000 samples with size .
For the covariance of the mixed structure of CS and AR, the prior distribution of each parameter in the model is specified as follows:
Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend graph, as shown in Figure 10. The negative ELBO loss histogram and the model energy graph are shown in Figure 11 and Figure 12, respectively.
Figure 10.
Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The orange line stands for the sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 10 have the same structure in both axes for the two columns of graphs.
Figure 11.
Negative ELBO loss histogram: in Figure 11, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 12.
Energy graph: in Figure 12, the estimated distribution of energy is based on 1000 samples with size .
As can be seen from the histogram of the negative ELBO loss in Figure 11, the negative ELBO loss is basically kept between 0 and 30 after several iterations of the algorithm, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the posterior sampling and the trend graph, we can conclude that the algorithm is convergent.
For the case of independent structure covariance, the prior distributions of the parameters in the model are specified as follows:
The MCMC method was used to sample and estimate each parameter in the model. The last 1000 iterations were taken to draw the posterior sampling trend graph shown in Figure 13.

Figure 13.
Sampling results under the independent structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The orange line stands for the sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 13 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the model energy graph are shown in Figure 14 and Figure 15, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent.
Figure 14.
Negative ELBO loss histogram: in Figure 14, the horizontal axis stands for the number of iterations in the MCMC sampling with size , the vertical axis for the negative ELBO loss.
Figure 15.
Energy graph: in Figure 15, the estimated distribution of energy is based on 1000 samples with size .
The above model is applied to the estimation of covariance matrices, correlation matrices, and model errors. We compare it with the inverse-Wishart method. The outcomes of the estimation errors are shown in Table 2:
Table 2.
Estimated RMSE, , and based on covariance matrix .
Similarly, the estimation results of the four corresponding correlation matrices are shown in Table 3:
Table 3.
Estimated RMSE, , and based on correlation matrix .
In Table 2 and Table 3, the covariance models are compared with each other based on three types of loss functions. In estimating the four covariance structures, the SPAR model performs better than the traditional inverse-Wishart method for each covariance structure. The estimation error of the SPAR model is much smaller than that of the inverse-Wishart method. When estimating the correlation matrix, except for the relatively poor SPAR performance under the strict AR structure, all other models perform roughly the same based on the quadratic loss function . Based on the comparison of the estimation errors in Table 2 and Table 3, the SPAR model shows fairly good performance in estimating the covariance matrix and correlation matrix.
7.3. Analysis of a Real Wind Speed Dataset
To verify the effectiveness of the SPAR model built in this paper in practical application, we employ the Hongming data, which contain the ground meteorological data of Dingxin Station, Jinta County, Jiuquan City, Gansu Province, China. We use the SPAR model and the MCMC method to estimate the covariance matrix and correlation matrix under four different covariance structures (CS, AR, the mixture of CS and AR, and unstructured) and compare the estimates with those from the traditional Bayesian estimation by the inverse-Wishart method. The covariance structure of the four cases is shown in Equation (41), and the mean square error is shown in Equation (42). The real data are arranged into a longitudinal data structure, as shown in Table 1 with , , and . The following graphs are plotted in the same way as in the corresponding graphs in the Monte Carlo study in Section 7.2 but with real data input in the same Python program.
For the case of CS covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 16.
Figure 16.
Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 16 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the model energy graph are shown in Figure 17 and Figure 18, respectively. It can be seen that the negative ELBO loss approaches 0 quickly, and the sample energy transition distribution is basically consistent with the marginal energy distribution. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent.
Figure 17.
Negative ELBO loss histogram: in Figure 17, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 18.
Energy graph: In Figure 18, the estimated distribution of energy is based on 1000 samples with size .
For the case of the AR covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 19:

Figure 19.
Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 19 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the energy graph estimated by the model under the AR structure are shown in Figure 20 and Figure 21, respectively. It shows that the histogram of the negative ELBO loss approaches 0 quickly. Based on the posteriori sampling and the trend graph, it can be concluded that the algorithm converges quickly.
Figure 20.
Negative ELBO loss histogram: in Figure 20, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 21.
Energy graph: in Figure 21, the estimated distribution of energy is based on 1000 samples with size .
For the covariance of the mixed structure of CS and AR, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend graph, as shown in Figure 22. The negative ELBO loss histogram and the model energy graph are shown in Figure 23 and Figure 24, respectively.
Figure 22.
Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 22 have the same structure in both axes for the two columns of graphs.
Figure 23.
Negative ELBO loss histogram: in Figure 23, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 24.
Energy graph: in Figure 24, the estimated distribution of energy is based on 1000 samples with size .
As can be seen from the histogram of the negative ELBO loss in Figure 23, the negative ELBO loss approaches 0 quickly after several iterations of the algorithm. Based on the posteriori sampling and the trend graph, we can conclude that the algorithm is convergent.
For the case of independent structure covariance, the MCMC method is used to sample and estimate each parameter in the model. The last 1000 iterations are taken to draw the posterior sampling trend graph shown in Figure 25.

Figure 25.
Sampling results under an independent structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the graphs of two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figure 4 and Figure 25 have the same structure in both axes for the two columns of graphs.
The negative ELBO loss histogram and the model energy graph are shown in Figure 26 and Figure 27, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and trend diagram, we can conclude that the algorithm is convergent.
Figure 26.
Negative ELBO loss histogram: in Figure 26, the horizontal axis stands for the number of iterations in the MCMC algorithm with size , the vertical axis for the negative ELBO loss.
Figure 27.
Energy graph: in Figure 27, the estimated distribution of energy is based on 1000 samples with size .
In Table 4 and Table 5, the covariance models are compared with each other based on three types of loss functions, RMSE, , and . When using the four covariance structures based on either the covariance matrix or the correlation matrix, the SPAR model always performs better than the traditional inverse-Wishart method—it always has a smaller value of RMSE, , or when comparing the SPAR model with the Inv-W model under the same covariance structure C1, C2, C3, or C4 in Table 4 and Table 5.
Table 4.
Estimated RMSE, , and based on covariance matrix .
Table 5.
Estimated RMSE, , and based on correlation matrix .
8. Concluding Remarks
The SPAR model (18) provides an explicitly complete semiparametric solution to the estimation of model parameters through the MCMC algorithm. Compared with the model formulation in Quintana et al. [4], the MCMC algorithm for posterior inference on model (18) is easier to implement and may converge faster because of the explicit simple posterior distributions of the model parameters. An effective and fast-converging MCMC algorithm plays an important role in Bayesian statistical inference. The SPAR model (18) gains easy implementation and fast convergence of the MCMC algorithm at the price of imposing simpler assumptions on the model parameters.
With regard to choosing initial values for estimating model parameters by the MCMC algorithm, we recommend using a numerical optimization method such as maximum a posteriori (MAP) estimation to obtain an estimator as the initial value of a parameter; this is likely to speed up the convergence of the sampled parameters. We employ the Gibbs sampling algorithm when estimating the parameters in the model. The convergence of the Markov chains generated by the MCMC algorithm is assessed by the posterior sampling trend plot, the negative ELBO histogram, and the energy graph, which indicate fast convergence of the MCMC sampling process. By applying the SPAR model to four different covariance structures through both the Monte Carlo study and a real dataset, we illustrate its effectiveness in handling nonstationary forms of covariance structures and its advantage over the traditional inverse-Wishart method.
It should be pointed out that the effectiveness and fast convergence of the MCMC algorithm depend on both model assumption and the priors of model parameters. Our Monte Carlo study was carried out by choosing normal priors for the model parameters and the inverse Gamma distribution for the variance components. This choice led to the easy implementation of the MCMC algorithm. It will be an interesting future research direction to develop some meaningful criteria for model and algorithm comparison in the area of Bayesian nonparametric longitudinal data analysis. The main purpose of our paper is to give an easily implementable approach to this area with a practical illustration. We can conclude that the complete semiparametric approach to Bayesian longitudinal data analysis in this paper is a significant complement to the area studied by some influential peers, such as Mukhopadhyay and Gelfand (1997, [21]), Quintana et al. (2016, [4]), and others.
Author Contributions
G.J. developed the Bayesian semiparametric method for longitudinal data. J.L. helped validate the methodology and finalize the manuscript. F.W. conducted major data analysis. X.C. conducted initial research in the Bayesian semiparametric modeling study on longitudinal data analysis under the guidance of G.J. in her master’s thesis [24]. H.L. helped with data analysis. J.C. found data and reference resources. S.C. and F.Z. finished the simulation study and also helped with data analysis and edited all figures. G.J. and J.J. wrote the initial draft. G.J. and F.W. completed the final writing—review and editing. All authors have read and agreed to the final version of the manuscript.
Funding
The research was supported by the National Natural Science Foundation of China under Grant No. 41271038. Jiajuan Liang’s research is supported by a UIC New Faculty Start-up Research Fund R72021106, and in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (UIC), project code 2022B1212010006.
Data Availability Statement
The real data and Python code presented in the present study are available on request from the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Appendix A. Derivations of Conditional Probability Distributions in Section 7.1
The posterior distribution of is actually an inverse gamma distribution:
The posterior distribution of is obtained as follows:
which can be expressed as follows:
where is the probability density function of the base distribution , b is the constant satisfying the equation , and is the marginal distribution of the parameter based on the prior and the variable .
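As a generic illustration of the conjugate update behind the inverse gamma posterior mentioned above (not the exact Appendix expression, whose symbols are omitted here), the following sketch updates an InverseGamma(a0, b0) prior for a normal variance given a set of residuals; all numerical values are illustrative.

```python
import numpy as np

def inverse_gamma_posterior(a0, b0, residuals):
    """Conjugate update for a normal variance sigma^2 with an InverseGamma(a0, b0) prior:
    given residuals r_1, ..., r_N ~ N(0, sigma^2),
    sigma^2 | data ~ InverseGamma(a0 + N / 2, b0 + sum(r_i^2) / 2)."""
    r = np.asarray(residuals)
    return a0 + r.size / 2.0, b0 + np.sum(r ** 2) / 2.0

rng = np.random.default_rng(7)
r = rng.normal(0.0, np.sqrt(2.0), size=500)             # residuals with true variance 2
a_post, b_post = inverse_gamma_posterior(a0=2.0, b0=1.0, residuals=r)
print(a_post, b_post, b_post / (a_post - 1.0))           # posterior mean of sigma^2, near 2
```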
References
- Pullenayegum, E.M.; Lim, L.S. Longitudinal data subject to irregular observation: A review of methods with a focus on visit processes, assumptions, and study design. Statist. Methods Med. Res. 2016, 25, 2992–3014. [Google Scholar] [CrossRef] [PubMed]
- Chakraborty, R.; Banerjee, M.; Vemuri, B.C. Statistics on the space of trajectories for longitudinal data analysis. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia, 18–21 April 2017; pp. 999–1002. [Google Scholar]
- Xiang, D.; Qiu, P.; Pu, X. Nonparametric regression analysis of multivariate longitudinal data. Stat. Sin. 2013, 23, 769–789. [Google Scholar]
- Quintana, F.A.; Johnson, W.O.; Waetjen, L.E.; Gold, E.B. Bayesian nonparametric longitudinal data analysis. J. Am. Statist. Assoc. 2016, 111, 1168–1181. [Google Scholar] [CrossRef]
- Cheng, L.; Ramchandran, S.; Vatanen, T.; Lietzén, N.; Lahesmaa, R.; Vehtari, A.; Lähdesmäki, H. An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 2019, 10, 1798. [Google Scholar] [CrossRef]
- Kleinman, K.P.; Ibrahim, J.G. A semiparametric Bayesian approach to the random effects model. Biometrics 1998, 54, 921–938. [Google Scholar] [CrossRef] [PubMed]
- Gao, F.; Zeng, D.; Couper, D.; Lin, D.Y. Semiparametric regression analysis of multiple right- and interval-censored events. J. Am. Statist. Assoc. 2019, 114, 1232–1240. [Google Scholar] [CrossRef] [PubMed]
- Lee, Y. Semiparametric regression. J. Am. Statist. Assoc. 2006, 101, 1722–1723. [Google Scholar] [CrossRef]
- Sun, Y.; Sun, L.; Zhou, J. Profile local linear estimation of generalized semiparametric regression model for longitudinal data. Lifetime Data Anal. 2013, 19, 317–349. [Google Scholar] [CrossRef] [PubMed]
- Zeger, S.L.; Diggle, P.J. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 1994, 50, 689–699. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Lin, X.; Müller, P. Bayesian inference in semiparametric mixed models for longitudinal data. Biometrics 2010, 66, 70–78. [Google Scholar] [CrossRef] [PubMed]
- Huang, Y.X. Quantile regression-based Bayesian semiparametric mixed-effects models for longitudinal data with non-normal, missing and mismeasured covariate. J. Statist. Comput. Simul. 2016, 86, 1183–1202. [Google Scholar] [CrossRef]
- Li, J.; Zhou, J.; Zhang, B.; Li, X.R. Estimation of high dimensional covariance matrices by shrinkage algorithms. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi’an, China, 10–13 July 2017; pp. 955–962. [Google Scholar]
- Doss, H.; Park, Y. An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes. Ann. Statist. 2018, 46, 1630–1663. [Google Scholar] [CrossRef]
- Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 2000, 9, 249–265. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Csiszar, I. I-Divergence geometry of probability distributions and minimization problems. Ann. Probab. 1975, 3, 146–158. [Google Scholar] [CrossRef]
- Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; pp. 63–71. [Google Scholar]
- Barndorff-Nielsen, O.E.; Shephard, N. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J. Roy. Statist. Soc. (Ser. B) 2001, 63, 167–241. [Google Scholar] [CrossRef]
- Griffin, J.E.; Steel, M.F.J. Order-based dependent Dirichlet processes. J. Am. Statist. Assoc. 2006, 101, 179–194. [Google Scholar] [CrossRef]
- Mukhopadhyay, S.; Gelfand, A.E. Dirichlet process mixed generalized linear models. J. Am. Statist. Assoc. 1997, 92, 633–639. [Google Scholar] [CrossRef]
- Zimmerman, D.L.; Núñez-Antón, V.A. Antedependence Models for Longitudinal Data; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar]
- Hsu, C.W.; Sinay, M.S.; Hsu, J.S. Bayesian estimation of a covariance matrix with flexible prior specification. Ann. Inst. Statist. Math. 2012, 64, 319–342. [Google Scholar] [CrossRef]
- Chen, X. Longitudinal Data Analysis Based on Bayesian Semiparametric Method. Master’s Thesis, Lanzhou University, Lanzhou, China, 2019. (In Chinese). [Google Scholar]